AVLSI
Architectures for Neuromorphic Computing
Rajit Manohar
Computer Systems Lab, Yale University
http://csl.yale.edu/ · http://avlsi.csl.yale.edu/
The context: microelectronics scaling
• It’s been a great ride…
❖ … but sequential programs don’t speed up each year like they used to in the “good old days.”
• Computation demand is growing!
❖ Massive amounts of data are being collected by cheap, ubiquitous sensors.
❖ ~1.5B smartphones (with cameras) shipped in 2017.*
❖ ~0.75B monthly active users on Instagram in 2017.*
❖ Modern machine learning depends on massive amounts of data.
Data collected by: M. Horowitz, F. Labonte, O. Shacham, K. Olukotun and others; extrapolations by C. Moore
*Kleiner-Perkins 2018 Internet Trends
Parallelism to the rescue?
• Some algorithms just aren’t parallel
❖ “Unfortunately, for most interesting algorithms, […] no architecture is scalable […]” -- Agarwal et al. (CACM 1991)
• But maybe we’re going about this the wrong way…
❖ Physical systems, by their very nature, are massively parallel.
❖ Can we build computing systems inspired by physical ones?
Neuromorphic computing*
• Philosophical motivation
❖ Understand thought, consciousness
• Biological motivation
❖ Understand the brain through engineering
• Computational motivation
❖ Real-time vision, speech, pattern recognition, …
*term coined by Carver Mead
“Neuro” = neural “-morphic” = “having the shape, form, or structure”
Neuromorphics 101
• Basic computation
❖ Weighted input spikes are accumulated on a capacitor
❖ The neuron is implemented as a “threshold detector”
❖ On an output spike, the state of the neuron is reset (with a refractory period)
• ~1,000 to 10,000 synapses per neuron
• Classical approach
❖ Mixed-signal design: analog neuron and synapse circuits, digital asynchronous communication
[Figure: neuron schematic — dendrite, synapse (weight), leak, threshold detector (+/−), reset, axon]
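The accumulate / threshold / reset loop above can be sketched as a discrete-time leaky integrate-and-fire model. This is an illustrative software model of the dynamics, not the analog circuit; the leak factor, threshold, and input values are assumptions, and the refractory period is omitted for brevity:

```python
def lif_step(v, weighted_input, leak=0.9, threshold=1.0, v_reset=0.0):
    """One discrete time step of a leaky integrate-and-fire neuron.

    v              -- membrane potential (state of the "capacitor")
    weighted_input -- sum of weight * spike over all input synapses this step
    Returns the new potential and 1 if an output spike was emitted, else 0.
    """
    v = leak * v + weighted_input     # leaky accumulation of weighted spikes
    if v >= threshold:                # the "threshold detector"
        return v_reset, 1             # reset the state on an output spike
    return v, 0

# Drive the neuron with a constant weighted input and count output spikes.
v, n_spikes = 0.0, 0
for _ in range(100):
    v, s = lif_step(v, weighted_input=0.2)
    n_spikes += s
```

With these illustrative parameters the neuron settles into a regular firing pattern; a mixed-signal implementation realizes the same dynamics with charge on a capacitor and a comparator.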
General purpose neuromorphic systems
• Core components
❖ Set of neurons + synapses from the network being modeled, mapped to hardware
❖ Synapses can be made “superposable”
❖ Routing network handles spike communication between hardware elements
• Time-multiplexing
❖ Common hardware for computation
❖ Per-neuron/per-synapse state
[Figure: tiled array of synapse hardware + neuron hardware blocks connected by a routing network]
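The time-multiplexing idea can be sketched in software: one shared update routine (the common “neuron hardware”) walks over an array of per-neuron state, and spikes leave the core by index, the way an address-event router would carry them. The leak and threshold values are illustrative assumptions:

```python
def core_tick(potentials, inputs, leak=0.9, threshold=1.0):
    """One time step of a time-multiplexed neuron core.

    potentials -- per-neuron membrane state, stored in shared memory
    inputs     -- per-neuron accumulated synaptic input for this step
    Returns the indices of neurons that spiked (the "address events").
    """
    spikes = []
    for i in range(len(potentials)):      # one shared datapath, many neurons
        potentials[i] = leak * potentials[i] + inputs[i]
        if potentials[i] >= threshold:
            potentials[i] = 0.0           # reset the stored state on a spike
            spikes.append(i)              # spike identified only by its address
    return spikes
```

For example, `core_tick([0.95, 0.0, 0.5], [0.2, 0.1, 0.2])` fires only neuron 0; the same arithmetic serves every logical neuron, which is what lets one physical circuit model thousands of them.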
• SyNAPSE project: 2008 to 2015
❖ Analog neurons, digital spikes and routing
❖ Special materials for synapses (oxides, phase change memory, magnetic tunneling junctions, nanotraps)
❖ Embrace uncertainty in manufacturing technology; no compensation for variability
❖ Errors in communication/neuron okay because the system will compensate
[Timeline: GoldenGate (2008) → TrueNorth (2014)]
[Figure: raster plot of neuron index vs. time (ms), comparing chip and simulation]
Fig. 5. The spiking dynamics of the chip exactly match a software simulation when configured with the same parameters and recurrent connectivity. Spikes are plotted as dots for measured chip data and circles for simulation.
distribution of pixel correlations in the digits (60,000 images). After learning these weights, we trained 10 linear classifiers on the outputs of the hidden units using supervised learning. Finally, we tested how well the network classifies digits on out-of-sample data (10,000 images), and achieved 94% accuracy.
To map the RBM onto our neurosynaptic core, we make the following choices: First, we represent the 256 hidden units with our integrate-and-fire neurons. Next, we represent each visible unit using two axons, one for positive (excitatory) connections and the other for negative (inhibitory) connections, accounting for 968 of 1024 axons. Then, we cast the 484×256 real-valued weight matrix into two 484×256 binary matrices, one representing the positive connections (taking the highest 15% of the positive weights), and the other representing the negative connections (taking the lowest 15% of the weights). Finally, the synaptic values and thresholds of each neuron are adjusted to normalize the sum total input in the real-valued case with the sum total input of the binary case.
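The real-to-binary weight casting described above can be sketched as follows. The 15% fractions and matrix shape come from the text; using quantiles as the cutoffs is my reading of “highest/lowest 15%”, and the final per-neuron threshold normalization the paper mentions is omitted here:

```python
import numpy as np

def binarize_weights(w, frac=0.15):
    """Cast a real-valued weight matrix into two binary matrices:
    excitatory connections from the highest `frac` of the positive
    weights, inhibitory connections from the lowest `frac` of all weights."""
    pos_cut = np.quantile(w[w > 0], 1.0 - frac)  # cutoff: top 15% of positive weights
    neg_cut = np.quantile(w, frac)               # cutoff: lowest 15% of all weights
    w_pos = (w >= pos_cut).astype(np.uint8)      # positive (excitatory) axon matrix
    w_neg = (w <= neg_cut).astype(np.uint8)      # negative (inhibitory) axon matrix
    return w_pos, w_neg

rng = np.random.default_rng(0)
w = rng.normal(size=(484, 256))   # stand-in for the learned RBM weights
w_pos, w_neg = binarize_weights(w)
```

Per-neuron thresholds would then be rescaled so the total binary input matches the total real-valued input, as the paper describes.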
Following the example from [6], we are able to implement the RBM using spiking neurons by imposing a global inhibitory rhythm that clocks network dynamics. In the first phase of the rhythm (no inhibition), hidden units accumulate synaptic inputs driven by the pixels, and spike when they detect a relevant feature; these spikes correspond to binary activity of a conventional (non-spiking) RBM in a single update. In the second phase of the rhythm, the strong inhibition resets all membrane potentials to 0. By sending the outputs of the hidden units to the same linear classifier as before (implemented off-chip) we achieve 89% accuracy for out-of-sample data (see Fig. 6 for one trial).
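A minimal sketch of the two-phase rhythm, assuming the whole phase-1 accumulation is collapsed into a single step (the chip integrates spikes over the phase); the shapes follow the mapping above, and the per-unit thresholds are free parameters:

```python
import numpy as np

def rhythm_cycle(pixels, w_pos, w_neg, thresholds):
    """One cycle of the global inhibitory rhythm.

    Phase 1 (no inhibition): hidden units integrate excitatory minus
    inhibitory pixel-driven input and spike if they cross threshold;
    the spike pattern is the binary hidden activity of one RBM update.
    Phase 2 (strong inhibition): all membrane potentials reset to 0.
    """
    v = pixels @ w_pos - pixels @ w_neg   # phase 1: accumulate synaptic input
    hidden = (v >= thresholds)            # spike = "relevant feature detected"
    v = np.zeros_like(v)                  # phase 2: global inhibitory reset
    return hidden.astype(np.uint8), v

# Tiny example: 3 pixels driving 2 hidden units through +/- axons.
pixels = np.array([1.0, 0.0, 1.0])
w_pos = np.array([[1, 0], [0, 1], [1, 0]], dtype=float)
w_neg = np.array([[0, 1], [0, 0], [0, 0]], dtype=float)
hidden, v = rhythm_cycle(pixels, w_pos, w_neg, thresholds=np.array([1.5, 0.5]))
```

The binary `hidden` vector is what the off-chip linear classifier consumes, one vector per rhythm cycle.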
Our simple mapping from real-valued to binary weights shows that the performance of the RBM does not decrease significantly, and suggests that more sophisticated algorithms, such as deep Boltzmann machines, will also perform well in hardware despite binary weights.
V. DISCUSSION
A long-standing goal in the neuromorphic community is to create a compact, modular block that combines neurons, large synaptic fanout, and addressable inputs. Our breakthrough neurosynaptic core, with digital neurons, crossbar synapses, and address-events for communication, is the first of its kind
[Figure: 22 × 22 visible units driving 968 axons × 256 neurons through crossbar synapses, plus 10 label neurons; example input predicted as “3”]
Fig. 6. (left) Pixels represent visible units, which drive spike activity on excitatory (+) and inhibitory (−) axons. (middle) 16 × 16 grid of neurons spike in response to the digit stimulus. Spikes are indicated as black squares, and encode the digit as a set of features. (right) An off-chip linear classifier trained on the features, and the resulting activation. Here, the classifier predicts that 3 is the most likely digit, whereas 6 is the least likely.
to achieve this long-standing goal in working silicon. The key new component of our design is the embedded crossbar array, which allows us to implement synaptic fanout without resorting to off-chip memory that can create an I/O bottleneck. By bypassing this critical bottleneck, it is now possible to build a large on-chip network of neurosynaptic cores, creating an ultra-low power neural fabric that can support a wide array of real-time applications that are one-to-one with software.
Looking forward, to build a human-scale system with 10^14 synapses (distributed across many chips), our next focus is to tackle the formidable but tractable challenges of density, passive power, and active power for inter-core communication.
ACKNOWLEDGMENT
This research was sponsored by DARPA under contract No. HR0011-09-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the U.S. Government. Authors thank William Risk and Sue Gilson for project management, Scott Fairbanks, Scott Hall, Kavita Prasad, Ben Hill, Ben Johnson, Rob Karmazin, Carlos Otero, and Jon Tse for physical design, Steven Esser and Greg Corrado for their work on digital neurons, and Andrew Cassidy and Rodrigo Alvarez for developing software for data collection.
REFERENCES
[1] R. Ananthanarayanan, S. K. Esser, H. D. Simon, and D. S. Modha, “The cat is out of the bag: cortical simulations with 10^9 neurons, 10^13 synapses,” in Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. New York, NY, USA: ACM, 2009, pp. 63:1–63:12.
[2] C. Mead, Analog VLSI and neural systems. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1989.
[3] R. Manohar, “A case for asynchronous computer architecture,” in Proceedings of the ISCA Workshop on Complexity-Effective Design, 2000.
[4] N. Imam and R. Manohar, “Address-event communication using token-ring mutual exclusion,” in Proceedings of the IEEE International Symposium on Asynchronous Circuits and Systems, 2011.
[5] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, July 2006. [Online]. Available: http://dx.doi.org/10.1126/science.1127647
[6] P. Merolla, T. Ursell, and J. Arthur, “The thermodynamic temperature ofa rhythmic spiking network,” CoRR, vol. abs/1009.5473, 2010.
Long-term collaboration with IBM Research
Current state-of-the-art
• IBM/Cornell “TrueNorth” chip (2014)
❖ ~25 pJ/synaptic operation
❖ 65 mW for 1M neurons, 256M synapses
❖ 28nm technology
❖ QDI + bundled-data asynchronous digital logic
• Intel “Loihi” chip (2018)
❖ ~24 pJ/synaptic operation
❖ Integrated on-chip learning support
❖ Microprocessors for management
❖ 14nm technology
❖ QDI + bundled-data asynchronous digital logic
• Stanford/Yale “Braindrop” (2019)
❖ ~0.4 pJ/effective synaptic operation
❖ Support for “NEF” programming model
❖ 28nm FDSOI
❖ QDI digital logic, synchronous I/O, and analog circuits for neurons and synapses
Challenges: energy-efficiency
• Biological neural systems
❖ ~20 fJ/synaptic operation
• TrueNorth/Loihi
❖ ~20 pJ/synaptic operation
• How do we close the gap?
❖ Many, many proposals (new devices, materials, etc.) for better synapses and neurons
❖ Reality
‣ ~30-50% of power is in spike communication/storage -- Amdahl strikes again!
‣ Best case: reduce to 7-10 pJ, even after overcoming all the technical obstacles!
❖ Many proposals with significantly lower energy reported
‣ … but not for a system, just for small devices/components
Challenges: design and simulation
• ACT open-source language
❖ Integrated representation of
‣ properties
‣ functional model
‣ behavioral description
‣ gates
‣ transistors
‣ geometry
❖ Type system
‣ relates different levels of abstraction
‣ inheritance for extensible modules
• Integrated commercial event-based and custom asynchronous digital simulator
[Figure: tool flow linking design (.act), expanded design, technology mapping, new cell generation, characterizer (.lib), floorplan, placement, routing, asynchronous static timing engine, and layout finishing; files exchanged include .act, .spice, .lef, .rect, .def, .v, .spef, and .gds; a layout editor handles cell layout, with translation to proprietary commands]
Prof. Keshav Pingali
part of the DARPA ERI initiative
[Figure: comparison of two commercial circuit simulators — performance ratio vs. difference in delay (%)]
Challenges: programmability and algorithms
• How do we best utilize this computation model?
❖ … in a general-purpose framework?
• What’s the right “programming language”?
• Current solutions
❖ Use learning/training and artificial neural networks
❖ Use hand-crafted solutions
❖ Time-averaged spike rate is used to represent a value
Number of “spike slots” required so that the receiver’s estimate v̂ of the sender’s value v ∈ [0, 1] satisfies

    max_{v ∈ [0,1]} Pr_{v̂}[ |v − v̂| > ε ] ≤ δ

  ε (bits)   δ=0.05   δ=0.10   δ=0.25
  1          28       20       8
  2          176      126      56
  3          848      592      288
  4          3670     2582     1248
  5          15211    10731    5227

[Figure: sender transmits a value v as spikes; receiver decodes an estimate v̂]
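The table quantifies how expensive rate coding is: the number of spike slots grows roughly quadratically in the resolution. A Monte Carlo sketch, assuming the simplest scheme where each slot carries an independent Bernoulli(v) spike and the receiver decodes the empirical spike rate (the slide’s exact encoding may differ, so these estimates need not reproduce the table; trial count and seed are arbitrary):

```python
import random

def rate_code_error_prob(v, n_slots, eps, trials=20000, seed=1):
    """Estimate Pr[|v - v_hat| > eps] for rate coding: the sender emits a
    spike in each slot with probability v, and the receiver decodes the
    value as v_hat = (spike count) / n_slots."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        count = sum(rng.random() < v for _ in range(n_slots))
        if abs(v - count / n_slots) > eps:
            errors += 1
    return errors / trials

# v = 0.5 is (near) the hardest value to transmit; eps = 0.25 is ~1 bit.
p_few  = rate_code_error_prob(0.5, n_slots=8,  eps=0.25)
p_many = rate_code_error_prob(0.5, n_slots=28, eps=0.25)
```

More slots drive the worst-case error probability down, which is exactly the communication cost the challenge bullets point at.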
Summary
• Neuromorphic systems
❖ Biologically inspired, naturally parallel approach
❖ Various attempts to create programmable platforms
• Biological systems are an existence proof
❖ … we need to better understand how they compute
• Challenges
❖ What are efficient ways to compute in this framework?
❖ How do we reduce the cost of communication and storage?
❖ Is there a different abstraction, beyond simply emulating biology?
Acknowledgments
• Major project collaborators
❖ TrueNorth: Dharmendra Modha, John Arthur, Paul Merolla, …
❖ Braindrop: Kwabena Boahen, Alex Neckar, Sam Fok, Ben Benjamin, …
• Group members
❖ Filipp Akopyan, Nabil Imam, Saber Moradi, Carlos Otero, Jon Tse
• Many, many members of the neuromorphic community
❖ Andreas Andreou, Gert Cauwenberghs, Tobi Delbruck, Shih-Chii Liu, …
• Sponsors
❖ DARPA, ONR, AFRL
‣ Todd Hylton