
Memristor Crossbar Based Low Power Computing

Tarek M. Taha

Electrical and Computer Engineering Department

University of Dayton

June 30, 2016


Areas of Research

• Application acceleration:
  − Cognitive computing: autonomous agent for UAVs and decision making
  − Cybersecurity
• Neuromorphic applications:
  − Porting algorithms to IBM TrueNorth and to our internal neuromorphic architectures
  − Examples: cognitive agent, cybersecurity, image processing
• Neuromorphic multicore architectures:
  − Digital CMOS (verified via FPGA implementation)
  − Memristor crossbar
  − Both learning and recognition
  − Specialized versions for: deep learning, cybersecurity, convolution networks, controls
• Memristor devices:
  − SPICE modeling
  − Fabrication

Device Modeling


Device SPICE Model

C. Yakopcic, T. M. Taha, G. Subramanyam, and R. E. Pino, "Memristor SPICE Model and Crossbar Simulation Based on Devices with Nanosecond Switching Time," IEEE International Joint Conference on Neural Networks (IJCNN), August 2013. [BEST PAPER AWARD]

[Figure: current versus voltage curves and time-domain current and voltage waveforms, comparing the SPICE model with actual devices from Univ. of Michigan, Boise State, and HP Labs.]

Integrated into Sandia XYCE

From: XYCE
Sent: Wed 1/20/2016 1:02 PM
Subject: Xyce version 6.4 has been released.

The Xyce (TM) team is pleased to announce the release of Xyce (TM) Version 6.4. This release fixes a number of bugs in Xyce (TM) 6.3 and includes improvements to existing features of Xyce (TM) 6.4. Please see the Release Notes for a complete list of new features and enhancements.

Highlights for Xyce Release 6.4 include:

New Devices and Device Model Improvements

* VBIC version 1.3, 3- and 4-terminal variants (Q levels 11 and 12)
* MEXTRAM 504.11 with self-heating (Q level 505)
* New memristor device using the Yakopcic model
* Support for Reactive Power limits in the Power Grid Generator Bus model.

Circuit Design


Memristor Based Neuron

A memristor crossbar emulates the multiply-add operation in the analog domain.
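The multiply-add can be sketched in a few lines, assuming an ideal crossbar (no wire resistance, columns held at virtual ground). The function name and the numeric values are illustrative and do not come from the slides.

```python
def crossbar_column_currents(voltages, conductances):
    """Return the current summed on each column of an ideal crossbar.

    voltages:      row input voltages V_i, in volts
    conductances:  conductances[i][j] is the memristor conductance (siemens)
                   joining row i to column j
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            # Ohm's law per device; Kirchhoff's current law sums each column.
            currents[j] += v * g
    return currents


# Hypothetical values: three inputs (A, B, C) driving two columns.
V = [1.0, 0.5, 0.0]
G = [[1e-3, 2e-3],
     [4e-3, 0.0],
     [5e-3, 1e-3]]
I = crossbar_column_currents(V, G)  # about 3 mA and 2 mA
```

Each column current is the dot product of the input voltage vector with that column's conductances, so every column evaluates its multiply-add in parallel.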

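[Figure: neuron circuit. Inputs V_A, V_B, V_C and a bias β drive a pair of crossbar columns whose voltages V+ and V- feed a comparator that produces the output V_O.]

$I_A = \sigma_A V_A$

$I_{Sum} = \sigma_A V_A + \sigma_B V_B + \sigma_C V_C$

$V_{Diff} = V^{+} - V^{-}$

$V_O \in \{0, 1\}$, selected by comparing $V_{Diff}$ with the threshold $V_T$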

Analog Memristor Classifier

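[Figure: network with inputs x and y and outputs A, B, and C.]

• 2-layer CLA network using analog memristor circuit
• Based on memristor crossbars
• Iteratively trained through MATLAB and SPICE
• Each weight represented by two memristors (for both signs)

[Figure: crossbar circuit with inputs x, y and a bias β. Each weight (for example w_A1) occupies a memristor pair on the + and - columns of a neuron. Labels: Inputs, Bias, Neuron and synapses.]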
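The two-memristor weight encoding can be sketched as follows. This is one common mapping, not necessarily the one used in the presented circuit; the conductance limits `g_min` and `g_max` and all function names are hypothetical.

```python
def weight_to_pair(w, g_min=1e-6, g_max=1e-4):
    """Map a signed weight w in [-1, 1] to a (g_plus, g_minus) conductance pair.

    g_min and g_max are hypothetical device limits in siemens.
    The weight is proportional to g_plus - g_minus.
    """
    if not -1.0 <= w <= 1.0:
        raise ValueError("weight must lie in [-1, 1]")
    span = g_max - g_min
    if w >= 0:
        return g_min + w * span, g_min
    return g_min, g_min - w * span


def pair_to_weight(g_plus, g_minus, g_min=1e-6, g_max=1e-4):
    """Recover the signed weight from a conductance pair."""
    return (g_plus - g_minus) / (g_max - g_min)


def neuron_input(voltages, weights):
    """Differential column current: sum over i of V_i * (g_plus_i - g_minus_i)."""
    i_plus = i_minus = 0.0
    for v, w in zip(voltages, weights):
        g_p, g_m = weight_to_pair(w)
        i_plus += v * g_p
        i_minus += v * g_m
    return i_plus - i_minus
```

Because only the difference between the two column currents matters, a device whose conductance is always positive can still represent a negative weight.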

System Design


Multicore Neural Processor

NC = Neural Core

R = Router

[Figure: array of 16 neural cores (NC), each paired with a router (R), with In and Out ports. Successive builds of the slide highlight the Router and the Neural Core.]

Mixed-Signal Neural Processor

[Figure: the array of neural cores (NC) and routers (R), with one neural core expanded to show its memristor crossbar circuit. Labels: Inputs, Bias, Neuron and synapses.]

System Comparison

[Figure: the array of neural cores (NC) and routers (R), with two neural core designs shown in detail.]

SRAM-based core: decoder, input buffer, output buffer, SRAM holding the weights W_ij, multiply (×) and accumulate (acc) units, activation LUT, and control unit. Pre-synaptic neuron inputs arrive from the router as (i, x_i) pairs (pre-synaptic neuron number and value); post-synaptic neuron outputs return to the router.

Memristor-based core: routing switch, input buffer, memristor array, and output buffer.
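The data flow suggested by the SRAM-based core's block labels (decoder, SRAM W_ij, multiply, accumulate, activation LUT) can be sketched as below. This is an interpretation of the block diagram, not the authors' design; the function names, the threshold activation, and the weight values are assumptions.

```python
def sram_core_step(events, weights, lut, n_out):
    """Process pre-synaptic events (i, x_i) and return post-synaptic outputs.

    events:  iterable of (i, x_i) pairs received from the router
    weights: weights[i][j] = W_ij, the SRAM row selected by neuron number i
    lut:     function standing in for the activation look-up table
    """
    acc = [0.0] * n_out                 # one accumulator per post-synaptic neuron
    for i, x_i in events:
        row = weights[i]                # decoder selects SRAM row i
        for j in range(n_out):
            acc[j] += row[j] * x_i      # multiply, then accumulate
    return [lut(a) for a in acc]


def step_lut(a):
    """Hypothetical activation: hard threshold at zero."""
    return 1 if a > 0 else 0


# Hypothetical weights for two pre-synaptic and two post-synaptic neurons.
W = [[0.5, -1.0],
     [0.25, 0.5]]
out = sram_core_step([(0, 1.0), (1, 1.0)], W, step_lut, n_out=2)
```

The digital core needs one multiply-accumulate per weight, whereas a memristor array evaluates a whole column sum in a single analog step.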

On-chip Learning


Backpropagation Circuit

[Figure: two-layer memristor crossbar network with inputs A, B, C and bias β, a training unit for each layer (Training Unit (L1), Training Unit (L2)), control signals ctrl1 and ctrl2, and labels F1, F2. The layer-2 errors are formed at summing nodes from the targets and outputs:]

$\delta_{L2,1} = target_1 - output_1$

$\delta_{L2,2} = target_2 - output_2$

[Later builds of the slide add circuit detail labeled DP_j and V_O, and the neuron circuit with inputs V_A, V_B, V_C, bias β, and output V_O.]
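The error terms on this slide (δ = target - output) are the starting point of standard backpropagation. A minimal software sketch for a two-layer network follows, assuming sigmoid neurons and a fixed learning rate. It illustrates the algorithm only and does not model the training circuit; all names and initial weights are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    """Two-layer forward pass; the last entry of each weight row is the bias."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w1]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w2]
    return h, y


def train_step(x, target, w1, w2, lr=0.1):
    """One backpropagation update in place; returns the squared error before it."""
    h, y = forward(x, w1, w2)
    # Output-layer error: delta = target - output,
    # scaled here by the sigmoid derivative y * (1 - y).
    d2 = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
    # Hidden-layer error: propagate d2 back through the layer-2 weights.
    d1 = [hj * (1.0 - hj) * sum(d2[k] * w2[k][j] for k in range(len(w2)))
          for j, hj in enumerate(h)]
    for k, row in enumerate(w2):
        for j, v in enumerate(h + [1.0]):
            row[j] += lr * d2[k] * v
    for j, row in enumerate(w1):
        for i, v in enumerate(x + [1.0]):
            row[i] += lr * d1[j] * v
    return sum((t - o) ** 2 for t, o in zip(target, y))


# Hypothetical 3-input, 2-hidden, 2-output network and one training sample.
w1 = [[0.1, -0.2, 0.3, 0.0], [-0.3, 0.2, 0.1, 0.0]]
w2 = [[0.2, -0.1, 0.0], [-0.2, 0.3, 0.0]]
x, target = [1.0, 0.0, 1.0], [1.0, 0.0]
errors = [train_step(x, target, w1, w2) for _ in range(50)]
```

The hidden-layer errors are computed before any weight changes, so both layers are updated from the same forward pass.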


Unsupervised Clustering

[Figure: scatter plots before training and after training for three datasets: Wisconsin (classes Benign, Malignant), Iris (Setosa, Versicolor, Virginica), and Wine (Class1, Class2, Class3).]

[Figure: two-layer network. Inputs x1 to x6 plus a constant 1 feed neurons n1 to n3 (Layer L1); n1 to n3 plus a constant 1 feed the outputs x1' to x6' (Layer L2).]

Anomaly Detection in Network Traffic

[Figure: distance between input and reconstruction for normal packets and for anomalous packets, plotted against packet index (vertical axis 0 to 1.2), and detection rate (%) versus false detection (%).]

96.6% of the anomalous packets detected

4% false positive detection
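The plotted quantity is the distance between each input and its reconstruction. One way to turn that distance into the two rates quoted above is a fixed threshold, sketched below; the threshold rule, the function names, and the sample distances are assumptions for illustration, not values from the slides.

```python
def euclidean(a, b):
    """Distance between an input vector and its reconstruction."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


def detection_rates(normal_dist, anomalous_dist, threshold):
    """Flag a packet as anomalous when its reconstruction distance exceeds threshold.

    Returns (detection rate, false detection rate) as percentages.
    """
    detected = sum(d > threshold for d in anomalous_dist)
    false_alarms = sum(d > threshold for d in normal_dist)
    return (100.0 * detected / len(anomalous_dist),
            100.0 * false_alarms / len(normal_dist))


# Hypothetical distances between each input and its reconstruction.
normal = [0.05, 0.10, 0.08, 0.30]
anomalous = [0.60, 0.90, 0.20, 0.75]
det, false_det = detection_rates(normal, anomalous, threshold=0.25)
```

Sweeping the threshold traces out a curve of detection rate against false detection rate, the same axes as the third plot.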


Deep Learning Architecture

[Figure: three builds of a multi-layer network (Input, Layer 1 module, Layer 2 module, Output layer module, Output), marking which crossbars are in use and which are inactive at each training step.]

1. Pre-training layer 1
2. Pre-training layer 2
3. Supervised training of the whole network.

[Figure: chain of layers (Input, Layer 1, ..., Layer i, Layer i+1, ..., Layer n, Output) with the Layer i module and Layer i+1 module expanded. Labels: Crossbar i, Crossbar i+1, Crossbar t_i, Crossbar t_i+1, Layer i input, Layer i output, Layer i+1 output, Layer t_i output, Layer t_i+1 output, bias β.]

Large Crossbar Simulation

[Figure: potential across each memristor (V) versus memristor index, SPICE compared with MATLAB, for 100×100, 200×200, and 300×300 crossbars. Vertical-axis ranges: 0.985 to 0.995 V (100×100), 0.95 to 1 V (200×200), 0.9 to 1 V (300×300).]

[Figure: simulated crossbar circuit with inputs V_1 = 1 V through V_M = 1 V, outputs y_1 through y_N/2, and an op-amp output stage with resistors R and R_f producing y_j.]

Deep Learning

[Figure: system architecture. A grid of neural cores (NC) and routers (R) with buffers, connected to a RISC core (consumer of neural system output).]

[Figure: recognition accuracy (%), vertical axis 85 to 100, comparing Memristive and Matlab results on Iris (99, 51), Wisconsin (140, 60), Wine (118, 60), Isolate (6238, 1559), KDD (20000, 5000), and MNIST (10000, 5000).]

Performance vs. Tesla K20

Training:

          Energy eff.   Speedup
MNIST     26,597        6.9
Isolate   12,822        4.6
KDD       239,435       3.0

Recognition:

          Energy eff.   Speedup
MNIST     314,299       41.0
Isolate   147,308       50.5
KDD       375,252       10.2

Related Projects


Convolution Neural Network

[Figure: classification accuracy (%) versus bit width of the D to A converter (2 to 12 bits), with curves labeled 0.001 V, 0.01 V, 0.1 V, and 0.5 V.]

[Figure: image panels: Input Image, DSP Output, Mem Output: 3 bit, Mem Output: 2 bit (α=0.01).]

[Figure: convolution crossbar. Inputs x_1,1 to x_25,1 through x_1,N to x_25,N are applied together with their negations, plus a bias input x_β with conductance σ_βj. Each weight is a conductance pair σ_ij+ / σ_ij-, and each output j = 1 to M has a column pair y_j+ / y_j-. Callouts: memristors corresponding to the 1st kernel and the 1st input map; memristors corresponding to the 72nd kernel and the 6th input map.]
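The accuracy plot varies the bit width of the D to A converter that drives the crossbar inputs. A minimal sketch of uniform quantization to a given bit width follows, assuming inputs normalized to the range 0 to `v_max`; the function name, the range, and the sample values are hypothetical.

```python
def dac_quantize(v, bits, v_max=1.0):
    """Round v to the nearest of 2**bits uniform DAC levels in [0, v_max]."""
    steps = 2 ** bits - 1
    v = min(max(v, 0.0), v_max)          # clip to the converter range
    code = round(v / v_max * steps)      # integer DAC code
    return code * v_max / steps


# Hypothetical normalized pixel values quantized to 2 bits (4 levels).
pixels = [0.0, 0.4, 0.9, 1.0]
two_bit = [dac_quantize(p, bits=2) for p in pixels]
```

Fewer bits mean coarser input levels, which is the effect visible in the 3-bit and 2-bit output panels.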


Cybersecurity

Anomaly Detection

[Figure: distance between input and reconstruction for normal packets and for anomalous packets, plotted against packet index.]

Signature Based Detection

• Large collection of state machines

• 2.2 nW per rule

• 3.8 Gbps

Regex : user\s[^\n]{10}
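The example rule can be tried in software with Python's `re` module, as a sketch of what one signature matches. The helper name `matches_rule` and the sample payloads are hypothetical, and this does not model the state-machine hardware.

```python
import re

# Rule from the slide: "user", one whitespace character, then ten
# characters that are not newlines.
RULE = re.compile(r"user\s[^\n]{10}")


def matches_rule(payload):
    """Return True if the signature occurs anywhere in the payload string."""
    return RULE.search(payload) is not None


hit = matches_rule("user administrator")   # ten or more characters follow
miss = matches_rule("user bob\n")          # newline arrives after three
```

Each such rule compiles to a small state machine, which is why a rule set can be described as a large collection of state machines.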


Device Fabrication

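• The memristor device is based on a LiNbO3 switching oxide.
• The I-V curve shows repeatable switching.

[Figure: I-V curve, voltage (V) from -3 to 3 on the horizontal axis and current (mA) from -10 to 10 on the vertical axis.]

[Figure: device structure, a 42 nm LiNbO3 layer between Pt electrodes.]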

SEM/TEM results

[Figure: SEM/TEM images. Labels: Top Electrode, Bottom Electrode, Pt, LiNbO3, Pt, SiO2, Si.]

Autonomous Agent

Cognitively Enhanced Complex Event Processing (CECEP) Architecture:

Consists of the following central net-centric components:

− soaDM: an associative memory application that allows agents to store and retrieve declarative knowledge.
− soaCDO: a knowledge representation and mining application that allows agents to store and exploit domain knowledge.
− Esper: a complex event processing framework that allows agents to base actions on context assessment and procedural knowledge.

[Figure labels: Event Output Streams, IO “Adapters”]

Acknowledgements

Research Engineers:

• Raqib Hasan, PhD

• Chris Yakopcic, PhD

• Wei Song, PhD

• Tanvir Atahary, PhD

Sponsors:

http://homepages.udayton.edu/~ttaha1

Doctoral Students:

• Zahangir Alom

• Hua Chen

• Chong Chen

• Rasitha Fernando

• Ted Josue

• Will Mitchell

• Yangjie Qi

• Nayim Rahman
