
Pattern Classification with Memristive Xbar Circuits

Dmitri Strukov

UC Santa Barbara

Acknowledgments: Fabien Alibart, Elham Zamanidoost, Brian Hoskins, Gina Adam, Farnood Merrikh‐Bayat, Xinjie Guo, Ligang Gao, Christof Teuscher, John Carruthers, Tim Cheng, Luke Theogarajan, Susanne Stemmer, Konstantin Likharev

Funding: AFOSR MURI, AFOSR STTR‐II, NSF CDI

UNIVERSITY OF CALIFORNIA, SANTA BARBARA

Motivation: SuperVision with convolutional networks

A. Krizhevsky et al, ImageNet classification with deep convolutional neural networks, NIPS’12

650,000 neurons

60,000,000 parameters

630,000,000 synapses

Backpropagation learning rule

June 2013, Intel, Portland



Implemented with GPUs




Problem: Current state‐of‐the‐art implementations are not suitable for real‐time, low‐energy operation

Proposed solution: Hybrid CMOS/memristor networks (so‐called CMOL CrossNets)

Estimated performance for a 64x64 image fragment

Implementation              Propagation time (s)   Power (W)   Energy per operation (J)
CPU, 2.66 GHz [1]           8×10^-3                30 to 40    ~3×10^-1
GPU, 1 GHz [1]              3×10^-4                40          ~1×10^-2
FPGA, 200 MHz [1]           1.5×10^-4              10          ~1×10^-3
ASIC, 65 nm, 400 MHz [1]    5×10^-5                ~3          ~1×10^-4
CMOL CrossNet, 90 nm [2]    ~3×10^-8               ~1          ~3×10^-8
CMOL CrossNet, 10 nm [2]    ~2×10^-8               ~0.1        ~2×10^-9
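The energy column appears consistent with the product of propagation time and power. A quick check using the table's values follows; the 35 W figure for the CPU is an assumed midpoint of the quoted 30 to 40 W range, and the helper names are illustrative only.

```python
# Rows from the table: (propagation time in s, power in W, quoted energy in J).
# The CPU power of 35 W is an assumed midpoint of the quoted 30 to 40 W range.
ROWS = {
    "CPU 2.66 GHz": (8e-3, 35.0, 3e-1),
    "GPU 1 GHz": (3e-4, 40.0, 1e-2),
    "FPGA 200 MHz": (1.5e-4, 10.0, 1e-3),
    "ASIC 65 nm, 400 MHz": (5e-5, 3.0, 1e-4),
    "CMOL CrossNet 90 nm": (3e-8, 1.0, 3e-8),
    "CMOL CrossNet 10 nm": (2e-8, 0.1, 2e-9),
}


def energy(time_s, power_w):
    """Energy of one propagation, assuming energy = time x power."""
    return time_s * power_w


def ratio_to_quoted(name):
    """Computed energy divided by the energy quoted in the table."""
    time_s, power_w, quoted_j = ROWS[name]
    return energy(time_s, power_w) / quoted_j


for name, (time_s, power_w, _) in ROWS.items():
    print(f"{name:22s} time x power = {energy(time_s, power_w):.1e} J, "
          f"ratio to quoted = {ratio_to_quoted(name):.2f}")
```

Every row agrees with the quoted order of magnitude, which suggests that "operation" here means one propagation through the network.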

[1] C. Farabet et al., Large‐scale FPGA‐based convolutional networks, in: Machine Learning on Very Large Data Sets, ed. by R. Bekkerman et al., Cambridge U. Press, 2011, pp. 399‐419

[2] K. Likharev, 2012 (unpublished)



Perceptron: Main idea

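Figure: binary 3x3 pixel array (inputs x1 ... x9, each pixel x = +1 or x = –1) feeding a single‐layer perceptron with bias input x0 and weights w0 ... w9. The weighted sum is marked as the hardware bottleneck.

$y = \mathrm{sgn}\left[\sum_{i=0}^{9} w_i x_i\right]$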

Considered training/test patterns

Pattern “X”, class d = +1
Pattern “T”, class d = –1

Perceptron training rule: $\Delta w_i = \alpha x_i^{(p)} (d^{(p)} - y^{(p)})$

Crossbar implementation

Figure: input voltages V0 ... V9 drive two rows of devices, G0+ ... G9+ and G0– ... G9–; the row currents I+ and I– are measured by ammeters.

$V \propto x$, $G^+ - G^- = G \propto w$

$y = \mathrm{sgn}[I^+ - I^-]$ (parameter analyzer‐based)

Alibart et al., Nature Comm, 2013
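The differential crossbar perceptron and its training rule can be sketched in a few lines of Python. This is a minimal illustration, not the experimental procedure: the pixel layouts of the two patterns, the conductance range, the learning rate and the equal split of each update between G+ and G– are all assumptions, and only two ideal patterns are used.

```python
import numpy as np

# 3x3 binary patterns (+1 / -1 pixels); the exact pixel layouts are assumed.
X = np.array([1, -1, 1, -1, 1, -1, 1, -1, 1])
T = np.array([1, 1, 1, -1, 1, -1, -1, 1, -1])
PATTERNS = [(X, +1), (T, -1)]   # class d = +1 for "X", d = -1 for "T"
G_MIN, G_MAX = 1e-5, 1e-3       # hypothetical conductance bounds (S)


def sgn(value):
    return 1 if value >= 0 else -1


def forward(g_plus, g_minus, x, v_read=0.2):
    """y = sgn[I+ - I-], with input voltages proportional to x (bias x0 = +1)."""
    v = v_read * np.concatenate(([1.0], x))       # V0 ... V9
    i_plus, i_minus = v @ g_plus, v @ g_minus     # summed row currents
    return sgn(i_plus - i_minus)


def train(g_plus, g_minus, alpha=1e-5, epochs=100):
    """Perceptron rule on w = G+ - G-: dw_i = alpha * x_i * (d - y)."""
    for _ in range(epochs):
        errors = 0
        for x, d in PATTERNS:
            y = forward(g_plus, g_minus, x)
            if y != d:
                errors += 1
                dw = alpha * np.concatenate(([1.0], x)) * (d - y)
                g_plus += dw / 2      # split the update across the pair
                g_minus -= dw / 2
                np.clip(g_plus, G_MIN, G_MAX, out=g_plus)
                np.clip(g_minus, G_MIN, G_MAX, out=g_minus)
        if errors == 0:
            break
    return g_plus, g_minus


rng = np.random.default_rng(0)
g_plus = rng.uniform(2e-4, 4e-4, 10)    # random initial state (S)
g_minus = rng.uniform(2e-4, 4e-4, 10)
train(g_plus, g_minus)
print([forward(g_plus, g_minus, x) for x, _ in PATTERNS])
```

Storing each weight as a difference of two positive conductances is what lets a crossbar of strictly positive devices represent signed weights.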



Widrow's memistor: ADALINE concept ... and hardware implementation

Figure: photographs of Bernard Widrow and Marcian Hoff.

B. Widrow and M.E. Hoff, Jr., IRE WESCON Convention Record, 4:96, 1960


Pt/TiO2‐x/Pt devices

Device stack: 25 nm Au / 15 nm Pt top electrode; 30 nm TiO2‐x; e‐beam patterned Pt protrusion; 5 nm Ti / 25 nm Pt bottom electrode (scale bar: 20 nm).

Figure: I‐V curve, current (mA, ‐1.0 to 1.0) vs. voltage (V, ‐1.0 to 1.0), with the thresholds ‐Vswitch and +Vswitch bounding a gray area. Conductance is defined as g = I(0.2 V) / 0.2 V.

‐ Any state between ON and OFF
‐ In principle, a dynamic system with frequency‐dependent loop size, but ...
‐ Strongly (superexponentially) nonlinear switching dynamics
‐ Gray area = no change
‐ State defined within gray area

Alibart et al., Nature Comm, 2013


Switching dynamics

Figure: pulse sequence (voltage vs. time) of set/reset pulses interleaved with read pulses. For SET the device is initialized to ROFF (R0 = ROFF); for RESET it is initialized to RON (R0 = RON). Plots: R/R0 (1 to 100) and current @ ‐200 mV (A) vs. time (s, 1E‐8 to 1) and pulse voltage (V, ‐1.5 to 1.5); current vs. time (0 to 2x10‐5 s) for pulse amplitudes from ‐0.5 V to ‐1.3 V.

‐ Small pulse amplitude = finer state change but may require exponentially long time
‐ Large pulse amplitude = faster but with cruder steps

F. Alibart et al., Nanotechnology 23, 075201, 2012



Nonlinear switching dynamics

effective barrier modulation due to:

1. heating (~ kB∆T)
2. electric field (~ Eaq/2)
3. phase transition or redox reaction

Figure: ion hopping between two electrodes, with oxidation and reduction at the electrodes; energy vs. position profile showing the activation energy UA, the hop distance a, the initial profile and the barrier change ∆UA.

J. Yang et al., submitted 2012


Speed vs. retention

Linear ionic transport: the ratio of store time to write time scales as the ratio of ion velocities, v(V)/v(0).

Nonlinear effect due to temperature and/or electric field, e.g. temperature only:

$\tau_{store} / \tau_{write} \sim e^{U_A / k_B T_{store}} / e^{U_A / k_B T_{write}}$

D. Strukov et al., Appl. Phys. A 94, 515 (2009)
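To get a feel for the temperature‐only case, the sketch below assumes Arrhenius‐type thermally activated ion hopping, so that a characteristic time scales as exp(U_A / k_B T). The activation energy and the two temperatures are illustrative assumptions, not values from the talk.

```python
import math

K_B = 8.617333262e-5   # Boltzmann constant in eV/K


def store_to_write_ratio(u_a_ev, t_store_k, t_write_k):
    """Ratio of retention time to write time when only temperature differs.

    Assumes time ~ exp(U_A / (k_B * T)), so the ratio equals
    exp(U_A / (k_B * T_store)) / exp(U_A / (k_B * T_write)).
    """
    return math.exp(u_a_ev / K_B * (1.0 / t_store_k - 1.0 / t_write_k))


# Hypothetical numbers: 1 eV barrier, storage at 300 K, writing at 600 K.
ratio = store_to_write_ratio(1.0, 300.0, 600.0)
print(f"tau_store / tau_write ~ {ratio:.1e}")
```

Under these assumed numbers, doubling the local temperature during the write pulse already separates the two time scales by roughly eight orders of magnitude.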


Joule heating

Figure: I‐V curves, I (mA) vs. V (V, ‐1.0 to 1.0), measured at 290 K, 140 K and 3 K for ON, INTERMEDIATE and OFF states. Figure: local temperature (K, 300 to 600) vs. I (mA, 0 to 30) for ON, OFF and SHORT states, showing the domain fitted on data and its extrapolation.

J. Borghetti et al., JAP 106, 124504 (2009)


Variations in switching behavior (I)

Figure: RESET and SET panels showing current @ 200 mV (A, 10‐5 to 10‐4) vs. cumulative time (s) and voltage (V) for 10 TiO2‐x devices.

Large switching dynamics dispersion!


Variations in switching behavior (II)

Figure: I‐V curve, current (mA) vs. voltage (V, ‐1.0 to 1.0), with conductance g = I(0.2 V) / 0.2 V. Figure: ratio gAFTER/gINITIAL (0.1 to 10) vs. pulse voltage (V) and initial synaptic weight gINITIAL (mS) for SET and RESET, measured with a tune / write / read pulse sequence.

‐ Continuous state change



Tuning algorithm

Start (inputs: desired state Idesired, desired accuracy Adesired; initialize: write voltage to small non‐disturbing value VWRITE = 200 mV, voltage step ∆VSTEP = 10 mV)
Read: apply VREAD = 200 mV and read current Icurrent
Processing: is the state reached within required precision, i.e. (Idesired – Icurrent) / Idesired < Adesired? If yes: Finish. If no: continue.
Processing: check for overshoot and set the sign of increment, i.e. sign = Icurrent ‐ Idesired; if VWRITE != VREAD and sign != oldsign then initialize VWRITE = 200 mV
Processing: VWRITE = VWRITE + sign * ∆VSTEP; oldsign = sign
Write: apply pulse VWRITE, then return to Read

Figure: voltage vs. time pulse trains (set, reset and read pulses) for the intuitive algorithm and for the implemented algorithm, which restarts from a non‐disturbing pulse.

F. Alibart et al., Nanotechnology 23, 075201, 2012
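The flowchart can be sketched as a short write‐and‐verify loop. This is one possible reading, not the authors' code: the write voltage is treated as an amplitude that ramps up in 10 mV steps, the pulse polarity comes from the sign of the error, the precision check uses an absolute value, and `ToyMemristor` is a hypothetical device model invented for this example (including its polarity convention).

```python
import math

V_READ = 0.2    # read voltage and initial non-disturbing write amplitude (V)
V_STEP = 0.01   # write-amplitude increment (V)


class ToyMemristor:
    """Hypothetical device whose state changes exponentially faster with amplitude."""

    def __init__(self, g=1e-4, g_min=1e-5, g_max=1e-3):
        self.g, self.g_min, self.g_max = g, g_min, g_max

    def read(self):
        return self.g * V_READ            # current at the read voltage

    def write(self, v):
        # Assumed polarity: positive pulses lower g (RESET), negative raise it (SET).
        rate = min(0.5, 1e-4 * math.sinh(abs(v) / 0.15))
        factor = 1.0 - rate if v > 0 else 1.0 + rate
        self.g = min(self.g_max, max(self.g_min, self.g * factor))


def tune(device, i_desired, accuracy, max_pulses=1000):
    """Return the number of write pulses needed to reach i_desired."""
    v_write, old_sign = V_READ, 0
    for pulses in range(max_pulses):
        i_current = device.read()
        if abs(i_desired - i_current) / i_desired < accuracy:
            return pulses
        sign = 1 if i_current > i_desired else -1
        if v_write != V_READ and sign != old_sign:
            v_write = V_READ              # overshoot: restart from a gentle pulse
        v_write += V_STEP
        old_sign = sign
        device.write(sign * v_write)
    raise RuntimeError("target state not reached")


device = ToyMemristor()
target = 1.5 * device.read()
n_pulses = tune(device, target, accuracy=0.01)
print(n_pulses, device.read() / target)
```

Restarting from the smallest amplitude after each overshoot is what keeps the final steps fine, at the cost of extra pulses.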



Perceptron experimental setup

‐ Arbitrary waveform generator: B1530
‐ Switching matrix (Agilent E5250A)
‐ Current measurement: B1530 (fast IV mode)
‐ Ground (GNDU, Agilent)
‐ Agilent B1500
‐ Wires implementing crossbar circuit
‐ Chip‐packaged, wire‐bonded memristive devices


Perceptron: Ex‐situ training

Evolution of synaptic conductance upon sequential tuning

Figure: synaptic weight g (mS, 0.0 to 0.6) vs. pulse number (0 to 300) for the devices gi+ and gi‐ during g+ tuning and g‐ tuning, with read and write pulses. Figure: example voltage waveform at g8‐, shown relative to +Vswitch and ‐Vswitch.

‐ Final weights after programming: weight import accuracy ~10%
‐ Weights slightly affected by half‐select problem
‐ Crossbar half‐select trick
‐ Half‐selected devices slightly affected (>5‐bit precision)
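The half‐select idea can be illustrated with a simple V/2 biasing sketch: half of the write voltage is applied to the selected row and the opposite half to the selected column, so only the device at their crossing sees the full amplitude. The array size, threshold and write amplitude below are hypothetical.

```python
import numpy as np


def device_voltages(n_rows, n_cols, sel_row, sel_col, v_write):
    """Voltage across every crosspoint when one device is written with V/2 biasing."""
    rows = np.zeros(n_rows)
    cols = np.zeros(n_cols)
    rows[sel_row] = +v_write / 2     # selected row line
    cols[sel_col] = -v_write / 2     # selected column line
    return rows[:, None] - cols[None, :]


V_SWITCH = 0.7   # hypothetical switching threshold (V)
V_WRITE = 1.0    # hypothetical amplitude with V_WRITE / 2 < V_SWITCH < V_WRITE

v = device_voltages(2, 10, sel_row=1, sel_col=7, v_write=V_WRITE)
switching = np.abs(v) > V_SWITCH
print(np.argwhere(switching))
```

Devices sharing a line with the selected one still see half of the write voltage. Because switching is strongly nonlinear in voltage, they are disturbed only slightly rather than not at all, which matches the "slightly affected" note on the slide.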


Perceptron: In‐situ training

Evolution of synaptic conductance upon parallel tuning

$\Delta g_i^{\pm} = \pm\alpha x_i (d^{(p)} - y^{(p)})$

‐ Four steps
‐ α = α(V, g)

Figure: crossbar with switches s1 ... s4 and devices g1+, g4+, g1‐, g4‐. Each of the four steps applies a pulse sequence (PS) through one switch: steps 1 and 2 (s1, s2) handle inputs x = +1 and x = ‐1, and steps 3 and 4 (s3, s4) handle the desired output d. Voltage vs. time traces at g1+, g1‐, g4+ and g4‐ are built from ±Vtrain/2 pulses shown relative to ±Vswitch and ±Vtrain.

Figure: g (mS) vs. training epoch (0 to 16) for Vtrain = 0.9 V and Vtrain = 1 V.

In situ Training Example

Figure: timing diagram of the voltages on s1, s2, s3 and s4 (levels ±Vtrain/2, shown relative to ±Vswitch) applied to the G+ and G‐ crossbar rows over the sequence START, Phase 1, Phase 2, Phase 3, Phase 4.



Software Simulation

Experiment vs. Simulation

Figure (simulation): weights w0 ... w9 vs. training epoch (0 to 12). Figure (experiment): g (mS) vs. training epoch (0 to 16) for Vtrain = 0.9 V and Vtrain = 1 V.

Similar qualitative behavior: (1) smooth vs. sudden changes, (2) convergence

Alibart et al., Nature Comm, 2013


Results

Figure: histograms of number of patterns vs. I+ ‐ I‐ (A, ‐0.0002 to 0.0002) for patterns X and T.

Ex‐situ panels: initial; weight import accuracy ~40%; weight import accuracy ~10%; weight import accuracy ~2%.
In‐situ panels: initial (random weights); after 10 epochs with Vtrain = 0.9 V; after 7 more epochs with Vtrain = 1 V.

‐ 3‐bit is enough for considered task
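One rough way to relate the quoted weight import accuracies to bit precision is to count how many distinguishable levels fit into the full range at a given relative accuracy. The definition below (bits = log2 of 1 / accuracy) is an assumption made for illustration, not a definition taken from the talk.

```python
import math


def equivalent_bits(relative_accuracy):
    """Bits whose level count 2**bits equals 1 / relative_accuracy (assumed definition)."""
    return math.log2(1.0 / relative_accuracy)


for acc in (0.40, 0.10, 0.02):
    print(f"accuracy ~{acc:.0%}: about {equivalent_bits(acc):.1f} bits")
```

Under this reading, ~10% accuracy corresponds to slightly more than 3 bits and ~2% to more than 5 bits.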


Retraining Network

Inversion of classes after full training

Figure (a): histograms of number of patterns vs. I+ ‐ I‐ (A, ‐0.0002 to 0.0002): INITIAL, trained with X = +1 / T = ‐1, and after class inversion with T = +1 / X = ‐1. Figure (b): G (mS, 0.0 to 4.0) vs. epoch # (0 to 50) for Vtrain = 0.9 V, 1 V and 1.1 V.

Initial state matters!


Big picture

Figure (a): perceptron with input neurons (pixels) x1, x2, x3 and output neurons yj; the weights wj1, wj2, wj3 are implemented as memristor conductances gj1, gj2, gj3 feeding a differential (‐/+) output stage, so that each output is computed from $\sum_i x_i g_{ji}$. Figure: CMOS stack with crossbar add‐on; each CMOS cell connects through an interface pin to a crossbar wire, with a memristor at each crosspoint.

Tight integration with CMOS logic (CMOL)

‐ Example of mapping of 64 input / 9 output perceptron
‐ Multi‐layer perceptron network
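The 64‐input / 9‐output mapping amounts to one analog matrix‐vector product per layer. A minimal sketch follows, assuming each weight is stored as a differential pair of conductances; the conductance range, read voltage and random input are hypothetical.

```python
import numpy as np

N_IN, N_OUT = 64, 9
V_READ = 0.2                                        # assumed input amplitude (V)

rng = np.random.default_rng(1)
g_plus = rng.uniform(1e-4, 5e-4, (N_OUT, N_IN))     # hypothetical conductances (S)
g_minus = rng.uniform(1e-4, 5e-4, (N_OUT, N_IN))

x = rng.choice([-1.0, 1.0], size=N_IN)              # binary pixel inputs
v = V_READ * x                                      # input voltages, V proportional to x

# Each output wire sums the currents of its 64 devices: I_j = sum_i g_ji * V_i.
i_out = (g_plus - g_minus) @ v
y = np.where(i_out >= 0, 1, -1)                     # output neurons apply sgn[.]
print(y)
```

All 9 x 64 multiply‐accumulate operations happen in parallel in the crossbar; the CMOS cells only need to drive the inputs and sense the summed currents.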


ADC and DAC Circuits

6‐bit DAC. Figure: experimental result, output amplitude (V) vs. input code (0 to 60).

4‐bit ADC (Hopfield Network). Figure: analog input voltage (V) and digital output code (0000 to 1111) vs. time (s, 0.0 to 1.0).

L. Gao et al., NanoArch, 2013


Summary

‐ Small‐scale pattern classification and mixed‐signal circuit experimental demo
‐ Challenges: device yield, variations, CMOS integration
‐ Work in progress: CMOS integration, large‐scale system simulation


State‐of‐the‐Art Performance

Performance metrics (compared for memory, logic, neuro and storage applications): 1) reproducibility, 2) endurance, 3) reciprocal switching energy, 4) switching speed, 5) retention, 6) ON/OFF current ratio, 7) OFF state resistance, 8) I‐V nonlinearity, 9) number of states, 10) density.

Figure: endurance (cycles, 1,000 to 1E13) vs. year (2006 to 2011) for Fujitsu Labs, Panasonic Corp., HP Labs, SAIT and several groups. Figure: current @ ‐200 mV (A, 1E‐5 to 1E‐4) vs. pulse number (0 to 3000) with increase weight, decrease weight and stand‐by (read only) phases at levels of 7 µA, 15 µA, 30 µA, 60 µA and 120 µA.

Govoreanu et al., IEDM, 2012
Kawahara et al., Panasonic, 2012
Strachan et al., Nanotechnology 22, 505402, 2011
J. Yang, DBS, and D. Stewart, Nature Nano 8, 13‐24 (2013)
Alibart et al., Nanotechnology 23, 074508, 2012
Schindler, PhD Thesis, 2009
Torrezan et al., Nanotechnology 22, 485203, 2011


Hybrid CMOS/memristor demo

Figure (a): layer stack: nanowire layer 2 (titanium), memristive layer, nanowire layer 1 (platinum), CMOS layer. Figures (b) to (d): implemented circuits: NOT gate, AND gate, NAND gate, NOR gate, OR gate, D flip‐flop.

Q. Xia et al., Nano Letters, 2009

Thank You!

Email: [email protected]
