Pattern Classification with Memristive Crossbar Circuits
Dmitri Strukov
UC Santa Barbara
Acknowledgments: Fabien Alibart, Elham Zamanidoost, Brian Hoskins, Gina Adam, Farnood Merrikh‐Bayat, Xinjie Guo, Ligang Gao, Christof Teuscher, John Carruthers, Tim Cheng, Luke Theogarajan, Susanne Stemmer, Konstantin Likharev
Funding: AFOSR MURI, AFOSR STTR‐II, NSF CDI
UNIVERSITY OF CALIFORNIA, SANTA BARBARA
Motivation: SuperVision with convolutional networks
A. Krizhevsky et al, ImageNet classification with deep convolutional neural networks, NIPS’12
650,000 neurons
60,000,000 parameters
630,000,000 synapses
Backpropagation learning rule
June 2013, Intel, Portland
Implemented with GPUs
Problem: Current state‐of‐the‐art implementations are not suitable for real‐time, low‐energy operation
Proposed solution: Hybrid CMOS/memristor networks (so‐called CMOL CrossNets)
Estimated performance for 64x64 image fragment

Implementation               Propagation time (s)   Power (W)   Energy per operation (J)
CPU 2.66 GHz [1]             8×10^-3                30 to 40    ~3×10^-1
GPU 1 GHz [1]                3×10^-4                40          ~1×10^-2
FPGA 200 MHz [1]             1.5×10^-4              10          ~1×10^-3
ASIC 65 nm, 400 MHz [1]      5×10^-5                ~3          ~1×10^-4
CMOL CrossNet 90 nm [2]      ~3×10^-8               ~1          ~3×10^-8
CMOL CrossNet 10 nm [2]      ~2×10^-8               ~0.1        ~2×10^-9

[1] C. Farabet et al., Large‐scale FPGA‐based convolutional networks, in: Machine Learning on Very Large Data Sets, ed. by R. Bekkerman et al., Cambridge U. Press, 2011, pp. 399‐419
[2] K. Likharev, 2012 (unpublished)
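
The energy column of the performance table appears to follow from the other two columns as propagation time multiplied by power. The sketch below checks that reading against the quoted figures; the 35 W value for the CPU is an assumed midpoint of the quoted 30 to 40 W range, and the dictionary name `ROWS` is only illustrative.

```python
# Consistency check: energy per operation ~ propagation time x power.
# Values are copied from the table; the CPU power of 35 W is an assumed
# midpoint of the quoted "30 to 40" W range.
ROWS = {
    "CPU 2.66 GHz":         (8e-3,   35.0, 3e-1),
    "GPU 1 GHz":            (3e-4,   40.0, 1e-2),
    "FPGA 200 MHz":         (1.5e-4, 10.0, 1e-3),
    "ASIC 65 nm, 400 MHz":  (5e-5,   3.0,  1e-4),
    "CMOL CrossNet 90 nm":  (3e-8,   1.0,  3e-8),
    "CMOL CrossNet 10 nm":  (2e-8,   0.1,  2e-9),
}


def energy_joules(time_s: float, power_w: float) -> float:
    """Energy spent while propagating one image fragment."""
    return time_s * power_w


# Ratio of computed energy to the quoted (order-of-magnitude) energy.
ratios = {
    name: energy_joules(t, p) / quoted
    for name, (t, p, quoted) in ROWS.items()
}

for name, ratio in ratios.items():
    print(f"{name:22s} computed/quoted = {ratio:.2f}")
```

Since the quoted energies are rounded to one significant figure, agreement within a factor of two is about as much as this check can show.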
Perceptron: Main idea
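[Figure: single layer perceptron operating on a 3x3 binary pixel array, with inputs x1 to x9 (x = +1 or x = –1), bias input x0, and weights w0 to w9; the weights are labeled "hw bottleneck"]
$y = \operatorname{sgn}\left[\sum_{i=0}^{9} w_i x_i\right]$
Considered training/test patterns: pattern “X”, class d = +1; pattern “T”, class d = –1
Perceptron training rule: $\Delta w_i = \alpha\, x_i^{(p)} \left(d^{(p)} - y^{(p)}\right)$
Crossbar implementation (parameter analyzer based): V ∝ x, G+ – G– = G ∝ w
$y = \operatorname{sgn}\left[I^{+} - I^{-}\right]$
[Figure: crossbar with input voltages V0 to V9 applied to conductance pairs G0+ to G9+ and G0– to G9–; the summed currents I+ and I– are each measured by an ammeter]
Alibart et al., Nature Comm, 2013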
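
As a rough illustration of the crossbar mapping (V ∝ x, G+ – G– ∝ w, y = sgn[I+ – I–]), the sketch below evaluates a perceptron output from differential conductance pairs. The 3x3 pixel layouts of “X” and “T”, the weight vector, the 0.3 mS midpoint conductance, and the 0.05 mS scale factor are all assumptions chosen for illustration, not values from the experiment.

```python
# Minimal sketch of a differential-pair crossbar readout (illustrative values).
V_READ = 0.2        # V, read amplitude; inputs are encoded as +/- V_READ
G_MID = 0.3e-3      # S, assumed midpoint conductance of each device
G_PER_W = 0.05e-3   # S per unit weight, assumed scale factor

# Assumed row-major 3x3 pixel layouts (+1 = on, -1 = off).
X = [1, -1, 1, -1, 1, -1, 1, -1, 1]
T = [1, 1, 1, -1, 1, -1, -1, 1, -1]


def weights_to_pairs(weights):
    """Map each weight w to a pair (G+, G-) with G+ - G- proportional to w."""
    return [(G_MID + G_PER_W * w / 2, G_MID - G_PER_W * w / 2) for w in weights]


def crossbar_output(pairs, pixels):
    """Return (y, I+ - I-) for one pattern; the bias input x0 is fixed at +1."""
    voltages = [V_READ * x for x in [1] + pixels]
    i_plus = sum(v * g_plus for v, (g_plus, _) in zip(voltages, pairs))
    i_minus = sum(v * g_minus for v, (_, g_minus) in zip(voltages, pairs))
    i_diff = i_plus - i_minus
    return (1 if i_diff >= 0 else -1), i_diff


# Hypothetical weights: zero bias weight, pixel weights equal to X - T.
weights = [0] + [a - b for a, b in zip(X, T)]
pairs = weights_to_pairs(weights)

y_x, i_x = crossbar_output(pairs, X)
y_t, i_t = crossbar_output(pairs, T)
print(y_x, i_x, y_t, i_t)
```

With these assumed values every conductance stays positive, which is the reason for encoding signed weights as a difference of two devices.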
Widrow’s memistor: ADALINE concept ... and hardware implementation
Bernard Widrow, Marcian Hoff
B. Widrow and M.E. Hoff, Jr., IRE WESCON Convention Record, 4:96, 1960
Pt/TiO2-x/Pt devices
g = I(0.2 V) / 0.2 V
Device stack (top to bottom):
‐ 25 nm Au / 15 nm Pt top electrode
‐ 30 nm TiO2‐x
‐ 5 nm Ti / 25 nm Pt bottom electrode, with e‐beam patterned Pt protrusion
[Figure: device cross-section, 20 nm scale bar]
[Figure: I‐V curve, current (mA) vs. voltage (V) from ‐1.0 to 1.0, with a gray area between ‐Vswitch and +Vswitch]
‐ Any state between ON and OFF
‐ In principle dynamic system with frequency dependent loop size, but ...
‐ Strongly (superexponentially) nonlinear switching dynamics
‐ Gray area = no change
‐ State defined within gray area
Alibart et al., Nature Comm, 2013
Switching dynamics
[Figure: pulse protocol, voltage vs. time: the device is initialized to ROFF before a set sequence or to RON before a reset sequence, and each write pulse is followed by a read]
[Figure: R/R0 vs. time (s), with R0 = RON for RESET and R0 = ROFF for SET]
[Figure: current @ -200 mV (A) vs. time (s) and pulse voltage (V), for pulse amplitudes from -0.5 V to -1.3 V]
‐ Small pulse amplitude = finer state change, but may require exponentially long time
‐ Large pulse amplitude = faster, but at cruder step
F. Alibart et al., Nanotechnology 23, 075201, 2012
Nonlinear switching dynamics
Effective barrier modulation due to:
‐ heating (~kB∆T)
‐ electric field (~Eaq/2)
‐ phase transition or redox reaction
[Figure: ion hopping between two electrodes under bias v, with oxidation and reduction at the electrodes; energy vs. position profile showing the initial profile, the barrier UA, its change ∆UA, and the hop distance a]
J. Yang et al., submitted 2012
Speed vs. retention
Linear ionic transport:
$\tau_{store}/\tau_{write} \sim v(V)/v(0)$ [further terms in V and D not recoverable]
Nonlinear effect due to temperature and/or electric field, e.g. temperature only:
$\tau_{store}/\tau_{write} \sim e^{U_A/(k_B T_{store})} \,/\, e^{U_A/(k_B T_{write})}$
D. Strukov et al., Appl. Phys. A 94, 515 (2009)
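
To get a feel for the temperature-only case, the sketch below evaluates the ratio of two Arrhenius factors with the same activation energy at a storage temperature and a (hotter) write temperature. The Arrhenius form, the 1 eV activation energy, and the 300 K and 600 K temperatures are assumptions chosen for illustration; the function name is hypothetical.

```python
import math

K_B_EV = 8.617333e-5  # Boltzmann constant in eV/K


def store_to_write_ratio(u_a_ev: float, t_store_k: float, t_write_k: float) -> float:
    """Ratio exp(U_A / (k_B T_store)) / exp(U_A / (k_B T_write)).

    Assumes the same activation energy U_A governs ion motion during
    storage and writing, so only the local temperature differs.
    """
    return math.exp(u_a_ev / K_B_EV * (1.0 / t_store_k - 1.0 / t_write_k))


# Assumed example: U_A = 1 eV, storage at 300 K, local heating to 600 K.
ratio = store_to_write_ratio(1.0, 300.0, 600.0)
print(f"store/write time ratio ~ {ratio:.1e}")
```

Under these assumptions, doubling the local temperature during writing separates the two time scales by many orders of magnitude, which is the point of relying on a nonlinear mechanism.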
Joule heating
[Figure: I (mA) vs. V (V) from ‐1.0 to 1.0, measured at 290 K, 140 K, and 3 K, with ON, INTERMEDIATE, and OFF states marked]
[Figure: local temperature (K, 300 to 600) vs. I (mA, 0 to 30), with ON, OFF, and SHORT states marked; domain fitted on data, plus extrapolation]
J. Borghetti et al., JAP 106, 124504 (2009)
Variations in switching behavior (I)
[Figure: RESET and SET panels showing current @ 200 mV (A) vs. cumulative time (s) and pulse voltage (V); RESET voltages from 0.6 V to 1.4 V, SET voltages from -0.6 V to -1.4 V; 10 TiO2‐x devices]
Large switching dynamics dispersion!
Alibart et al., Nature Comm, 2013
Variations in switching behavior (II)
g = I(0.2 V) / 0.2 V
[Figure: I‐V curve, current (mA) vs. voltage (V) from ‐1.0 to 1.0, with read, write, and tune labels]
[Figure: gAFTER/gINITIAL vs. pulse voltage (V) and initial synaptic weight gINITIAL (mS), for SET and RESET]
‐ Continuous state change
Alibart et al., Nature Comm, 2013
Tuning algorithm
Start: inputs are the desired state Idesired and the desired accuracy Adesired; initialize the write voltage to a small non‐disturbing value, VWRITE = 200 mV, and the voltage step, VSTEP = 10 mV
Read: apply VREAD = 200 mV and read current Icurrent
Is state reached within required precision, i.e. (Idesired – Icurrent)/Idesired < Adesired?
‐ yes: Finish
‐ no: continue to Processing
Processing: check for overshoot and set the sign of increment, i.e. sign = Icurrent ‐ Idesired; if VWRITE != VREAD and sign != oldsign then initialize VWRITE = 200 mV
Processing: VWRITE = VWRITE + sign * VSTEP; oldsign = sign
Write: apply pulse VWRITE, then return to Read
[Figure: voltage vs. time pulse trains (set, reset, read) for the intuitive algorithm and the implemented algorithm, with a non‐disturbing pulse marked]
F. Alibart et al., Nanotechnology 23, 075201, 2012
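
The read/check/ramp/write loop can be sketched in a few lines of Python. This is a simplified interpretation, not the published implementation: the pulse amplitude ramps up in 10 mV steps, the polarity follows the sign of the error, and the ramp restarts from 200 mV after an overshoot. The `ToyMemristor` class is a hypothetical device model (exponential voltage dependence, positive pulses reduce conductance), and the precision check uses the absolute value of the relative error.

```python
import math

V_READ = 0.2   # V, non-disturbing read amplitude (also the ramp start)
V_STEP = 0.01  # V, amplitude increment per iteration


class ToyMemristor:
    """Hypothetical device: the relative conductance change per pulse
    grows exponentially with pulse amplitude."""

    def __init__(self, g: float = 1e-4):
        self.g = g  # conductance in siemens

    def read(self) -> float:
        """Current at the read voltage."""
        return self.g * V_READ

    def pulse(self, v: float) -> None:
        # Assumed polarity: positive pulses RESET (lower g), negative SET.
        rel = min(0.5, 1e-6 * math.exp(abs(v) / 0.1))
        self.g *= (1.0 - rel) if v > 0 else (1.0 + rel)


def tune(dev: ToyMemristor, i_desired: float, accuracy: float,
         max_pulses: int = 10_000) -> int:
    """Tune dev until |Idesired - Icurrent| / Idesired < accuracy.

    Returns the number of write pulses applied.
    """
    amp, old_sign = V_READ, 0
    for n in range(max_pulses):
        i_now = dev.read()
        if abs(i_desired - i_now) / i_desired < accuracy:
            return n
        sign = 1 if i_now > i_desired else -1
        if old_sign != 0 and sign != old_sign:
            amp = V_READ      # overshoot detected: restart the ramp
        else:
            amp += V_STEP     # keep ramping the amplitude
        old_sign = sign
        dev.pulse(sign * amp)
    raise RuntimeError("target state not reached")


device = ToyMemristor()
pulses = tune(device, i_desired=4e-5, accuracy=0.01)
print(pulses, device.read())
```

Restarting the ramp after each overshoot is what keeps the final approach fine-grained: the last pulses before convergence are small ones, so the step size near the target is well below the tolerance window.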
Perceptron experimental setup
[Figure: measurement setup]
‐ Arbitrary waveform generator: B1530
‐ Switching matrix (Agilent E5250A)
‐ Current measurement: B1530 (fast IV mode)
‐ Ground (GNDU, Agilent)
‐ Agilent B1500
‐ Wires implementing crossbar circuit
‐ Chip: packaged, wire bonded memristive devices
Alibart et al., Nature Comm, 2013
Perceptron: Ex-situ training
Evolution of synaptic conductance upon sequential tuning
[Figure: synaptic weight g (mS) vs. pulse number # (0 to 300) during g+ tuning and g– tuning of devices 1 to 10, using alternating read and write pulses]
[Figure: voltage vs. time at g8–, with pulse levels shown relative to +Vswitch and ‐Vswitch]
‐ Final weights after programming: weight import accuracy ~10%
‐ Weight slightly affected by half‐select problem
‐ Crossbar half‐select trick
‐ Half‐selected devices slightly affected (>5‐bit precision)
Alibart et al., Nature Comm, 2013
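
The half-select problem can be made concrete with the common V/2 biasing scheme, assumed here for illustration: the selected row is driven to +V/2 and the selected column to –V/2, so only the addressed device sees the full write voltage while devices sharing its row or column see half of it. The 2 x 10 array shape mirrors the G+ and G– rows of the perceptron; the 1.2 V write amplitude and the function name are hypothetical.

```python
def device_voltages(n_rows, n_cols, sel_row, sel_col, v_write):
    """Voltage across every crossbar device under V/2 biasing.

    Selected row is held at +v_write/2, selected column at -v_write/2,
    and all other lines are grounded.
    """
    rows = [v_write / 2 if r == sel_row else 0.0 for r in range(n_rows)]
    cols = [-v_write / 2 if c == sel_col else 0.0 for c in range(n_cols)]
    return [[rows[r] - cols[c] for c in range(n_cols)] for r in range(n_rows)]


# Assumed example: 2 rows (G+ and G-), 10 columns, writing the device
# in row 1, column 7 with a 1.2 V pulse.
v = device_voltages(2, 10, sel_row=1, sel_col=7, v_write=1.2)

print("selected:      ", v[1][7])
print("half-selected: ", v[1][0], v[0][7])
print("unselected:    ", v[0][0])
```

Because the switching dynamics are strongly nonlinear in voltage, a half-selected device at V/2 changes far less than the selected one, which is consistent with the slide's note that half-selected devices are only slightly affected.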
Perceptron: In-situ training
Evolution of synaptic conductance upon parallel tuning
‐ Four steps (pulse sets s1 to s4)
‐ Training rule: $\Delta g_i^{\pm} = \pm\alpha\, x_i \left(d^{(p)} - y^{(p)}\right)$, with α = α(V, g)
[Figure: synaptic conductance traces g (mS) vs. training epoch (0 to 16), for Vtrain = 0.9 V and Vtrain = 1 V]
[Figure: voltage vs. time at devices g1+, g1–, g4+, and g4– during steps 1 to 4; pulse levels of ±Vtrain/2 and ‐Vtrain, shown relative to ±Vswitch; steps labeled s1 (x = +1), s2 (x = ‐1), s3 (PS+, d = +1), and s4 (PS‐, d = +1)]
Alibart et al., Nature Comm, 2013
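
A software analogue of the in-situ rule, ∆gi± = ±α xi (d – y), is sketched below for two patterns. It treats α as a constant, whereas on the slide α depends on voltage and device state; the pixel layouts, the initial conductance of 0.3 mS, the clipping range, and the value of α are all assumptions made for illustration.

```python
# Minimal sketch of perceptron training on differential conductance pairs.
V_READ = 0.2                      # V, read amplitude for inputs
ALPHA = 0.01e-3                   # S, assumed constant update size
G_MIN, G_MAX = 0.05e-3, 1.0e-3    # S, assumed device conductance limits

# Assumed row-major 3x3 pixel layouts (+1 = on, -1 = off).
X = [1, -1, 1, -1, 1, -1, 1, -1, 1]
T = [1, 1, 1, -1, 1, -1, -1, 1, -1]


def classify(g_plus, g_minus, pixels):
    """y = sgn[I+ - I-], with the bias input x0 fixed at +1."""
    x = [1] + pixels
    i_diff = sum(V_READ * xi * (gp - gm)
                 for xi, gp, gm in zip(x, g_plus, g_minus))
    return 1 if i_diff >= 0 else -1


def clip(g):
    """Keep a conductance inside the assumed physical range."""
    return min(G_MAX, max(G_MIN, g))


def train(g_plus, g_minus, samples, max_epochs=20):
    """Apply the update rule until an epoch has no errors.

    Returns the index of the first error-free epoch, or None.
    """
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for pixels, d in samples:
            y = classify(g_plus, g_minus, pixels)
            if y != d:
                errors += 1
                for i, xi in enumerate([1] + pixels):
                    dg = ALPHA * xi * (d - y)
                    g_plus[i] = clip(g_plus[i] + dg)    # G+ moves by +dg
                    g_minus[i] = clip(g_minus[i] - dg)  # G- moves by -dg
        if errors == 0:
            return epoch
    return None


g_plus = [0.3e-3] * 10
g_minus = [0.3e-3] * 10
epochs = train(g_plus, g_minus, [(X, 1), (T, -1)])
print(epochs, classify(g_plus, g_minus, X), classify(g_plus, g_minus, T))
```

Updating G+ and G– in opposite directions doubles the effective weight change per step and keeps the pair centered, so neither device has to cover the full signed weight range on its own.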
In situ Training Example
[Figure: pulse trains s1, s2, s3, and s4 vs. time t, with levels ±Vtrain/2 shown relative to ±Vswitch]
[Figure: crossbar with G+ and G‐ rows driven by s1 to s4, shown at START and after Phase 1, Phase 2, Phase 3, and Phase 4]
Software Simulation
Experiment vs. Simulation
[Figure: simulated weights w0 to w9 vs. training epoch (0 to 12)]
[Figure: experimental synaptic conductance traces g (mS) vs. training epoch (0 to 16), for Vtrain = 0.9 V and Vtrain = 1 V]
Similar qualitative behavior: (1) smooth vs. sudden changes, (2) convergence
Alibart et al., Nature Comm, 2013
Results
[Figure: histograms of number of patterns vs. I+ - I- (A), from -0.0002 to 0.0002, for patterns X and T]
Ex‐situ:
‐ weight import accuracy ~40%
‐ weight import accuracy ~10%
‐ weight import accuracy ~2%
In‐situ:
‐ initial (random weights)
‐ after 10 epochs with Vtrain = 0.9 V
‐ after 7 more epochs with Vtrain = 1 V
‐ 3‐bit is enough for considered task
Alibart et al., Nature Comm, 2013
Retraining Network
Inversion of classes after full training
[Figure a: number of patterns vs. I+ - I- (A), from -0.0002 to 0.0002; initially X is class +1 and T is class -1, and after class inversion T is class +1 and X is class -1]
[Figure b: conductances G (mS) vs. epoch # (0 to 50), trained with Vtrain = 0.9 V, 1 V, and 1.1 V]
Initial state matters!
Alibart et al., Nature Comm, 2013
Big picture
[Figure a: neuron j with inputs x1, x2, x3 and weights wj1, wj2, wj3, implemented with memristors gj1, gj2, gj3 feeding a differential (+/‐) amplifier, so that the output yj is set by $\sum_i g_{ji} x_i$; each weight is a memristor]
Tight integration with CMOS logic (CMOL)
‐ Example of mapping of 64 input / 9 output perceptron
‐ Multi‐layer perceptron network
[Figure: crossbar add‐on on top of a CMOS stack; labels: input neurons (pixels), output neurons, interface pin, memristor, CMOS cell, crossbar wire]
ADC and DAC Circuits
[Figure: 6-bit DAC, experimental result: output amplitude (V) vs. input code (0 to 60)]
[Figure: 4-bit ADC (Hopfield network): analog input voltage (V) and digital output code (0000 to 1111) vs. time (s)]
L. Gao et al., NanoArch, 2013
Summary
Small-scale pattern recognition and mixed-signal circuit experimental demo
Challenges: device yield, variations, CMOS integration
Work in progress:
‐ CMOS integration
‐ Large-scale system simulation
State-of-the-Art Performance
[Figure: endurance (cycles, 1,000 to 1E13) vs. year (2006 to 2011) for SAIT, HP Labs, Fujitsu Labs, Panasonic Corp., and several groups; Govoreanu et al., IEDM, 2012]
[Figure: ten performance metrics: 1) reproducibility, 2) endurance, 3) reciprocal switching energy, 4) switching speed, 5) retention, 6) ON/OFF current ratio, 7) OFF state resistance, 8) I‐V nonlinearity, 9) number of states, 10) density; series labeled memory, logic, neuro, storage, and demonstrated; J. Yang, DBS, and D. Stewart, Nature Nano 8, 13-24 (2013)]
[Figure: current @ -200 mV (A) vs. pulse number (0 to 3000), with levels at 7, 15, 30, 60, and 120 µA under increase weight, decrease weight, and stand-by (read only) operation; Alibart et al., Nanotechnology 23, 074508, 2012]
Other sources cited on the slide: Kawahara et al., Panasonic, 2012; Strachan et al., Nanotechnology 22, 505402, 2011; Schindler, PhD Thesis, 2009; Torrezan et al., Nanotechnology 22, 485203, 2011
Hybrid CMOS/memristor demo
[Figure: layer stack: CMOS layer, nanowire layer 1 (platinum), memristive layer, nanowire layer 2 (titanium)]
[Figure: demonstrated logic: NOT, AND, NAND, NOR, and OR gates and a D flip flop]
Q. Xia et al., Nano Letters, 2009
Thank You!
Email: [email protected]