
HYBRID CMOS-MEMRISTIVE NEUROMORPHIC SYSTEMS MODELING AND DESIGN

Irem Boybat

Master Thesis (2015)

submitted to

École Polytechnique Fédérale de Lausanne (EPFL)
School of Engineering (STI)
Electrical and Electronic Section

supervised by

Prof. Yusuf Leblebici
Tugba Demirci
Stanislaw Wozniak
of Microelectronic Systems Laboratory (LSM)

Lausanne, 17 July 2015


Acknowledgements

First, I would like to thank Prof. Yusuf Leblebici for giving me the opportunity to work on

my master thesis at LSM, for his guidance and his support. I also would like to thank my

supervisors Tugba Demirci and Stanislaw Wozniak for sharing their valuable experience and

their assistance. Tugba Demirci guided me on various aspects of the project, such as

algorithm development and hardware compatibility with her vast knowledge. Her support

motivated me greatly throughout my project. I am thankful to Stanislaw Wozniak for sharing his deep

experience of neural networks and for giving me new ideas for this thesis.

I would like to thank Geoffrey W. Burr, Robert M. Shelby, Pritish Narayanan, Kumar Virwani,

Carmelo di Nolfo, Wayne Imaino and Bulent Kurdi from IBM Research–Almaden for providing

me with a vast amount of knowledge on neural networks and non-volatile memory devices. Special

thanks to Geoffrey W. Burr, who guided me patiently and shared his extensive knowledge.

Furthermore, I would like to acknowledge Jury Sandrini and the rest of the ReRAM team for

their support. I am thankful to all the scientists, PhD and Master students at LSM for providing

a productive and friendly working environment.

Finally, I want to express my gratitude to my parents Ferhan and Savas Boybat as well as my

fiancé Kaan Kara for their endless love and support at every stage of my life. I wouldn’t be the

person I am today without them.

Lausanne, 17 July 2015 I. B.


Abstract

Research interest has turned to whether new architectures might eliminate the Von Neumann

bottleneck. A very efficient yet non-Von Neumann architecture is at work inside each of us:

the human brain. Artificial neural networks, motivated by the vast networks of neurons and

synapses found in the brain, are actively being researched.

This thesis proposes a hardware-compatible learning algorithm using artificial neural net-

works (ANNs) for a hybrid-CMOS memristive neuromorphic system. CMOS computational

units resembling the neurons in the brain will be connected in a dense crossbar array of

non-volatile resistive random access memory (ReRAM) devices imitating the synapses. Online

learning is done based on spike-timing-dependent plasticity (STDP).

First, a single layer network with 784 input neurons and a variable number of output neurons

is presented, inspired by an existing method in the literature. After unsupervised training with

60,000 training examples of the MNIST set of handwritten digits for 3 epochs, this network of

size 784 x 300 can recognise 78.21% of the training images during training and 82.93% of

the images when tested. A supervised layer is added to the network for labeling, and it is shown

that the accuracy curves of both layers exhibit a similar trend. A multilayer network is then

proposed consisting of two unsupervised layers and its functionality is analyzed. Furthermore,

modifications to the learning algorithm are made for better hardware-compatibility and a

circuit design is provided for the hardware implementation of the system. Quantization of

weights and probabilistic weight updates are incorporated into the algorithm to reflect the

intermediate states and unpredictable nature of ReRAM devices more realistically.

Key words: neuromorphic computing, artificial neural network (ANN), online learning, spike-

timing-dependent plasticity (STDP), resistive random access memory (ReRAM)


Contents

Acknowledgements

Abstract

List of Figures

List of Tables

1 Introduction

2 Neuromorphic Computing
2.1 Brief History of Neuromorphic Computing and Neural Networks
2.2 Classes of Learning Algorithms in Neural Networks
2.3 A Bio-Inspired Learning Algorithm: Spike-Timing-Dependent Plasticity

3 Memristors and Their Use in Artificial Neural Networks
3.1 Definition
3.2 Resistive Random Access Memory
3.3 Memristors in Artificial Neural Networks

4 Learning in a Hybrid CMOS-Memristive Neuromorphic System
4.1 Development of the Learning Algorithm
4.1.1 Single Layer Network
4.1.2 Single Layer Network with Labeling
4.1.3 Multilayer Network with two Unsupervised Layers and Labeling
4.2 Circuit Design and Modifications to the Learning Algorithm for Better Hardware Compatibility

5 Simulation Results and Discussions
5.1 Simulation of the Single Layer Network
5.2 Simulation of the Labeling Layer
5.3 Simulation of the Multilayer Network
5.4 Modifications to the Learning Algorithm for Better Hardware Compatibility

6 Conclusion and Future Work

Bibliography


List of Figures

2.1 A simple MCP neuron and its use as an OR gate
2.2 The perceptron and properties
2.3 Schematic of ADALINE
2.4 Architecture of Neocognitron
2.5 A comparison of feed-forward and Hopfield network
2.6 A multilayer network
2.7 Long-term potentiation (LTP) and long-term depression (LTD) by Bi and Poo
2.8 Different examples of synaptic plasticity
3.1 Two-terminal circuit elements
3.2 Memristor model by HP Labs
3.3 A simple Resistive RAM cell
3.4 I-V curve of 10 µm TiN/Ta/TaO2/TiN-based ReRAM
3.5 Resistive levels of a Pt/TaOx/CrOy/Cr-based ReRAM
3.6 Proposed circuit architecture and spike shapes by Querlioz, Bichler and Gamrat
3.7 Proposed device model and spike shapes by Sheridan, Ma and Lu
3.8 Proposed spike shapes and multilayer architecture by Afifi, Ayatollahi and Raissi
4.1 Initialization of weights in the network with 10 output neurons
4.2 Input images provided to the network
4.3 Learning in the crossbar array
4.4 Proposed circuit design for the learning algorithm
4.5 Weight update using different quantization techniques
5.1 Accuracy after training and testing in comparison to Querlioz, Bichler and Gamrat
5.2 Effects of different weight initialization techniques on accuracy
5.3 Visualisation of weights after training
5.4 Accuracy with unsupervised and supervised layers for different network sizes
5.5 Multilayer network of size 784 x 300 x 10
5.6 Accuracy using different quantization levels with a network of size 784 x 50
5.7 Illustration of probabilistic quantized weight variation
5.8 Quantized weight variations in a network of size 784 x 50 with 64 quantized levels
5.9 Weights and distribution of weights after training a network of size 784 x 50


List of Tables

3.1 Comparison of recognition rates in articles with respect to number of output neurons
5.1 Single layer network accuracies for different network sizes
5.2 Labeling layer accuracies for different network sizes


1 Introduction

One of the weak points in the Von Neumann architecture is the bottleneck through which

data is transferred between memory and the CPU. Recently, research interest has turned to

whether new architectures might eliminate this Von Neumann bottleneck. A very efficient

yet non-Von Neumann architecture is at work inside each of us - the human brain. Artificial

neural networks (ANNs), motivated by the vast networks of neurons and synapses found in

the brain, are actively being researched.

Understanding the complex structure of the brain and developing brain-inspired systems are

some of the main objectives of the Human Brain Project [21]. Two large-scale systems with

custom hardware are being built as part of this project. One of the systems, the Neuromorphic

Physical Model, is based in Heidelberg, Germany and currently contains a single 8-inch silicon

wafer of 200,000 neurons and 50 × 10^6 plastic synapses using 180 nm CMOS technology. The

second system, named the Neuromorphic many-core system and centered in Manchester, United

Kingdom, is built with 18 ARM cores, with each core simulating 16,000 neurons with 8 million

plastic synapses.

Another brain-inspired custom hardware system was introduced by the SyNAPSE group at IBM Re-

search [31]. This power-efficient and modular chip contains 1 million neurons and 256 million

synapses and is implemented using 5.4 billion transistors. The training of the synapses is

done offline and the results are transferred to the on-chip SRAMs. However, there might be some applications

that require online learning.

A different approach to neuromorphic computing is to use newly developed non-volatile

devices such as resistive random access memory (ReRAM) and phase change memory (PCM).

Building dense crossbar arrays using these small devices and storing multiple bits of in-

formation due to the analog nature of the elements enables on-chip learning. One work has

demonstrated online learning using the well-known backpropagation algorithm and PCM as

the synaptic weight element in a large-scale neural network of 165,000 synapses [8]. Yet, such

a complex learning algorithm might not reflect the actual process of learning in the brain.


An ongoing project at LSM aims at building a hybrid-CMOS memristive neuromorphic system

using non-volatile ReRAM and training the system using bio-inspired learning algorithms.

CMOS computational units resembling the neurons in the brain will be connected in a dense

crossbar array of ReRAM devices imitating the synapses of the brain. This master thesis

contributes to this project by developing a hardware-compatible learning algorithm using

ANNs that would enable online learning in such a system. The learning is done based on

spike-timing-dependent plasticity (STDP), which is believed by many researchers to be the

learning mechanism of the brain.

First, a single layer network with 784 input neurons and a variable number of output neurons

is presented based on the algorithm proposed by Querlioz, Bichler and Gamrat [34]. After

unsupervised training with 60,000 training examples of the MNIST set of handwritten digits for

3 epochs, this network of size 784 x 300 can recognise 78.21% of the training images during

training and 82.93% of the images when tested. A supervised layer is added to the network

for labeling, and it is shown that the accuracy curves of both layers exhibit a similar trend. A multi-

layer network is then proposed consisting of two unsupervised layers and its functionality is

analyzed. Furthermore, modifications to the learning algorithm are made for better hardware-

compatibility and a circuit design is provided for the hardware implementation of the system.

Quantization of weights and probabilistic weight updates are incorporated into the algorithm

to reflect the intermediate states and unpredictable nature of ReRAM devices more realistically.

This thesis is organized in 6 chapters.

In Chapter 2, information about neuromorphic computing is provided. The history of neu-

romorphic computing and different classes of learning algorithms in neural networks are

presented. Then, a bio-inspired learning algorithm is explained in more detail.

In Chapter 3, the definition of a memristor is provided. Resistive random access memory

(ReRAM) is presented and examples of ReRAM based neural networks are explored.

In Chapter 4, learning algorithms for single layer and multilayer network structures of a hybrid-

CMOS memristive neuromorphic system are presented. A circuit design with necessary blocks

and components is introduced for the hardware implementation and modifications to the

learning algorithms for better hardware compatibility are explained in detail.

In Chapter 5, the simulation results of the proposed networks and their discussion are

presented.

In Chapter 6, the conclusion of the work and a glance at future work are provided.


2 Neuromorphic Computing

Neuromorphic computing is an interdisciplinary field bringing together the knowledge from

different areas such as neuroscience, mathematics, computer science and engineering to

form systems that resemble the architecture and functions performed by the human brain.

Motivated by biological neural networks, artificial neural networks have been studied

intensely by researchers for decades.

2.1 Brief History of Neuromorphic Computing and Neural Networks

McCulloch-Pitts Neuron

In 1943, Warren S. McCulloch and Walter Pitts tried to understand how the neural system

functioned by proposing a simple mathematical neuron model, known as the McCulloch-Pitts

neuron [4, 27]. The McCulloch-Pitts (MCP) neuron is a logical unit with a fixed threshold θ (Fig.

2.1). The inputs coming from excitatory synapses are binary and have identical weights. These

excitatory inputs are added and if the sum reaches or exceeds the threshold, the output becomes a 1 and

the neuron is active. Otherwise, the neuron is inactive and its output is 0. The neuron can

also receive inputs from inhibitory synapses. If the input from any of the inhibitory synapses

is 1, then the neuron produces a 0 as the output and is inactive. With this neuron model,

they demonstrated that simple processing units can perform more complex operations when

combined [42].
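
As a concrete illustration, the MCP neuron of Fig. 2.1 can be sketched in a few lines of Python; the function name and argument encoding below are illustrative choices, not part of the original model description:

def mcp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts neuron: returns 1 (active) or 0 (inactive)."""
    # Absolute inhibition: any active inhibitory input silences the neuron.
    if any(inhibitory):
        return 0
    # Binary excitatory inputs are summed with identical (unit) weights
    # and compared against the fixed threshold theta.
    return 1 if sum(excitatory) >= theta else 0

# With theta = 1 and two excitatory inputs, the neuron reproduces
# the OR gate of Fig. 2.1b.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcp_neuron([x1, x2], [], theta=1))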

Hebbian Learning

At the end of the decade, a psychologist named Donald O. Hebb developed a theory on

biological learning that had an influence on psychology and neuroscience [7, 15]. In his book

"The Organization of Behaviour", he described his theory about how learning takes place in

synapses as follows:


Figure 2.1 – A simple MCP neuron and its use as an OR gate. (a) A simple MCP neuron with two excitatory inputs x1, x2 and threshold θ. (b) The MCP neuron functioning as an OR gate when the threshold θ is 1: the output is 0 only for x1 = x2 = 0.

When an axon of cell A is near enough to excite a cell B and repeatedly and

persistently takes part in firing it, some growth process or metabolic change takes

place in one or both cells such that A’s efficiency, as one of the cells firing B, is

increased.

This states that the activity in the pre-synaptic and the post-synaptic neuron contribute to the

strengthening of the synapse. Derived from his theory, new concepts arose. For example, this

learning rule is known as Hebbian learning, and the synapses which follow this rule are

known as Hebb synapses. The findings of Hebb would be studied further in the future by

many researchers and more complex models of learning would be developed.

Hodgkin-Huxley Model

Another important contribution for further understanding the biological process of learning

came from the work of Alan L. Hodgkin and Andrew F. Huxley. In 1952, they developed a

mathematical model which describes the electrical behaviour of the membrane [16]. By

conducting experiments on the giant squid axon, they found that the current carried through the

membrane is a result of either the membrane capacity or the movement of ions through the

resistive membrane channel. The ionic current is made of sodium ions, potassium ions and

a leakage current of chloride and other ions. The Hodgkin-Huxley equations led the way to

more detailed neuron models and provide insight into spike generation in neurons

by ion channels [13]. The Nobel Prize in Physiology or Medicine was awarded to Hodgkin and

Huxley for this work in 1963.

Perceptron

In 1958, the psychologist Frank Rosenblatt proposed an influential neural network:

the perceptron [36]. More general computational elements were developed after the


McCulloch-Pitts neuron; one of the elements existing in the literature at that time was the

threshold logic unit (TLU) [4] (Fig. 2.2a). A TLU consists of n inputs and n synaptic weights.

An inner product between the inputs and the synaptic weights is computed and then passed

through a function f. This function is a sign function, defined as follows:

f(x) =
\begin{cases}
+1 & \text{if } \left( \sum_{i=1}^{n} x_i w_i + b \right) > 0 \\
-1 & \text{otherwise}
\end{cases}
\qquad (2.1)

The output of the TLU can take only two values, -1 and +1. Instead of comparing the

sum with a threshold, a bias is inserted into the model as the 0th weight w0 = b, and

the 0th input x0 = +1 is then multiplied with this weight and added to the sum.
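
A minimal Python sketch of Eq. 2.1, with the bias folded in as the 0th weight, might look as follows (NumPy is assumed and the names are illustrative):

import numpy as np

def tlu(x, w, b):
    """Threshold logic unit of Eq. 2.1: output is +1 or -1."""
    s = np.dot(x, w) + b      # inner product plus bias (w0 = b acting on x0 = +1)
    return 1 if s > 0 else -1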

Figure 2.2 – The perceptron and properties. (a) A threshold logic unit (TLU) with inputs x1...xn, weights w1...wn, a bias input x0 = +1 with weight w0 = b, and output y. (b) A simplified perceptron: a retina of sensory units connected to associator units, which connect to response units. (c) Linear separability: a linearly separable and a not linearly separable arrangement of two groups of points. (d) Linear separability of OR and XOR.


Rosenblatt proposed a model of the perceptron using TLU units. He and his colleagues

worked on models consisting of several layers of TLUs as well as complex interconnections

including feedback connections. Because of the mathematical complexity of the model,

simplified feed-forward perceptrons are mostly used for analysis (Fig. 2.2b) [4]. An example

use of the perceptron is classification into two categories. Because the output is binary, the

outputs +1 and -1 can be used for this task. A hyperplane is defined where the sum of the inner

product is equal to 0. By comparing this sum with 0, the points above and below the hyperplane

can be detected and classification can be achieved for linearly separable groups (Fig. 2.2c).

To implement classification and other tasks, the perceptron should perform learning. There

are various learning algorithms that can be implemented for the perceptron. A simple learn-

ing algorithm requires the weights between the associator unit and the response unit to be

changed during learning. It is assumed that the connections from the retina to the associator

units are fixed. The output of a TLU is binary; the output can be either correct or incorrect. Let

us assume that the output of the network is +1 (-1). If the output is correct, no changes are

made to the connections of the perceptron. If the output is incorrect, then the connections

from the associator unit to the response unit which have a positive (negative) contribution to

the overall sum are strengthened or increased by adding a constant amount c to their weights.

The connections from the associator unit which have a negative (positive) contribution on

the sum are weakened by subtracting c from their weights. This way, the perceptron adjusts

its weights, making it more likely to learn the pattern that is shown. It has been proven that if a

linearly separable solution is possible, then the perceptron can find the solution [4].
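
A sketch of this error-driven update in Python is given below; the learning constant c and the +1/-1 encoding follow the description above, while the function name and vector form are illustrative choices:

import numpy as np

def perceptron_update(w, b, x, target, c=0.1):
    """One perceptron learning step; x holds associator-unit outputs, target is +1 or -1."""
    output = 1 if np.dot(w, x) + b > 0 else -1
    if output != target:
        # Incorrect output: move each weight by c in the direction
        # that corrects the sign of the overall sum.
        w = w + c * target * x
        b = b + c * target
    # Correct outputs leave the connections unchanged.
    return w, b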

One problem with the perceptron concerns its speed. The perceptron updates

its connections when there is an incorrect output. At the final stages of learning where the

majority of the examples are classified correctly, the perceptron continues to learn very slowly.

Another problem is that the perceptron converges to only one of the possible solutions when

classifying two linearly separable groups. Among the set of all possible solutions, there might

be better solutions where the hyperplane separating the groups has a larger distance from the

groups. The perceptron, however, might find any solution in the solution set.

Limitations of the Perceptron

In 1969, Marvin Minsky and Seymour Papert demonstrated the limitations of the perceptron

in their book Perceptrons [32]. Part of their work showed that the perceptron can be used to

classify linearly separable groups and can solve the OR problem. However, because XOR is not a

linearly separable problem, the perceptron is not capable of computing it (Fig. 2.2d). To be

able to perform this operation, another TLU can be added in an additional layer which

takes the logical products of two units. This way, a plane in three-dimensional space can be

used to solve the XOR problem. However, this approach requires an additional layer to solve

the problem. This additional layer, which is connected neither to the input nor to the output, is

named the hidden layer. Multilayer perceptrons with hidden layers can be used to


solve problems that are not linearly separable.

ADALINE and the Widrow-Hoff Algorithm

Another computational unit was proposed by Bernard Widrow and Marcian E. Hoff in 1960 [47].

The Adaptive Linear Neuron (which later became the Adaptive Linear Element), ADALINE, consists

of units called adaptive neurons. The adaptive neuron is a threshold logic unit like the

perceptron. However, the computation of the inner product and the binarization of the output

through a sign function takes place in different stages (Fig. 2.3). Between those stages, using

the inner product and the desired output, the error is computed using the least mean squares

(LMS) algorithm. Because the error can be both positive and negative, the square of the error

is used as a measure. This method enables the weights to be updated even if the output

of the system is correct, unlike in the perceptron; this was one of the reasons why perceptron

learning was slow.

The goal of learning in ADALINE is to minimize the total error for all the input patterns. This

is done with the gradient technique, which enables the weights to be changed in such

a way that the system moves in the direction of the steepest descent in the error surface. In

a simple network, because the minimum error in the weight space is unique, there is only

one global minimum in the system and the gradient can find that global minimum. This

technique of error correction in ADALINE, which uses the difference between the desired

output and the actual output, is known as the Widrow-Hoff procedure, the LMS algorithm or

the delta method [4].
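
A sketch of one Widrow-Hoff (LMS) step is shown below; the error is taken at the summer output, before the quantizer, so a step is made even when the binarized output is already correct. The learning rate eta and the function name are illustrative:

import numpy as np

def lms_update(w, b, x, desired, eta=0.01):
    """One LMS (Widrow-Hoff) step for an adaptive neuron."""
    s = np.dot(w, x) + b        # Stage 1: summer (inner product plus bias)
    error = desired - s         # error measured before the quantizer
    w = w + eta * error * x     # steepest-descent step on the squared error
    b = b + eta * error
    y = 1 if s > 0 else -1      # Stage 2: quantizer (sign function)
    return w, b, y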

Figure 2.3 – Schematic of ADALINE. Inputs x1...xn with weights w1...wn and a bias input x0 = +1 with weight w0 = b feed Stage 1, a summer computing the inner product; the error is computed with LMS from the summer output and the desired output; Stage 2, a quantizer, binarizes the output to +1 or -1.


Neocognitron

Inspired by the research done on the visual nervous system by Hubel and Wiesel [18, 19, 20],

Kunihiko Fukushima in 1980 proposed a new hierarchical neural network model which he

named the neocognitron [12]. This network can recognise visual patterns and is robust to position

changes and shape distortions, because the pattern recognition is built on geometric simi-

larities of shapes. He describes the network as self-organized because the network can learn

without any corrections being input to the network. The neocognitron consists of an input layer

connected to the photoreceptors of the retina and hierarchical layers of S-cells (simple cells or

lower order hypercomplex cells) and C-cells (complex cells or higher order hypercomplex

cells). Only the connections to the S-cells are modifiable. Each layer consists of cells, and each

cell receives a small area of the previous layer's cells as input. The cells become sensitive to features

in the input pattern. In earlier stages, cells specialise in finer and more local features, and in

later stages, more global features can be detected. The C-cells’ response is less affected by a

change in position of the input pattern. A model of the neocognitron is displayed in Figure 2.4.

These types of networks are now named convolutional neural networks, and researchers continue

to work on them today.

Figure 2.4 – Architecture of Neocognitron. An input layer (Layer 0) is followed by alternating layers of S-cells and C-cells (Layers S1, C1, S2, C2, S3, C3), each containing K1, K2 or K3 cell planes.

Hopfield Network

Another type of neural network was introduced by John J. Hopfield in 1982 [17]. The neurons

of the Hopfield network have two output states, like the McCulloch-Pitts neuron. The state V of

a neuron i of the network is found by Eq. 2.2. In this equation, the matrix T denotes the

connection matrix, which holds the strengths of the connections between neuron i and another

neuron j of the network. If there is no connection between neurons i and j, the strength is

automatically 0 for that connection. The neuron does not have a connection to itself. If not

specified otherwise, the threshold U_i is 0. The calculation of the connection matrix, which is

done by using the stored states, is shown in Eq. 2.3, where V^s is the set of states with s = 1...n. Multiple


patterns are stored in the network when n > 1.

V_i =
\begin{cases}
1 & \text{if } \sum_{j \neq i} T_{ij} V_j \geq U_i \\
0 & \text{otherwise}
\end{cases}
\qquad (2.2)

T_{ij} = \sum_{s} \left( 2V_i^s - 1 \right) \left( 2V_j^s - 1 \right)
\qquad (2.3)

\Delta E = -\Delta V_i \sum_{j \neq i} T_{ij} V_j
\qquad (2.4)

The input-output relationship of the neurons is nonlinear. The states of the network are changed

asynchronously, and one neuron is chosen randomly at a time. Assuming that T is symmetrical, T

and V can then be used to find the energy. The energy of a unit i can be calculated with Eq. 2.4.

Changes in states result in a decrease of energy, and this continues until a local minimum

of the energy is found. This property can be used to converge to a local minimum when

noisy or incomplete states are given to the network. Thus, this system can model a content-addressable

memory [4, 17].
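
Eqs. 2.2-2.4 translate directly into a short Python sketch; the array shapes, the random update schedule and the function names are illustrative choices:

import numpy as np

def connection_matrix(patterns):
    """Eq. 2.3: T_ij = sum_s (2 V_i^s - 1)(2 V_j^s - 1), with T_ii = 0."""
    S = 2 * np.asarray(patterns) - 1   # map binary states {0,1} to {-1,+1}
    T = S.T @ S                        # sums the outer products over all patterns s
    np.fill_diagonal(T, 0)             # no self-connections
    return T

def run(V, T, U=0.0, steps=1000, seed=0):
    """Asynchronous updates (Eq. 2.2): one randomly chosen neuron at a time."""
    V = np.asarray(V).copy()
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(V))
        V[i] = 1 if T[i] @ V >= U else 0   # each flip lowers the energy (Eq. 2.4)
    return V                               # settles in a local energy minimum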

Figure 2.5 – A comparison of feed-forward and Hopfield network. (a) An example feed-forward network with four neurons. (b) An example Hopfield network with four neurons.

Parallel Distributed Processing (PDP)

In 1986, David E. Rumelhart, James L. McClelland and the PDP Research Group wrote a

two-volume book named Parallel Distributed Processing in which they described the general

framework of PDP models. For them, the reason why the brain outperforms com-

puters on some tasks is that the brain architecture is more suitable for the natural information


processing that humans perform. With other contemporary researchers, the authors of the

book decided to move towards PDP models, in which information processing is done through

interconnections of simple processing units. In their book, they describe the eight major

aspects of a PDP model as follows:

• A set of processing units

• A state of activation

• An output function for each unit

• A pattern of connectivity among units

• A propagation rule for propagating patterns of activities through the network of connec-

tivities

• An activation rule for combining the inputs impinging on a unit with the current state

of that unit to produce a new level of activation for the unit

• A learning rule whereby patterns of connectivity are modified by experience

• An environment within which the system must operate [37]

The processing units can be of type input, output or hidden. The input units take signals

from external sources, and output units provide signals to external sources. The hidden

units receive and transmit signals that remain within the system. The state of activation

is the state of the system at a certain time. The activation value can be either discrete or

continuous. The range of continuous values might be unbounded or restricted. The output

function converts the state of activation to the output signal. The weight matrix may be used

to represent the pattern of connectivity. The positive weights correspond to excitatory inputs

and negative weights to inhibitory inputs. A zero weight means that units are not directly

connected. The value of the weight shows how strong the connection is. The propagation

rule determines how the input of a unit is found by using the outputs of the previous units and

the connections in between. For instance, this can be the weighted sum of the inputs. The

activation rule F denotes a function which converts the existing state and the net input to

the new activation state of the unit. This function can for example be the identity function, a

threshold function or a sigmoid function. The learning rule changes the connections in a way

that new connections can be formed, existing connections can be deleted or the strengths of

the existing connections can be altered. The former two can be represented by the latter, since

adding or deleting a connection can be represented by setting a zero or non-zero value for a

connection. The environment can be a probability function that is defined for all the inputs

and may change in time.
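
As an illustration of these aspects, one update of a single unit could be sketched as follows, choosing a weighted sum as the propagation rule and a sigmoid as the activation rule F; both are example choices the framework permits, not prescriptions of the book:

import numpy as np

def pdp_unit_step(prev_outputs, weights):
    """One PDP unit update: propagation rule, then activation rule F."""
    # Propagation rule: weighted sum of the previous units' outputs
    # (positive weights excitatory, negative inhibitory, zero absent).
    net = np.dot(weights, prev_outputs)
    # Activation rule F: here a sigmoid of the net input
    # (this particular choice of F ignores the unit's previous state).
    return 1.0 / (1.0 + np.exp(-net))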


Figure 2.6 – A multilayer network

Backpropagation

In another chapter of the same book, Rumelhart, McClelland and Geoffrey E. Hinton describe

how learning can take place in multilayer networks. Single-layer networks such as the per-

ceptron have limited capabilities because there are no hidden layers. With hidden layers,

internal representations are possible and problems such as the XOR-problem can be solved

using internal representations. The authors point out that although there are convergence rules

for the perceptron and the delta rule of Widrow and Hoff, there was no such effective rule

for multilayer networks. Hence, they propose a generalised delta rule which consists of two

parts. In the first part, the input of the network propagates forward by calculating the outputs

of the units using a nonlinear activation function. Then, the actual output of the output

layer is compared with the desired output. In the second part, this error signal is propagated

backwards with a gradient-descent-based algorithm involving the chain rule.
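
A minimal Python sketch of the generalised delta rule on the XOR problem follows; the network size, learning rate and iteration count are illustrative, and this is a generic example rather than the algorithm used later in this thesis:

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])    # XOR inputs
T = np.array([[0.], [1.], [1.], [0.]])                    # XOR targets
W1, b1 = rng.normal(scale=0.5, size=(2, 4)), np.zeros(4)  # input -> hidden
W2, b2 = rng.normal(scale=0.5, size=(4, 1)), np.zeros(1)  # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    H = sigmoid(X @ W1 + b1)          # part 1: forward pass, hidden layer
    Y = sigmoid(H @ W2 + b2)          # part 1: forward pass, output layer
    dY = (Y - T) * Y * (1 - Y)        # output error times sigmoid derivative
    dH = (dY @ W2.T) * H * (1 - H)    # part 2: error propagated backwards (chain rule)
    W2 -= 0.5 * H.T @ dY; b2 -= 0.5 * dY.sum(axis=0)   # gradient-descent updates
    W1 -= 0.5 * X.T @ dH; b1 -= 0.5 * dH.sum(axis=0)

print(Y.round(2))   # typically approaches the XOR targets [0, 1, 1, 0]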

This chapter, titled Learning Internal Representations by Error Propagation, is a description of

backpropagation. It is worth noting that other researchers such as Paul Werbos [46], David

Parker [33] and Yann Le Cun [23] also proposed backpropagation. Although there had

been debates about who invented this algorithm, backpropagation is one of the well-known

and widely-used neural network algorithms [4].

Neuromorphic Engineering as a Field

At the California Institute of Technology (Caltech) in 1982, Carver Mead, John Hopfield and Richard

Feynman taught a course named Physics of Computation. Inspired by this, Mead

came forward with the idea of implementing neural computation using analog circuits and


published his book Analog VLSI and Neural Systems in 1989 [42]. In one of his articles, he

expresses that "[t]here is nothing that is done in the nervous system that we cannot emulate

with electronics if we understand the principles of neural information processing" and uses the

phrase neuromorphic systems to describe systems built on the organization principles of the

nervous system [28]. Mead’s book, the Physics of Computation course and the Telluride Neuro-

morphic Engineering Workshop, which started in 1994, all contributed to the establishment of

Neuromorphic Engineering as a field [22].

2.2 Classes of Learning Algorithms in Neural Networks

Learning algorithms or rules can be classified into three main categories: supervised, unsu-

pervised and reinforcement learning.

Supervised Learning

In supervised learning, the dataset used is labeled and hence, the desired output of the system

is known. A comparison between the actual output and the desired output can be done

and this information can be used in training. With the labeling provided, we have more

information about the training set and can thus learn in a better way [4]. This type of

learning is also referred to as learning with a teacher.

In the perceptron, a comparison is made between the desired output and the actual output.

In ADALINE/Widrow-Hoff learning and backpropagation, the weights are altered so as to

minimise the total error for the inputs presented to the network. These learning algorithms

are examples of supervised learning.

Unsupervised Learning

Unlike in supervised learning, the input patterns are not labeled in unsupervised

learning. Similarities and structure in the input are captured by the algorithm, resulting in the

data being organised by the network. The weights are adjusted by the network in such a way

that the regularities in the data are captured. The limited information about the input makes it

hard to use unsupervised learning algorithms [4].

Performing pattern recognition by shapes, the neocognitron is an example of unsupervised

learning.

Reinforcement Learning

In another class of learning, reinforcement learning, an agent is present that is in interac-

tion with its environment. The agent should be aware of the environment and act to have an

effect on its environment. It should try different actions and prefer the ones with better results. It


should both use its experience and discover new actions. The goal of learning is to maximise

the reward function [44].

2.3 A Bio-Inspired Learning Algorithm: Spike-Timing-Dependent Plasticity

The underlying mechanisms of synaptic plasticity, or how the strengths of synapses can change,

and of learning have been studied by many researchers. The presence of long-lasting potentiation in

the brain and its mechanism was studied by Bliss and Lømo in rabbits in 1973 [6]. Following the

Hebbian learning principles, further studies have been conducted regarding the relationship

of presynaptic and postsynaptic neurons. Markram [26] and other researchers studied the

effect of timing of the presynaptic and postsynaptic spikes. Bi and Poo in 1998 demonstrated

that the relative timing of presynaptic and postsynaptic spikes can have different effects on

synapses [5]. Their results are shown in Fig. 2.7. In the first quadrant, where the postsynaptic

spike is after the presynaptic spike, an increase in the excitatory postsynaptic current (EPSC)

is observed. In the third quadrant, the postsynaptic spike takes place before the presynaptic

spike. Bi and Poo concluded that if repetitive postsynaptic spikes occur within 20 ms after

presynaptic spikes, long-term potentiation (LTP) takes place. If repetitive presynaptic spikes

are observed within 20 ms after postsynaptic spikes, long-term depression (LTD) occurs. When

the absolute value of the time difference between the presynaptic and postsynaptic spikes is more

than 20 ms, no synaptic plasticity is observed. The closer the presynaptic and postsynaptic spikes

are in time, the larger the effect on the EPSC; this effect decreases with

an exponential decay.
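
The measured window can be summarised in a small Python sketch; the amplitudes A_plus and A_minus and the decay constant tau are illustrative parameters, while the roughly 20 ms window and the exponential fall-off follow Bi and Poo's observations:

import numpy as np

def stdp_dw(dt, A_plus=1.0, A_minus=-0.5, tau=10.0, window=20.0):
    """Relative weight change for dt = t_post - t_pre (in ms)."""
    if abs(dt) > window:
        return 0.0                        # no plasticity outside the window
    if dt > 0:                            # post after pre: LTP
        return A_plus * np.exp(-dt / tau)
    return A_minus * np.exp(dt / tau)     # pre after post: LTD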


Figure 2.7 – Long-term potentiation (LTP) and long-term depression (LTD) by Bi and Poo [5]


Other relationships between presynaptic and postsynaptic spikes exist in nature (Fig. 2.8). The order

of presynaptic and postsynaptic spikes required to induce LTP and LTD can be reversed, so that if the

presynaptic spike precedes the postsynaptic spike, LTD can occur, while for LTP, the presynaptic

spike follows the postsynaptic spike. The lengths of the exponential decays for LTP and LTD

can be different. Also, symmetrical plasticity can be observed in some examples, where only

the magnitude of the time difference between the presynaptic and postsynaptic spikes, regardless of their order,

has an effect on synapses. These different examples are reviewed by Abbott and Nelson in

2000 [1].


Figure 2.8 – Different examples of synaptic plasticity [1].


3 Memristors and Their Use in Artificial Neural Networks

3.1 Definition

In 1971, Leon O. Chua introduced the memristor as the fourth fundamental two-terminal circuit element besides the resistor, capacitor and inductor [10]. These three well-known circuit elements are defined through relationships between four circuit variables: current (i), voltage (v), charge (q) and magnetic flux (ϕ). Chua observed that the relationship between q and ϕ is not covered by the classical equations and proposed that the memristor represents this missing relationship (Fig. 3.1). The name memristor (short for memory resistor) stems from the fact that the device acts like a "nonlinear resistor with memory". He presented a theoretical analysis of memristors in his paper without linking the concept to any physical device.

In 1976, Chua and Sung Mo Kang generalized the memristor concept to a broader class of nonlinear systems [11]. In memristive systems, the state of the system affects the output and serves as the memory of the device. Memristive systems are described by Eq. 3.1, where y denotes the output, u the input, x the state of the system, ẋ the derivative of x with respect to time, and f and g are continuous functions. The input u is replaced by i in a current-controlled memristor and by v in a voltage-controlled memristor.

ẋ = f(x, u, t)    (3.1)
y = g(x, u, t) u

The properties of memristive systems are analyzed mathematically throughout the paper. Chua and Kang provided examples of memristive systems such as the thermistor, the sodium and potassium channels of the Hodgkin-Huxley model, and discharge tubes. They further suggested that existing physical and biological models should be re-examined and categorized as memristive systems.
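To make Eq. 3.1 concrete, the state-space form can be integrated numerically. The sketch below (illustrative only; the function names and parameters are ours, not from [11]) simulates a generic memristive system with forward-Euler steps:

import numpy as np

def simulate_memristive_system(f, g, x0, u, dt):
    """Forward-Euler simulation of Eq. 3.1: dx/dt = f(x, u, t), y = g(x, u, t) * u."""
    x, ys = x0, []
    for k, uk in enumerate(u):
        t = k * dt
        ys.append(g(x, uk, t) * uk)  # output depends on the input and the state
        x = x + dt * f(x, uk, t)     # the evolving state is the device memory
    return np.array(ys)

For a voltage-controlled memristor, u would hold the voltage samples and y the resulting current.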

In 2008, Strukov, Snider, Stewart and Williams from HP Labs presented a physical model of a


[Figure content: the four two-terminal circuit elements and their defining relations, dv = R di (resistor), dq = C dv (capacitor), dϕ = L di (inductor) and dϕ = M dq (memristor), together with dϕ = v dt and dq = i dt.]

Figure 3.1 – Two-terminal circuit elements (adapted from [43]).

two-terminal device showing the characteristics of a memristor [43]. The paper focuses on the current-controlled memristor. The authors studied a thin semiconductor film with metal contacts at both ends (Fig. 3.2a). The film has thickness D, and the state variable w denotes the width of the doped region. The doped region has a lower resistance (denoted R_on), while the undoped region, with very low or zero dopant concentration, has a higher resistance (denoted R_off). When a voltage is applied to the device, the dopants drift and shift the boundary between the doped and undoped regions. The authors derived the memristance of the device mathematically.
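For reference, the derivation in [43], assuming ohmic electronic conduction and linear ionic drift with average dopant mobility μ_V, can be written as

\[
v(t) = \left( R_{\text{ON}}\,\frac{w(t)}{D} + R_{\text{OFF}}\left(1 - \frac{w(t)}{D}\right) \right) i(t),
\qquad
\frac{dw(t)}{dt} = \mu_V\,\frac{R_{\text{ON}}}{D}\, i(t),
\]

which gives w(t) = μ_V (R_ON/D) q(t) and, for R_ON ≪ R_OFF, the memristance

\[
M(q) = R_{\text{OFF}} \left( 1 - \frac{\mu_V R_{\text{ON}}}{D^{2}}\, q(t) \right).
\]

The 1/D² factor explains why memristance becomes significant only at the nanometre scale.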

One class of devices that falls under Strukov et al.'s definition of a memristor is Resistive Random Access Memory; these devices are described in Section 3.2.

Strukov et al. presented the i-v behaviour of the device, displayed in Fig. 3.2b. They explained that when a symmetrical alternating-current (AC) voltage is applied to the device, a hysteresis curve arises; this curve collapses to a straight line at high frequencies. If the input is asymmetrical, various continuous resistance states can be observed.
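This behaviour is easy to reproduce in simulation. The sketch below (a minimal sketch of the linear-drift model; the parameter values are illustrative, not taken from [43]) drives the device with a sinusoidal voltage, and re-running it at a much higher frequency shows the pinched loop collapsing towards a line:

import numpy as np

def hp_memristor_iv(freq, v0=1.0, r_on=100.0, r_off=16e3,
                    d=10e-9, mu_v=1e-14, w0=0.5, steps=20000):
    """Simulate one period of v(t) = v0*sin(2*pi*freq*t) applied to the
    linear-drift model; returns the (voltage, current) samples."""
    t = np.linspace(0.0, 1.0 / freq, steps)
    dt = t[1] - t[0]
    w = w0 * d                                      # initial doped-region width
    v = v0 * np.sin(2.0 * np.pi * freq * t)
    i = np.zeros_like(v)
    for k, vk in enumerate(v):
        r = r_on * (w / d) + r_off * (1.0 - w / d)  # doped/undoped in series
        i[k] = vk / r
        w += mu_v * (r_on / d) * i[k] * dt          # linear ionic drift
        w = min(max(w, 0.0), d)                     # boundary stays in the film
    return v, i

# v, i = hp_memristor_iv(freq=1.0)    # low frequency: pinched hysteresis loop
# v, i = hp_memristor_iv(freq=100.0)  # high frequency: loop collapses to a line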

In 2011, Chua stated that, regardless of their material and physical operating mechanism, all two-terminal non-volatile memory devices based on resistive switching should be categorized as memristors [9]. He explained that a resistive-switching device can retain a resistance value for a long time without consuming power. The write operation on such a device is performed by switching between the high-resistance state and the low-resistance state with a short current or voltage pulse, and the



(a) Model for a memristor.


(b) The hysteresis loops for the memristor: the applied voltage vs. the resulting current.

Figure 3.2 – Memristor model by HP Labs [43].

read operation can be performed with a pulse of lower amplitude. Furthermore, he added that the i-v curve of a memristor is a pinched hysteresis loop, and he concluded his paper with the remark: "If it's pinched, it's a memristor".

3.2 Resistive Random Access Memory

Resistive Random Access Memory (ReRAM) is a two-terminal device that consists of metal top and bottom electrodes with an oxide layer in between. When a voltage is applied to the terminals, conductive filaments form in the oxide and change the cell resistance (Fig. 3.3). The conductive filaments consist of oxygen-vacancy paths. As the filaments form, the cell shifts from the High Resistance State (HRS) to the Low Resistance State (LRS); the cell switches back to the HRS when the previously formed filaments are destroyed.

ReRAM devices can be divided into two categories according to their switching behaviour and hence their reset polarity: unipolar and bipolar switching. In unipolar switching, the amplitude of the applied voltage determines the direction of switching; Pt/HfO2/Pt and Pt/TiO2/Pt devices exhibit this behaviour. In bipolar switching, the direction of switching depends on the polarity of the applied voltage; Pt/TaOx/Pt, TiN/HfOx/TiN and Ru/TiO2/Ta2O5/Ru are examples of such devices [3, 40].

The I-V curve of a 10 µm TiN/Ta/TaO2/TiN ReRAM is presented in Fig. 3.4. In a binary metal oxide device, as the voltage increases from 0 V to a threshold known as the forming voltage, the filament current increases and the device switches to the LRS. The forming voltage decreases as the film thickness decreases [3], and multiple filaments can be formed in


[Figure labels: Top Electrode, Bottom Electrode, Oxide, Oxygen Vacancies.]

Figure 3.3 – A simple Resistive RAM cell in HRS (left) and LRS (right) states.

the ReRAM cell. Afterwards, the RESET and SET operations can be performed repeatedly. The voltage at which the transition from HRS to LRS occurs is the set voltage (V_set), and the voltage that starts the transition from LRS to HRS is called the reset voltage (V_reset) [25]. In Fig. 3.4, the forming voltage is 2 V, the set voltage is −1.5 V and the reset voltage is 1.8 V. The LRS resistance is 8 kΩ and the maximum HRS resistance is 160 kΩ, giving a maximum HRS/LRS ratio of 20.


Figure 3.4 – I-V curve of a 10 µm TiN/Ta/TaO2/TiN-based ReRAM [39].

An advantage of ReRAM is that its fabrication is fully CMOS-compatible, so Back-End-of-Line (BEoL) integration is possible [39]. Moreover, ReRAM devices offer very good scalability and short programming times [38]. Another feature that makes ReRAM interesting, especially for neuromorphic applications, is that it can exhibit intermediate resistance states, allowing it to store multiple values. The resistance states of a Pt/TaOx/CrOy/Cr-based device are shown as an example in Fig. 3.5; its four resistance levels can store 2 bits of information.
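As a toy illustration of multi-level storage, a read operation can map a measured resistance to a bit pattern. The thresholds below are hypothetical, only loosely inspired by the roughly decade-separated levels in Fig. 3.5:

import numpy as np

# Hypothetical boundaries (in ohms) between the four resistance states.
LEVEL_EDGES = np.array([3e2, 3e3, 3e4])
LEVEL_BITS = ["11", "10", "01", "00"]  # LRS -> "11", IR1, IR2, HRS -> "00"

def read_2bit(resistance_ohm):
    """Map a measured resistance to a 2-bit symbol (illustrative only)."""
    idx = int(np.searchsorted(LEVEL_EDGES, resistance_ohm))
    return LEVEL_BITS[idx]

print(read_2bit(1e2), read_2bit(1e5))  # '11' (LRS) and '00' (HRS)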



Figure 3.5 – Resistive levels of a Pt/TaOx/CrOy/Cr-based ReRAM [38].

3.3 Memristors in Artificial Neural Networks

Many studies use ReRAM as the synaptic weight element in Artificial Neural Networks. The number of weights in a neural network is very high, so the hardware must store a large number of synaptic weight elements. This can be realized by building high-density crossbar arrays of ReRAM devices. If learning, i.e. adjusting the weights, is performed on the hardware, storing the weights in analog form saves chip area. ReRAMs address this need by offering intermediate states that store multiple bits of information and by being non-volatile. Furthermore, the fault tolerance of neural networks makes the use of variable and imperfectly predictable memristor elements attractive [2]. Thus, memristors are widely used in Artificial Neural Network implementations.

An Approach to Neural Networks using Memristor Crossbar Arrays

The paper Simulation of a Memristor-Based Spiking Neural Network Immune to Device Variations by Querlioz, Bichler and Gamrat presents a neural network implemented with memristors arranged in a crossbar that learns with a simplified STDP algorithm [34].

The network is trained and tested using the MNIST database of handwritten digits [24]. Each image consists of 28 × 28 pixels, each with a greyscale value ranging from 0 to 255. The database provides 60,000 training images; the whole dataset or a subset of it can be used, and the same images can be re-shown to the network to continue training. The number of times the set of images is shown to the network is called an epoch. After learning, 10,000 test images are applied to the network to check how the learning generalizes to unseen examples.

The authors use an unsupervised learning methodology. There are 784 input neurons, each corresponding to one pixel of an MNIST image. The number


of output neurons takes values of 10, 50, 100 or 300 in this study. The crossbar array is implemented so that every input neuron is connected to every output neuron, with a memristor at each crosspoint of these connections (Fig. 3.6a).
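In such a crossbar, the current collected on each column is the sum of the currents through its memristors, so the array computes the weighted sums of a neural layer in a single step. A minimal sketch of this read-out (the names are ours, not from [34]):

import numpy as np

def crossbar_column_currents(conductances, input_voltages):
    """Column currents by Kirchhoff's current law: I_j = sum_i V_i * G_ij.
    conductances has shape (n_inputs, n_outputs), e.g. (784, 50)."""
    return input_voltages @ conductances

g = np.random.uniform(1e-6, 1e-4, size=(784, 50))          # conductances (S)
i_out = crossbar_column_currents(g, np.random.rand(784))   # 50 column currents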


(a) Proposed circuit architecture.


(b) Proposed spike shapes for input and output neurons.

Figure 3.6 – Proposed circuit architecture and spike shapes by Querlioz, Bichler and Gamrat [34].

In the paper, positive square pulses are sent by the input neurons according to the intensity of the MNIST pixel they correspond to. Each pulse has a fixed amplitude, but its frequency is adjusted according to the pixel intensity. The voltage applied by an input neuron produces a current through the memristor element; this voltage, however, is below the voltage necessary to program the device. The currents from all the memristor elements in a column then flow into the output neuron. The output neurons are leaky Integrate-and-Fire (I&F) neurons that integrate the incoming current. Inspired by the membrane potential of a biological neuron, each neuron of this type has a threshold, and when its membrane potential exceeds this threshold, the neuron emits a spike. The evolution of this potential is given in Eq. 3.2, where


τ and g are constants, I_input is the sum of the currents of a column, and V is the state variable representing the potential. Following the winner-takes-all strategy, the potentials of all neurons are reset by the winning neuron, which sends inhibitory signals to the other neurons. The neurons then enter a refractory period during which they exhibit no activity and their potential remains at zero. The thresholds of the neurons are adjusted with a homeostasis rule: the threshold of a neuron that spikes more than the average of all neurons is increased, while the threshold of a neuron exhibiting less activity than average is decreased. Thus, all output neurons are expected to specialise in distinct features and patterns.

τ dV/dt + gV = I_input    (3.2)
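A discretized version of Eq. 3.2 makes the neuron's behaviour explicit. The sketch below (a minimal sketch; the constants and the reset/inhibition details are simplified relative to [34]) performs one Euler step of the leaky I&F dynamics:

import numpy as np

def lif_step(v, i_input, threshold, tau=20e-3, g=1.0, dt=1e-4):
    """One Euler step of tau*dV/dt + g*V = I_input; returns (new_v, spiked)."""
    v = v + (dt / tau) * (i_input - g * v)  # leaky integration of column current
    if v >= threshold:
        return 0.0, True   # spike; WTA inhibition then resets the other neurons
    return v, False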

The weights of the synapses, proportional to the conductances of the memristors, are not altered by a pulse from the input neurons alone. When an output neuron spikes, it sends a dual-polarity pulse to the column it is connected to. If an input pulse is present on a synapse, the negative part of the output pulse subtracted from the input pulse exceeds the programming threshold and the weight increases. If there is no input pulse on that synapse, the positive part of the output pulse alone decreases the weight. The amounts of increase and decrease are described by the authors as in Eqs. 3.3 and 3.4, where w is the weight of the synapse, w_max and w_min delimit the range of allowed weight values, and α+, α−, β+, β− are constants. This learning scheme is illustrated in Fig. 3.6b.

δw+ = α+ e^(−β+ (w − w_min)/(w_max − w_min))    (3.3)

δw− = α− e^(−β− (w_max − w)/(w_max − w_min))    (3.4)
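In code, this conductance-dependent update looks as follows (a minimal sketch; the constants are illustrative, not the values used in [34]):

import numpy as np

def stdp_update(w, potentiate, w_min=0.0, w_max=1.0,
                alpha_p=0.01, alpha_m=0.005, beta_p=3.0, beta_m=3.0):
    """Apply Eq. 3.3 (potentiation) or Eq. 3.4 (depression) to one weight."""
    x = (w - w_min) / (w_max - w_min)       # normalized position in the range
    if potentiate:                          # input pulse overlapped output pulse
        w += alpha_p * np.exp(-beta_p * x)
    else:                                   # output pulse alone
        w -= alpha_m * np.exp(-beta_m * (1.0 - x))
    return float(np.clip(w, w_min, w_max))

Note that both updates shrink as the weight approaches the corresponding bound, which keeps the weights inside [w_min, w_max] in a soft way.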

The recognition rate is calculated by mapping each output neuron to the digit to which it responds most strongly.

The authors add that asynchronous spiking and CMOS transistors operating in the sub-threshold region make this implementation low-power. The variability of the synapses and CMOS neurons, as well as the impact of different input coding schemes, are also analyzed in their paper.

In another section of the paper, a supervised layer inserted after the first unsupervised layer for labeling purposes is described. The second layer is similar to the first, with certain differences. The output neurons of the first layer serve as the inputs of the second layer and send the above-described dual-polarity pulses to the second-layer crossbar. The second layer has 10 output neurons, one for each of the 10 digits presented to the network.


An output neuron sends a negative pulse if it should be spiking; combined with the positive part of the input pulse, this signal increases the weights. Neurons that should not spike and did not spike apply a positive square pulse, which, combined with the negative part of the dual-polarity input pulse, decreases the corresponding weights. A neuron that should not spike but did spike sends a negative square pulse of larger amplitude to have a greater effect on the weights. The recognition rate with the supervised layer is very similar to that of the unsupervised layer, so it does not improve performance.

Another Approach to Neural Networks using Memristor Crossbar Arrays

Sheridan, Ma and Lu presented a memristor model and an algorithm to train a memristor-based neural network with unsupervised learning [41]. In the first part of their paper, they showed that their model of a Pd/WOx/W device reproduces the device's experimental behaviour (Fig. 3.7a).


(a) Experimental and modeling data.

Fig. 2. Network topology (a showing input neuronsoutput neurons (blue circles) and memristor crossbar arrsynaptic and post-synaptic pulses, and (c) their temporcrosspoint device’s resistance.

Fig. 3. Distribution of weights before (blue) and after (gr

s (green rectangles), ray (in grey). (b) Pre-ral overlap alters the

reen) training.

Fig. 4. (a) Sample of 20 receptive fields neurons). (b) Recognition rate as a function o

V. METH

The devices used for tuning thesputter depositing a layer of tunwafers. The bottom electrode was dlithography and tungsten etch. The partially oxidized in a rapid thermal the WOx switching layer. The top evaporation and liftoff. The device tested using LTSpice, while the written in Python.

VI. DISCUS

The algorithm works by forminprototypes for the input classes. Eacthe network, the neuron with the mfire, effectively declaring its belief certain class. The dual-polarity puneuron has the effect of moving treceptive field) closer to the input bbe most easily visualized by consinput as shown in Fig. 5. The protshown as solid red dots located on noted that the memristive weights renormalized after each training samthe L2norm of the weight vectors vtraining), and the input is depicted the input is normalized to unit lengtwell. The network has selectedhighlighted in yellow as the winnprototype is then moved in the direcby the dashed red circle in the figure

(from a network with 50 output of the number of output neurons.

HODS e model were fabricated by ngsten on Si/SiO2 carrier defined with electron-beam bottom electrode was then annealing furnace to create electrode was deposited by model was developed and network simulations were

SSION ng and continually refining ch time an input is shown to most similar prototype will that the input belongs to a

ulse emitted by the firing the neuron’s prototype (or

by a small amount. This can sidering a two-dimensional totypes of the network are the unit circle (it should be in the simulation were not mple, but it was found that aried by less than 2% after as a green arrow. Because

th it will lie on the circle as d the closest prototype, ner. The winning neuron’s ction of the input, indicated e. In the case of the MNIST

1080

(b) Crossbar arrays of memristor andproposed spike shapes for input and out-put neurons.

Figure 3.7 – Proposed device model and spike shapes by Sheridan, Ma and Lu [41].

Then, they propose a learning algorithm for a memristor-based crossbar array. The firing probability and the firing polarity of the input neurons are determined from the normalized pixel intensity. A new parameter is defined as the normalized difference between the pixel intensity and the average pixel intensity. If the absolute value of this parameter exceeds a uniformly drawn random number, the input neuron fires with a fixed voltage value. The sign of the parameter sets the polarity of the firing voltage.
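As an illustration, a minimal MATLAB sketch of this firing rule is given below. The variable names (x, xAvg, Vfire), the normalization by the average intensity and the uniform random draw are assumptions, since [41] does not provide code.

    x = 0.4; xAvg = 0.3; Vfire = 0.5;     % example pixel, average and amplitude (assumed)
    p = (x - xAvg) / xAvg;                % normalized difference to the average intensity
    if rand() < abs(p)                    % fire with a probability set by |p|
        Vin = sign(p) * Vfire;            % polarity follows the sign of the difference
    else
        Vin = 0;                          % the input neuron stays silent this time step
    end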

The output neurons are of the leaky Integrate-and-Fire (I&F) type. After integrating the currents on its column, a neuron fires a spike with dual polarity when its threshold is exceeded (Fig. 3.7b). Following the winner-takes-all strategy, the membrane potentials of all the neurons are then reset. A homeostatic rule is also applied, where the thresholds of the output neurons are adjusted so that the neurons can specialise in features of the input.


When the input and output spikes overlap, learning takes place. When subtracted from the input spike, the first half of the output spike serves to increase the weights and the second half to decrease them. The authors observed that at the end of learning the majority of the weights are close to 0, with the range of weights being between 0 and 1. The labeling of the neurons is done after the unsupervised learning by mapping outputs to labels, taking into account the outputs of the selected neurons and their firing counts. A comparison of the recognition rates of the two papers is provided in Table 3.1.

Table 3.1 – Comparison of the recognition rates of Querlioz, Bichler and Gamrat [34] and Sheridan, Ma and Lu [41] with respect to the number of output neurons.

Number of output neurons             10     20     50     100    200    300
Querlioz, Bichler and Gamrat [34]    60%    –      81%    85%    –      93.5%
Sheridan, Ma and Lu [41]             48%    56%    72%    79%    84%    –

Multilayer architecture

Another paper that uses memristors as the synaptic weight elements is by Afifi, Ayatollahi and Raissi [2]. In their paper, they describe a learning algorithm to be used in memristor-based crossbar arrays with CMOS neurons. These neurons are of the leaky Integrate-and-Fire (I&F) type. As in the other algorithms presented, the neuron integrates the current until a certain threshold is met. A spike is then sent, the neuron resets itself and enters the refractory period. The spikes sent by the neurons are a simplified version of the biological spike. As shown in Fig. 3.8a (left), the proposed spikes consist of a linear function in the third quadrant and an exponential function in the first quadrant. Compared to the biological spikes of Fig. 3.8a (right), the polarity of the spikes is different for algorithmic reasons explained in the paper. If the pre-synaptic neuron spikes before the post-synaptic neuron, long-term potentiation (LTP) takes place. If, on the contrary, the post-synaptic neuron spikes before the pre-synaptic neuron, long-term depression (LTD) is observed.

An interesting feature of their work is the multilayer architecture they presented (Fig. 3.8b).

Several layers of neurons and synapses can be connected with this topology and various

spiking neural network algorithms can be implemented.


(a) Proposed spikes for learning. (b) Proposed multilayer architecture.

Figure 3.8 – Proposed spike shapes and multilayer architecture by Afifi, Ayatollahi and Raissi [2].


4 Learning in a Hybrid CMOS-Memristive Neuromorphic System

4.1 Development of the Learning Algorithm

The learning algorithm developed is inspired by the work of Querlioz, Bichler and Gamrat [34] (see Ch. 3.3). However, despite all the similarities, there are some major differences in our implementation.

The MNIST set of handwritten digits will be used for training and testing the network. This will enable the comparison of the network performance with papers in the literature. Each image pixel will be connected to an input neuron; thus, there will be 784 input neurons in all the different network architectures implemented in this work. The number of output neurons will differ between architectures and evaluation methods. The networks will be organized in a crossbar array where all the inputs and outputs of a layer are connected to each other. At each intersection lies a ReRAM device representing a synapse.

In this chapter, four network architectures and the corresponding learning strategies will be explained in detail. First, the single layer network will be presented with 10, 50, 100 and 300 output neurons. Next, a labeling layer trained with supervised learning will be added to the single layer network. There will be 10 output neurons after the labeling stage, corresponding to the 10 digits of the MNIST set. Third, a multilayer network, consisting of two layers trained with unsupervised learning and a labeling layer, will be introduced. Fourth and last, adaptations of the algorithm for better hardware compatibility as well as the proposed circuit topology will be demonstrated.

The code for this work is written in MATLAB for low design time. However, a MATLAB to C/C++ interface has been developed and can be used in the future to improve the simulation time by moving the simulations to C/C++. In this case, the inputs and directives are provided by the user in MATLAB, followed by MATLAB calling C/C++ for the simulations. The results are then passed back to MATLAB for plotting and further analysis.

The learning algorithms developed in this work have the purpose of serving as the basis of a hardware implementation. The simplification of the algorithm, the parameters chosen and the use of probabilistic behaviour are intentional, for hardware compatibility. STDP involving spikes is compatible with ReRAM devices, which can be programmed with pulses. The characteristics of the ReRAM for this work are provided by the ReRAM group of the Microelectronic Systems Laboratory of EPFL [39].

4.1.1 Single Layer Network

Initialization of the Network

Two very important parameters of the network are the weights and the thresholds of the output neurons. These parameters will be auto-adjusted by the network throughout the learning process and will be frozen during testing.

The range of the weights and the weight initialization are set according to [34]. Thus, the weights are allowed to change between 0.0001 and 1. The mid-point of this range is close to 0.5. To reflect the unpredictability of the ReRAM array, the weights are initialized randomly to values ranging from 0.49 to 0.51 (Fig. 4.1). Hence, the variability of the devices is assumed to be within a ±2% range.


Figure 4.1 – Initialization of weights in the network with 10 output neurons.

Although this method will be used throughout the work, a separate experiment will be conducted to observe the effects of different weight initialization techniques on accuracy. Besides the above-mentioned mid-centered initialization, other methods based on Gaussian, uniform and binary distributions will be compared in Chapter 5.1.


The thresholds of all output neurons are initialized to an experimentally found constant value. In terms of their thresholds, all output neurons therefore have equal firing probabilities at the beginning of learning. The randomness of the weights will begin the specialisation of the neurons in different features.

Loading Images to the Network

Each image from the MNIST set is shown to the network for 350 ms. These images are greyscale and have a size of 28 x 28. The background is dark and the pixels that carry information about the digits are bright. The images are binarized using thresholding. After this process, the pixels representing the digits have the value 1 while the background pixels have the value 0. In an algorithm where the pixels are presented to the network with different frequencies corresponding to different pixel intensities, thresholding reduces the algorithm complexity. Generating only two different frequencies makes the algorithm easier to control and debug. It is also a simpler task in hardware.

(a) The images no. 1, 2, 3 from the MNIST set of handwritten digits [23].

(b) The resulting images after thresholding.

Figure 4.2 – Input images provided to the network.

In this work, a pulse train with a frequency of 20 Hz, i.e. a period of 50 ms, is sent for the pixels with value 1. For the pixels with value 0, a pulse with a very large period is used; these pixels will not generate any spikes during the presentation of the dataset (3 epochs of 60,000 training images). Hence, learning the background pixels is prevented. The pulse trains sent for the pixels with value 1 contain square pulses with a period of 50 ms and a duty cycle of 50%, making the pulse width 25 ms. The amplitude is 0.75 V to match the real data of the ReRAM experiments.


Loading a new image is assumed to take one period of the square pulse, leaving the network

300 ms to repeatedly train or test using the image. Thus, 6 square pulses are sent to the network

per image and a maximum of 6 output neuron spikes are expected. These 50 ms intervals will

be referred to as a cycle.
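The input coding described above can be summarised with the following MATLAB sketch; the threshold value of 0.5 and all variable names are assumptions, not the thesis code.

    mnistImage = rand(28) * 255;          % stand-in for one 28 x 28 greyscale digit
    pix    = mnistImage(:) / 255 > 0.5;   % binarize the pixels (threshold assumed)
    Vpix   = 0.75;                        % pulse amplitude [V]
    Tcycle = 50e-3; Tpulse = 25e-3;       % 50 ms period, 50 % duty cycle
    nCycle = 6;                           % learning cycles per image after loading
    % While a pulse is high, the rows of pixels with value 1 are driven at 0.75 V;
    % the rows of pixels with value 0 stay at 0 V throughout the dataset.
    Vrows  = Vpix * double(pix);          % 784 x 1 row voltages during a pulse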

Learning in the Network

At each time step, either a new image is shown to the network or learning is performed for an image. Within every 350 ms window, a new image is shown in the first 50 ms and the network learns the current image during the remaining 300 ms.

According to the value of the thresholded pixel intensity, pulses are sent to the input neurons at each cycle during learning. This supplies voltage to some of the horizontal lines, or rows, of the crossbar array. At each crosspoint of an input neuron and an output neuron lies a ReRAM device: one end is connected to the input neuron and the other end to the output neuron. Because there are no spikes coming from the output neurons yet, 0 V is present at the output end of the devices. However, the memristors connected to spiking input neurons will have a certain voltage across them. As a result, a current flows through these devices. By Ohm's law, this current is inversely proportional to the resistance, so the resistance of the ReRAM elements controls the current through each device.

The resistances of the devices are inversely proportional to the conductances, and hence to the weights, in the network. The proportionality constant that converts a weight into a resistance is the constant G (Eq. 4.1); its value is obtained experimentally to reflect the range of resistance values of the ReRAM devices correctly. The randomisation of the weights results in different currents flowing through the devices at the very beginning of the learning process.

\[ R = \frac{1}{wG} \tag{4.1} \]
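The column currents then follow directly from Eq. 4.1 together with Ohm's and Kirchhoff's laws, as in the sketch below; the value of G is a placeholder, not the experimentally obtained constant.

    N     = 10;                           % number of output neurons (example)
    G     = 1e-4;                         % placeholder weight-to-conductance constant [S]
    W     = 0.49 + 0.02 * rand(784, N);   % mid-centered initial weights
    Vrows = 0.75 * (rand(784, 1) > 0.5);  % example input-row voltages [V]
    Gdev  = G * W;                        % device conductances, 1/R = wG (Eq. 4.1)
    Idev  = repmat(Vrows, 1, N) .* Gdev;  % Ohm's law for every device
    Icol  = sum(Idev, 1);                 % Kirchhoff's current law per column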

In the vertical lines, or columns, of the crossbar, the currents are summed according to Kirchhoff's current law and flow into the output neurons. The output neurons are of the leaky Integrate-and-Fire (I&F) type. If these neurons are not in a refractory period, they are allowed to exhibit spiking behaviour. The state variable of each output neuron is then calculated using Eq. 3.2.

When this differential equation is solved for V, Eq. 4.2 is obtained, where \(I_{input}\) is the sum of the currents flowing through a column and C represents the constant of integration.

\[ V = \frac{I_{input}}{g} + C\, e^{-gt/\tau} \tag{4.2} \]


To find the integration constant C, the initial condition is taken as the state variable being 0 at time t = 0. Using this information, V is given by Eq. 4.3.

\[ V = \frac{I_{input}}{g} - \frac{I_{input}}{g}\, e^{-gt/\tau} \tag{4.3} \]
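For completeness, the derivation can be written out as follows, under the assumption that Eq. 3.2 has the standard leaky I&F form with leak conductance g and time constant τ:

\[
\tau \frac{dV}{dt} = I_{input} - gV
\;\Rightarrow\;
V(t) = \frac{I_{input}}{g} + C\,e^{-gt/\tau},
\qquad
V(0) = 0 \;\Rightarrow\; C = -\frac{I_{input}}{g}.
\]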

The time constant in this formula is adjusted according to the relationship between the current

and voltage of the ReRAM device being developed.

An output neuron will spike if its state variable exceeds its threshold. If multiple neurons exceed their thresholds, the one with the largest difference between its state variable and its threshold is allowed to spike. When an output neuron spikes, it sends a pulse with dual polarity. Learning takes place only at the synapses where the voltage difference between the input and the output of the ReRAM is higher than the positive threshold or lower than the negative threshold, changing the device resistance (Fig. 4.3). The first part of this dual polarity pulse is negative. Because the voltage difference across the ReRAM is found by subtracting the output voltage from the input voltage, this negative part serves to enhance the input voltage, if there is any, and to increase the stored weight by δw+ of Eq. 3.3. Following the negative part, a pulse with positive polarity is fired. This part has no effect on the weight if an input voltage is applied to the device; otherwise, it decreases the weight by δw− of Eq. 3.4. Approximated from Fig. 3.6b, the duration of each part is set to 5 ms. The negative part has an amplitude of −0.75 V while the positive part's amplitude is 1 V. These values are compatible with the ReRAM being developed.

After the spiking of an output neuron, the state variables of all output neurons are forced to 0 by the winner-take-all strategy. A refractory period of 10 ms starts, during which the state variables are kept at 0, disabling any integration of current.

At the end of each cycle, the number of spikes that the spiking neuron has generated so far is compared with the average number of spikes of the output neurons. If this neuron has spiked more than twice the average, its threshold is increased by a fixed amount. This process enables all output neurons to spike and to specialise in features.

The learning is repeated until all the examples are shown to the network.
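One learning cycle can be summarised with the MATLAB sketch below. All names, the example values and the placement of the homeostasis check are assumptions; the dual polarity pulse itself is only described in a comment.

    N = 10; V = rand(N, 1);                  % example state variables
    theta = 0.5 * ones(N, 1);                % thresholds
    nSpikes = zeros(N, 1); dTheta = 0.05;    % spike counts and threshold step (assumed)
    over = V - theta;                        % overshoot of each output neuron
    [m, win] = max(over);                    % neuron exceeding its threshold the most
    if m > 0
        % the winner emits the dual polarity pulse: -0.75 V for 5 ms, then +1 V for 5 ms
        V(:) = 0;                            % winner-take-all reset of all neurons
        % a 10 ms refractory period starts, keeping the state variables at 0
        nSpikes(win) = nSpikes(win) + 1;
        if nSpikes(win) > 2 * mean(nSpikes)  % end-of-cycle homeostasis rule
            theta(win) = theta(win) + dTheta;
        end
    end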

Accuracy Calculation

At the end of each example, there is an output neuron which the network favours as its answer. This information is stored during training and testing in two different ways. The first matrix stores how many times the output neurons spike throughout learning. Because the correct label is known for the accuracy calculation, the number of times a neuron spikes for each correct label is stored. The sum over this matrix should give the number of examples shown multiplied by the number of cycles in each example.


(a) No learning occurs when only input neurons spike. (b) Learning occurs with an increase of the weights when both input and output neurons spike (green devices). (c) Learning occurs with a decrease of the weights when both input and output neurons spike (cyan devices).

Figure 4.3 – Learning in the crossbar array.


The second matrix holds the information of which output neuron is voted for by the network at every cycle of each example. Unlike for the first matrix, the cases where there are no spikes are also taken into account: if there is a spike among the output neurons, the spiking neuron is the answer of the network; otherwise, the neuron which the network favours by having the largest state variable is taken as the answer. The maximum number of spikes possible in this system is one per cycle, making a total of 6 per example.

For the accuracy calculation, every 1000 examples a mapping is created from the output neurons to the digits for which they spike most frequently, using the first matrix. Then, using this mapping, each spike that an output neuron made is judged as correct or incorrect. This information is taken from the second matrix.
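A sketch of this mapping step in MATLAB, assuming spikeCount is the first matrix (output neurons in rows, true digits 0-9 in columns) and that a single vote from the second matrix is judged:

    N = 50; spikeCount = randi(100, N, 10);  % stand-in for the first matrix
    votedNeuron = 3; trueLabel = 7;          % example vote and true digit
    [~, labelOf] = max(spikeCount, [], 2);   % map each neuron to the digit it
                                             % spiked most frequently for
    predicted = labelOf(votedNeuron) - 1;    % digits 0-9 are stored in columns 1-10
    isCorrect = (predicted == trueLabel);    % judge this vote as correct or incorrect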

At the beginning of every new epoch, the counting matrices are emptied so that the accuracy calculation restarts for each epoch.

It is worth noting that using the labels for accuracy calculation does not make the training

supervised. The learning is done completely without the labels.

Testing

After training is finished, the testing of the network takes place. The learned weights and the updated thresholds of the output neurons are frozen, and unseen examples are shown to the network. For the accuracy calculations, the final mapping from the output neurons to the digits, formed during training, is kept frozen during testing.

The 10,000 test images of the MNIST set are used for testing. It is expected that, after seeing the images of the digits many consecutive times, the network has learned the necessary features of the digits to recognize new inputs.

4.1.2 Single Layer Network with Labeling

Adding a labeling layer is proposed by Querlioz, Bichler and Gamrat in [34]. This layer functions

very similarly to the first layer.

At each cycle of the previously described algorithm, the input neurons send pulses to the network. An output neuron can send a dual polarity spike back to the crossbar array. The second layer of the network can sense this spike, as the output neurons of the first layer act as the input neurons of the second layer. If there is no spiking neuron in the first layer, the winner neuron with the highest state variable sends a spike to the supervised layer (but not to the first layer). The second layer has 10 output neurons representing the different digits. All input and output neurons of this layer are connected through a crossbar array of ReRAM elements. The number of features in the data set (in this case the number of digits) is known and will be used in training this layer. Thus, this layer uses supervised training.

The initialization of the supervised layer is done in a similar fashion to that of the first layer. The weights are randomly initialized between 0.49 and 0.51. The thresholds of the output neurons are all set to a new threshold value found experimentally. This value is different from the first layer's threshold. In the first layer, many neurons spike simultaneously if the input pixels they are connected to are above the threshold, so the currents that flow through the output neurons are large. Since only one output neuron is named the winner (whether it spikes or not), there will be a high voltage on only one of the input lines of the second layer. Thus, the current that flows in this layer is much smaller, which requires a new, smaller threshold. The homeostasis property is not used in the second layer, as it was observed that an increase in the thresholds results in a decrease in performance.

After the integration of the currents by the output neurons of the supervised layer, a winner neuron is chosen, whether it spikes or not. If this neuron matches the digit shown to the network, the vote is counted as correct. This is how the accuracy calculation is performed for this layer.

In this layer, the first output neuron is expected to spike when the first digit (0) is shown, the second output neuron is supposed to spike for the second digit (1), and so on. Learning is done in a way that ensures this. Regardless of which neuron spiked, a negative square pulse is sent from the output neuron of the supervised layer that matches the digit shown. This results in an increase of the weights by the amount δw+ (Eq. 3.3) at the devices where an input spike is also present. This increment is done 3 consecutive times to converge to the desired value faster. If there is a spiking output neuron and it spiked incorrectly, a positive square pulse is sent. The weight of the ReRAM which has an input spike and an output spike is decremented by δw− (Eq. 3.4) 5 consecutive times. Another negative square pulse is sent to the output neurons that should not have spiked, with the exception of the spiking neuron if there is one. The weights of the corresponding devices, which have a high voltage at the input and received this pulse from the output, are decremented by δw− only once.
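The supervised update for one cycle can be sketched as follows, where W2 is the second-layer weight matrix, in is the winner of the first layer, t the digit shown and s the index of the spiking output neuron (0 if none). All names are assumptions, and δw+ and δw− are taken as constants for brevity although Eqs. 3.3 and 3.4 make them depend on the weight.

    W2 = 0.49 + 0.02 * rand(50, 10);       % example second-layer weights
    in = 12; t = 3; s = 5;                 % example winner, digit and spiking neuron
    dwPlus = 0.01; dwMinus = 0.005;        % example step sizes
    wMin = 0.0001; wMax = 1;
    W2(in, t+1) = min(W2(in, t+1) + 3*dwPlus, wMax);   % 3 increments for the target neuron
    if s > 0 && s ~= t+1                               % an incorrect neuron spiked
        W2(in, s) = max(W2(in, s) - 5*dwMinus, wMin);  % 5 decrements for it
    end
    rest = setdiff(1:10, [t+1, s]);                    % remaining output neurons
    W2(in, rest) = max(W2(in, rest) - dwMinus, wMin);  % a single decrement each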

4.1.3 Multilayer Network with two Unsupervised Layers and Labeling

Another unsupervised layer is inserted between the unsupervised and the labeling layers. This new layer functions exactly like the previously described unsupervised layer. The training of all layers is done simultaneously.

The aim of using a multilayer network is to increase the accuracy by learning more features with the help of the hidden layer.


4.2 Circuit Design and Modifications to the Learning Algorithm for Better Hardware Compatibility

Circuit Design

To implement a single unsupervised layer of the network, certain circuit blocks are needed. The network receives its input from the outside world. This can be realized using a pre-pulse generator block, which takes the image pixels as input. According to the pixel intensity, a thresholding is performed here. Another input is the 20 Hz signal. The thresholded binary intensity controls a switch which decides whether the 20 Hz signal or no signal is sent as the output. There are 784 such outputs connected to the crossbar array. This block represents the input neurons.

Figure 4.4 – Proposed circuit design for the learning algorithm, consisting of a pre-pulse generator, the crossbar array, integrators, a programmable threshold generation block, comparators, a count and compare block, a dual polarity signal generator (spike generation, threshold increase) and a buffer towards the 2nd layer of the network.

The crossbar array consists of ReRAM devices at the crosspoints. The size of the crossbar array and the number of ReRAMs are determined by the number of output neurons. Since the circuit complexity grows with an increasing number of output neurons, a smaller network of size 784 x 10 or 784 x 50 can be chosen for early hardware experiments.

At the end of each column of the crossbar array lies a simple integrator. The measured voltage level is then sent to a comparator. The comparator, receiving the threshold information from the programmable threshold generation block, outputs a signal. Evaluating the output signals of all comparators in the network, a count and compare block decides the winning line. The count for this line is incremented. A signal is sent to the programmable threshold generation block if a threshold update is necessary. The information of whether there will be a spike and to which line it belongs is sent to the dual polarity signal generator. This block creates the spike and sends it to the crossbar array. A switch modulates the connection between the array, the dual polarity signal generator and the integrator blocks. All of these blocks together represent the output neurons.

If there is a next layer in the network, the answer of the comparator is buffered and sent to the

second layer as input.

Modifications for Better Hardware Compatibility

The weights are stored as floating point numbers in the software simulations. However, this precision cannot be maintained with hardware elements. A way to simulate the discrete resistance values of ReRAM devices is to divide the range of weights into discrete levels; each weight is then ensured to lie at one of these levels. A smaller number of levels makes it easier to realise the algorithm with ReRAM, because the range of resistances between HRS and LRS is divided into this number of levels and each level is assigned a larger range of resistance, which is easier to maintain with unpredictable devices. For the precision of the algorithm, however, a large number of levels is preferred, as more levels give more precision, which can lead to a better performance.

When using discrete levels, the number of levels is given as a parameter to the network. The weight range is then separated into levels. The number of levels should be selected according to the above-mentioned tradeoff. Each increase or decrease of a weight means moving one step forward or backward along the weight levels. This type of implementation allows saturation to appear faster, because saturating to the highest or the lowest weight requires fewer steps. Monitoring the weight change becomes more important when developing a learning algorithm with this technique.

Experiments are made on how to divide the weight range into levels. One method is a linearly divided set of weights: the range of weights is divided into intervals of equal size. The updating of weights with linear quantisation using 32 levels is shown in Fig. 4.5b. For comparison, the standard approach with a continuous range of weights is displayed in Fig. 4.5a. A second method is to use a logarithmic scale. To distribute the weights more evenly, the union of two logarithmic scales is used. In the first half, the levels range from 0.0001 to 0.5 and are divided into N/2 intervals, where N is the number of levels used. The second half is also logarithmically divided into N/2 intervals, but over the range from 0.5 to 1. Merging the two halves covers the total range of 0.0001 to 1, as needed (Fig. 4.5c). The linearly divided levels perform more similarly to the continuous set of weights and are preferred in this work over the logarithmically divided range.
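The two level constructions can be sketched as follows in MATLAB (N is the number of levels, assumed to be even); a weight update then moves one step along the chosen level set.

    N = 32; wMin = 0.0001; wMax = 1;
    linLevels = linspace(wMin, wMax, N);                      % equally spaced levels
    logLevels = [logspace(log10(wMin), log10(0.5), N/2) ...   % lower half: 0.0001 to 0.5
                 logspace(log10(0.5),  log10(wMax), N/2)];    % upper half: 0.5 to 1
    levels = linLevels;                       % the linear division is preferred here
    w = 0.5; step = +1;                       % step is +1 (increase) or -1 (decrease)
    [~, k] = min(abs(levels - w));            % level of the current weight
    w = levels(min(max(k + step, 1), N));     % move one level up or down, saturating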

The unpredictability of the ReRAM elements motivates the introduction of probability into the software simulations. Although the levels of the weights can be stored exactly in software, the discrete weight values might not be reached exactly by the ReRAM devices after a weight update. A proposed method for this is to apply a small random increase or decrease to a weight after finding its new value from the discrete levels. First, the level of the current weight is found.

(a) Weight update using continuous floating point numbers. (b) Weight update using 32 linearly divided quantized levels. (c) Weight update using 32 logarithmically divided quantized levels.

Figure 4.5 – Weight update using different quantization techniques (blue: increment, red: decrement). Each panel plots the weights after updating against the weights before updating.

Then, a random number is rolled within a ±2.5%, ±25%, ±60%, ±100%, ±300% or ±900% range of the targeted weight value of the synapse. The weight is updated to this new value; this behaviour is motivated by the probabilistic nature of the ReRAM devices. Note that the rolled number can be positive or negative, so the new weight value can lie above or below the targeted level.
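A sketch of this probabilistic update in MATLAB; spread encodes the studied ranges (0.025 for ±2.5% up to 9 for ±900%), and all names are assumptions.

    levels = linspace(0.0001, 1, 32); N = 32;     % discrete weight levels
    w = 0.4; step = +1; spread = 0.25;            % example weight, step and ±25% range
    [~, k] = min(abs(levels - w));                % level of the current weight
    target = levels(min(max(k + step, 1), N));    % targeted discrete value
    w = target * (1 + spread * (2*rand() - 1));   % roll within ± spread of the target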


5 Simulation Results and Discussions

5.1 Simulation of the Single Layer Network

Accuracy of the Network

The learning accuracy of the single layer network is plotted in Fig. 5.1a. The iteration number is the number of examples shown to the network during training. For this experiment, 3 epochs of the 60,000 training images from the MNIST set are used. The accuracy is given as a percentage. The diamonds at iteration number 180,000 correspond to the testing accuracy.

The accuracy is shown for four different network sizes: 10, 50, 100 and 300 output neurons. With 300 output neurons, the network classifies 78.21% of the training images and 82.93% of the test images correctly using unsupervised learning. The accuracies of all network sizes can be found in Table 5.1. As the network size increases, an increase in accuracy is observed: more output neurons can capture more features from the input images, so the images can be classified more accurately. With the increase in network size, the simulation time grows due to the network complexity.

Table 5.1 – Single layer network accuracies for different network sizes.

Network size          784 x 10   784 x 50   784 x 100   784 x 300
Training accuracy     48.41%     69.92%     75.69%      78.21%
Testing accuracy      50.3%      75.46%     76.62%      82.93%

Another observation from this plot is that as more examples are shown to the network, the accuracy curve saturates. This is because learning takes place at a higher pace at the beginning, where the weights are set randomly. As learning continues and the weights converge to better values, the improvement in accuracy decreases.

The results from [34] are presented for comparison (Fig. 5.1b). The trends of the curves are very similar to each other. The accuracy in this work is lower than in [34] for all network sizes. An underlying reason might be different accuracy calculation methods; a more realistic comparison would be possible if the data from both works could be evaluated using the exact same methodology. Another reason for the accuracy difference might arise from differences in the learning algorithm. In [34], the input spikes are generated using the pixel intensities, whereas in this work the input images are binarized with a simple thresholding method before being provided to the network via spikes. This might cause information loss and explain the accuracy difference. Other differences in the algorithms, such as the threshold updating rule or different parameter values, might have an effect on the accuracy as well.

(a) Accuracy after training and testing (accuracy [%] versus iteration number, for 10, 50, 100 and 300 output neurons).


(b) Accuracy after training in [34].

Figure 5.1 – Accuracy after training and testing, in comparison to Querlioz, Bichler and Gamrat [34].


Weight Initialization

The effect of four different weight initialization methods on accuracy is studied on a network of size 784 x 50. First, the weights are assigned a value very close to the middle of the allowed weight range. This mid-value is 0.5, as the weights are allowed to take values between 0.0001 and 1. The weights are assigned random values that differ by up to ±2% from 0.5, placing the initialization between 0.49 and 0.51. This type will be referred to as mid-centered. The second type of initialization randomly selects the initial weight values from a normal (Gaussian) distribution. In this type of distribution, the number of weights with a value close to the mid-range is larger than the number of weights with values distant from the center. Thirdly, weights are assigned randomly using a uniform distribution, where all values are assigned with the same probability. Finally, a binary distribution is applied: all weights are randomly assigned either the maximum or the minimum value of the weight range. The weight distributions in all four cases are shown in Fig. 5.2a (left). The occurrence of each weight value is limited to 3500 in the plot to make the distributions observable on a close-up scale.
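The four initializations can be sketched in MATLAB as follows; the Gaussian standard deviation and the clipping to the allowed weight range are assumptions.

    N = 50; wMin = 0.0001; wMax = 1; sigma = 0.2;
    Wmid = 0.49 + 0.02 * rand(784, N);                         % mid-centered
    Wgau = min(max(0.5 + sigma * randn(784, N), wMin), wMax);  % Gaussian, clipped
    Wuni = wMin + (wMax - wMin) * rand(784, N);                % uniform
    Wbin = wMin + (wMax - wMin) * (rand(784, N) > 0.5);        % binary: wMin or wMax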

An important conclusion from this experiment is that, regardless of the distribution chosen, the weights after training are similar in all four methods (Fig. 5.2a, right). The network chooses to decrease many of the weights to zero. This is expected, as there are many black background pixels that do not contribute to the recognition of the digit.

Another result is that all these distributions show similar accuracies in training (Fig. 5.2b). An important benefit of this is that a RESET followed by a SET operation would not be necessary in the hardware implementation of the network, which would save time and energy. The testing accuracies, shown with diamonds at iteration number 180,000, show a larger spread compared to the training accuracies.

Visualisation of Weights after Training

The weights in the network contain information about which input neurons are important for recognising the digits. When the network is shown an image, the column with the highest current is a candidate for a spiking neuron. The highest current is generated when the weights corresponding to the spiking input neurons are large. If a weight has a large value, the input neuron it is connected to has a large influence on the corresponding output neuron. Thus, visualising the weights gives an idea of which input pixels from the dataset are valuable for the output neurons.

Each input neuron is connected to an image pixel and all 784 input neurons are connected to each and every output neuron. The column of weights of an output neuron is reshaped so that each weight corresponds to the importance of an image pixel, as voted by the network: the 784 weights in a column are converted to a 28 x 28 image.
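In MATLAB this visualisation is a single reshape per output neuron, as in the sketch below; whether a transpose is needed depends on how the images were vectorised.

    W = rand(784, 10); j = 1;                % example weight matrix and neuron index
    field = reshape(W(:, j), 28, 28);        % 784 weights back to the image layout
    imagesc(field.'); axis image; colorbar;  % display the receptive field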

In a network with 10 output neurons, 10 output features are expected to be captured by the network. After training a 784 x 10 network with 3 epochs of 60,000 examples, the weights are reshaped and displayed in Fig. 5.3a.

(a) Weights before and after training (occurrence versus weight value, for the mid-centered, Gaussian, uniform and binary initializations).

(b) Accuracies of a 784 x 50 network with different weight initializations (accuracy [%] versus iteration number).

Figure 5.2 – Effects of different weight initialization techniques on accuracy.

The output neurons connected to these weights spiked for the digits 0 - 4 - 9 - 1 - 7 - 3 - 3 - 1 - 8 - 6. A connection between the digits and the visualisation of the weights is evident. The network could identify 8 of the 10 digits presented. The closest matches that the network finds for the two remaining digits, 2 and 5, are output neurons 10 (which identifies the digit as 6) and 9 (which identifies the digit as 8), respectively.

In a larger network where there are more output neurons than digits, the network is expected to specialise in finer details: several shapes and orientations of the digits can be learned. The weights of a 784 x 50 network connected to the neurons that vote 0 and 6 as their digit choice are shown in Fig. 5.3c.



(a) Weights of the 784 x 10 network.


(b) Weights of the 784 x 10 network in [34].


(c) Weights of ’0’s and ’6’s from the 784 x 50 network.

Figure 5.3 – Visualisation of weights after training.

5.2 Simulation of the Labeling Layer

After the unsupervised single layer, a supervised layer is added for labeling purposes. This layer should sort the outputs of the first layer; thus, the accuracy of this layer and of the unsupervised layer is expected to be similar. The accuracies of both layers are presented in Fig. 5.4 and in Table 5.2. The trends of the first and the second layer are very similar. However, there is a difference in accuracy, and this difference is larger than the one reported in [34]. The underlying reason might be that although the second-best matches to the digits do not contribute to the accuracy calculation of the first layer, they do have an impact in the labeling layer, because the labeling layer can still perform weight updates with the help of supervised learning.
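For intuition only, a minimal sketch of the simplest labeling scheme is given below: each first-layer output neuron is associated a posteriori with the digit for which it spikes most often, as in [34]. This is not the labeling layer of this thesis, which is itself a trained network with supervised weight updates; the sketch and all names in it are illustrative.

    import numpy as np

    def label_neurons(spike_counts, labels, n_classes=10):
        # spike_counts: (n_examples, n_neurons) first-layer spike counts
        # labels:       (n_examples,) ground-truth digits
        n_neurons = spike_counts.shape[1]
        votes = np.zeros((n_neurons, n_classes))
        for counts, digit in zip(spike_counts, labels):
            votes[:, digit] += counts          # accumulate spikes per digit
        return votes.argmax(axis=1)            # neuron -> digit association

    def classify(spike_counts, neuron_labels, n_classes=10):
        # Sum the spikes of all neurons sharing a label and pick the winner.
        scores = np.zeros((spike_counts.shape[0], n_classes))
        for c in range(n_classes):
            scores[:, c] = spike_counts[:, neuron_labels == c].sum(axis=1)
        return scores.argmax(axis=1)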

Table 5.2 – Labeling layer network accuracies for different network sizes.

Network size         784 x 10   784 x 50   784 x 100   784 x 300
Training Accuracy    60.22%     77.01%     80.95%      82.83%
Testing Accuracy     50.8%      73.72%     75.08%      85.73%


[Figure: accuracy [%] vs. iteration number, comparing the unsupervised and supervised layers]

(a) Network of size 784 x 10.

(b) Network of size 784 x 50.

(c) Network of size 784 x 100.

(d) Network of size 784 x 300.

Figure 5.4 – Accuracy with unsupervised and supervised layers for different network sizes.

5.3 Simulation of the Multilayer Network

The multilayer network of size 784 x 300 x 10 is simulated, where the first and second layers are unsupervised and the last layer performs supervised labeling (Fig. 5.5). The second unsupervised layer produces poor accuracy and needs improvement in the future. To improve this performance, adjustments to various network parameters, such as the threshold of the output neurons, should be experimented with.

In the multilayer network, the weight updates of the first layer are done in isolation from the second layer. Generally, multilayer architectures can capture more features and can improve accuracy with the help of the hidden layer. With this multilayer algorithm, however, the benefits of the multilayer structure are not present. Hence, a new multilayer learning algorithm can be developed as future work.


[Figure: accuracy [%] vs. iteration number for unsupervised layer 1, unsupervised layer 2 and the supervised layer]

Figure 5.5 – Multilayer network of size 784 x 300 x 10.

5.4 Modifications to the Learning Algorithm for Better Hardware Compatibility

Accuracy of the Network After Quantization of Weights

The experiments with the single-layer network show that a training accuracy of 78.21% and a testing accuracy of 82.93% can be reached with a 784 x 300 network. However, building a circuit out of lossy circuit elements and unpredictable ReRAM devices requires some modifications to the algorithm. One of these modifications is weight quantization: the software approach uses floating-point weights and increments, which are hard to realize with ReRAM.
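A minimal sketch of such weight quantization, assuming equally spaced levels between the weight limits (here taken as 0 and 1; the function name is illustrative):

    import numpy as np

    def quantize(w, n_levels, w_min=0.0, w_max=1.0):
        # Snap each weight to the nearest of n_levels equally spaced levels.
        step = (w_max - w_min) / (n_levels - 1)
        idx = np.round((w - w_min) / step)
        return w_min + np.clip(idx, 0, n_levels - 1) * step

    # Example: a small weight increment followed by quantization to 64 levels
    # w = quantize(w + 0.01, n_levels=64)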

To take a more hardware-oriented approach, the network is restricted to 32, 64 or 128 weight levels. Training is performed using 3 epochs of 60,000 training examples, and the results of this experiment are shown in Fig. 5.6. One finding is that the accuracy decreases as the number of levels decreases. This is expected, since the currents flowing through the devices are affected more strongly by each weight update when there are fewer weight levels. Thus, a single increase or decrease of a weight can affect the selection of the spiking output neuron, and this sensitivity plays a role in the accuracy decrease.

Another important result is that the quantization of weights has a smaller effect on the accuracy as the network size increases. With more neurons and devices present, the network can tolerate more errors. This is because many features of the same digit can be captured by several neurons; other features of the digit can still be detected even if one of the output neurons fails to identify the digit correctly.


[Figure: accuracy [%] vs. iteration number for 10, 50, 100 and 300 output neurons]

(a) No quantization.

(b) 128 quantization levels.

(c) 64 quantization levels.

(d) 32 quantization levels.

Figure 5.6 – Accuracy using different quantization levels with a network of size 784 x 50.

Visualisation and Distribution of Weights after Training

The weights are reshaped and displayed for the different weight quantization levels, along with their distributions (Fig. 5.9). One observation is that the number of weights at the minimum value increases when the weights are quantized. The reason is that saturation to the limits becomes easier as the number of steps to take decreases. Since the algorithm decreases more weights than it increases, owing to the large number of background pixels, saturation to the minimum weight is more likely to happen.
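A short sketch of how this distribution can be inspected, assuming trained weights in [0, 1] (all names are illustrative):

    import numpy as np

    def weight_stats(w, n_bins=10, w_min=0.0, w_max=1.0):
        # Histogram of the trained weights plus the fraction of weights
        # saturated at the minimum level.
        counts, edges = np.histogram(w, bins=n_bins, range=(w_min, w_max))
        frac_at_min = np.mean(np.isclose(w, w_min))
        return counts, edges, frac_at_min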

Perturbations to the Weight Update

Real ReRAM devices would not perform weight updates as precisely as software does. To observe how much error in the weight updates can be tolerated, an experiment with perturbed weight updates is performed. In a regular update with quantized levels, the weight moves one step up or down. In this experiment, the weight level closest to the current weight is determined, and the weight is assumed to sit at this level, shown in blue in Fig. 5.7. The targeted weight level is then determined; assuming that a weight increase is in progress, the aimed level is the red level. A random number is drawn within a certain range around the aimed level, and the update is distorted by that amount; the possible values that the weight can end up with are shown in green. If this quantized level variation is larger than ±50%, the weight update may result in a higher or lower weight level than expected when rounded to the closest level in the next cycle. For example, if a random number is drawn within a ±100% quantized level variation, there is a 50% chance that the weight will pass into the range of an upper or lower level, depending on the number drawn.
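A minimal sketch of this perturbed update, under the assumption of equally spaced levels on [0, 1]; variation is the quantized level variation expressed as a fraction of one step (e.g. 1.0 for ±100%), and all names are illustrative:

    import numpy as np

    rng = np.random.default_rng()

    def perturbed_update(w, direction, n_levels, variation):
        # direction: +1 for an intended increase, -1 for a decrease.
        step = 1.0 / (n_levels - 1)
        current = np.round(w / step) * step        # closest level (blue)
        target = current + direction * step        # aimed level (red)
        noise = rng.uniform(-variation, variation) * step
        return float(np.clip(target + noise, 0.0, 1.0))  # result (green)

    # Example: one increase with 64 levels and +/-100% level variation
    # w_new = perturbed_update(0.5, +1, n_levels=64, variation=1.0)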

The results are shown in Fig. 5.8. The network can tolerate a large amount of random weight perturbation. This property will be very useful in the hardware experiment, as the weight updates in the ReRAM devices will be imprecise.

[Figure: weight levels 6 to 9, showing the current level (blue), the targeted level (red) and the possible weight values after the update (green)]

Figure 5.7 – Illustration of probabilistic quantized weight variation.

[Figure: accuracy [%] vs. iteration number for 0%, 2.5%, 25%, 60%, 100%, 150% and 900% quantized weight variation]

Figure 5.8 – Quantized weight variations in a network of size 784 x 50 with 64 quantized levels.


[Figure: 28 x 28 weight maps (colour scale 0.2 to 1) and weight-value histograms (occurrence vs. weight value)]

(a) Weights, no quantization.

(b) Distribution of weights, no quantization.

(c) Weights, 128 quantization levels.

(d) Distribution of weights, 128 quantization levels.

(e) Weights, 64 quantization levels.

(f) Distribution of weights, 64 quantization levels.

(g) Weights, 32 quantization levels.

(h) Distribution of weights, 32 quantization levels.

Figure 5.9 – Weights and distribution of weights after training a network of size 784 x 50.


6 Conclusion and Future Work

In this thesis, a hardware-compatible, bio-inspired learning algorithm using an artificial neural network for a hybrid CMOS-memristive system is developed. The network will use CMOS neurons and ReRAM devices arranged in a crossbar array. The learning algorithm is pulse-based, inspired by the biological learning process in the human brain. The proposed architecture is non-Von Neumann, distributing computation, memory and communication across smaller units.

First, a simplified single-layer network is presented, based on an existing algorithm in the literature. With a network of size 784 x 300, a training accuracy of 78.21% and a testing accuracy of 82.93% are achieved. The impact of different weight initialisations on accuracy is investigated, and it is concluded that different distributions do not have a large impact on the final accuracy. Thus, performing RESET and SET operations would not be necessary in the hardware experiment. A labeling layer is added and the performance of both layers is compared. The accuracy curve trends of both layers are very similar, but the accuracy of the labeling layer is slightly higher.

Next, a multilayer network implementation is proposed and simulation results are provided. The multilayer network needs further improvement. To increase the accuracy of the added unsupervised layer, optimal network parameters can be experimented with. However, because the training of each layer is done separately, without any interaction with the other layers, a high accuracy might not be reached. To further improve the accuracy, a new multilayer training algorithm that merges the training procedures of the different layers could be developed in the future.

An important part of this work is the set of modifications to the learning algorithm for hardware compatibility. Instead of using a continuous range of weights, weights restricted to 128, 64 and 32 levels are examined. As the number of levels decreases, the accuracy of the network also decreases. However, fewer levels are preferable in ReRAM devices for better reliability, so the number of levels should be selected to balance accuracy expectations against physical implementation challenges. Another important aspect of this work is the introduction of probabilistic weight updates. It has been shown that the network tolerates imprecise weight updates when quantized weight levels are used.

A possible next step is to build the hybrid CMOS-memristive system using the circuit topology provided in this thesis. For this, detailed circuit simulations should be performed using accurate ReRAM models, and the effects of leakage currents, parasitic capacitances and inductances in the crossbar array should be examined. Adding transistors or other access devices to the ReRAM crossbars can be explored. The feasibility of realizing large networks using ReRAM can be analyzed further.

New learning methods can also be developed to improve the network performance. A supervised single-layer network can be built using the same learning principles as the labeling layer, and its performance can be compared with the unsupervised single-layer network of this thesis.


Bibliography

[1] L. F. Abbott and S. B. Nelson. Synaptic plasticity: taming the beast. Nature Neuroscience, 3:1178–1183, 2000.

[2] A. Afifi, A. Ayatollahi, and F. Raissi. Implementation of biologically plausible spiking neural network models on the memristor crossbar-based CMOS/nano circuits. In Circuit Theory and Design, 2009. ECCTD 2009. European Conference on, pages 563–566. IEEE, 2009.

[3] H. Akinaga and H. Shima. Resistive random access memory (ReRAM) based on metal oxides. Proceedings of the IEEE, 98(12):2237–2251, 2010.

[4] J. A. Anderson. An Introduction to Neural Networks. MIT Press, 1995.

[5] G.-q. Bi and M.-m. Poo. Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. The Journal of Neuroscience, 18(24):10464–10472, 1998.

[6] T. V. Bliss and T. Lømo. Long-lasting potentiation of synaptic transmission in the dentate area of the anaesthetized rabbit following stimulation of the perforant path. The Journal of Physiology, 232(2):331–356, 1973.

[7] R. E. Brown and P. M. Milner. The legacy of Donald O. Hebb: more than the Hebb synapse. Nature Reviews Neuroscience, 4(12):1013–1019, 2003.

[8] G. W. Burr, R. M. Shelby, C. di Nolfo, J. W. Jang, R. S. Shenoy, P. Narayanan, K. Virwani, E. U. Giacometti, B. Kurdi, and H. Hwang. Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element. In Electron Devices Meeting (IEDM), 2014 IEEE International, pages 29–5. IEEE, 2014.

[9] L. Chua. Resistance switching memories are memristors. Applied Physics A, 102(4):765–783, 2011.

[10] L. O. Chua. Memristor-the missing circuit element. IEEE Transactions on Circuit Theory, 18(5):507–519, 1971.

[11] L. O. Chua and S. M. Kang. Memristive devices and systems. Proceedings of the IEEE, 64(2):209–223, 1976.

[12] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193–202, 1980.

[13] W. Gerstner and W. M. Kistler. Spiking Neuron Models: Single Neurons, Populations, Plasticity. Cambridge University Press, 2002.

[14] M. H. Hassoun. Fundamentals of Artificial Neural Networks. MIT Press, 1995.

[15] D. O. Hebb. The Organization of Behavior: A Neuropsychological Approach. John Wiley & Sons, 1949.

[16] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500–544, 1952.

[17] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8):2554–2558, 1982.

[18] D. Hubel and T. Wiesel. Functional architecture of macaque monkey visual cortex (Ferrier lecture). Proceedings of the Royal Society of London B, 198:1–59, 1977.

[19] D. H. Hubel and T. N. Wiesel. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. The Journal of Physiology, 160(1):106–154, 1962.

[20] D. H. Hubel and T. N. Wiesel. Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28(2):229–289, 1965.

[21] Human Brain Project. https://www.humanbrainproject.eu, 2013.

[22] G. Indiveri and T. K. Horiuchi. Frontiers in neuromorphic engineering. Frontiers in Neuroscience, 5, 2011.

[23] Y. LeCun. Learning processes in an asymmetric threshold network. In E. Bienenstock, F. Fogelman-Soulié, and G. Weisbuch, editors, Disordered Systems and Biological Organization, pages 233–240, Les Houches, France, 1986. Springer-Verlag.

[24] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[25] J.-S. Lee. Progress in non-volatile memory devices based on nanostructured materials and nanofabrication. Journal of Materials Chemistry, 21(37):14097–14112, 2011.

[26] H. Markram, J. Lübke, M. Frotscher, and B. Sakmann. Regulation of synaptic efficacy by coincidence of postsynaptic APs and EPSPs. Science, 275(5297):213–215, 1997.

[27] W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

[28] C. Mead. Neuromorphic electronic systems. Proceedings of the IEEE, 78(10):1629–1636, 1990.

[29] C. A. Mead. Analog VLSI and Neural Systems. Addison-Wesley, 1989.

[30] J. S. Meena, S. M. Sze, U. Chand, and T.-Y. Tseng. Overview of emerging nonvolatile memory technologies. Nanoscale Research Letters, 9(1):1–33, 2014.

[31] P. A. Merolla, J. V. Arthur, R. Alvarez-Icaza, A. S. Cassidy, J. Sawada, F. Akopyan, B. L. Jackson, N. Imam, C. Guo, Y. Nakamura, et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.

[32] M. Minsky and S. Papert. Perceptrons. MIT Press, 1969.

[33] D. B. Parker. Learning logic. Technical Report TR-87, Center for Computational Research in Economics and Management Science, MIT, Cambridge, MA, 1985.

[34] D. Querlioz, O. Bichler, and C. Gamrat. Simulation of a memristor-based spiking neural network immune to device variations. In Neural Networks (IJCNN), The 2011 International Joint Conference on, pages 1775–1781. IEEE, 2011.

[35] R. Rojas. Neural Networks: A Systematic Introduction. Springer Science & Business Media, 2013.

[36] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386, 1958.

[37] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. Parallel Distributed Processing, volumes 1–2. MIT Press, Cambridge, MA, 1986.

[38] D. Sacchetto, P.-E. Gaillardon, M. N. Zervas, S. Carrara, G. De Micheli, and Y. Leblebici. Applications of multi-terminal memristive devices: A review. IEEE Circuits and Systems Magazine, 13(2):23–41, 2013.

[39] J. Sandrini, M. Thammasack, T. Demirci, P.-E. Gaillardon, D. Sacchetto, G. De Micheli, and Y. Leblebici. Heterogeneous integration of ReRAM crossbars in a 180 nm CMOS BEOL process. Microelectronic Engineering, 145:62–65, 2015.

[40] A. Sawa. Resistive switching in transition metal oxides. Materials Today, 11(6):28–36, 2008.

[41] P. Sheridan, W. Ma, and W. Lu. Pattern recognition with memristor networks. In Circuits and Systems (ISCAS), 2014 IEEE International Symposium on, pages 1078–1081. IEEE, 2014.

[42] A. A. Stocker. Analog VLSI Circuits for the Perception of Visual Motion. John Wiley & Sons, 2006.

[43] D. B. Strukov, G. S. Snider, D. R. Stewart, and R. S. Williams. The missing memristor found. Nature, 453(7191):80–83, 2008.

[44] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction, volume 1. MIT Press, Cambridge, MA, 1998.

[45] T. Trappenberg. Fundamentals of Computational Neuroscience. Oxford University Press, 2009.

[46] P. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, Cambridge, MA, 1974.

[47] B. Widrow and M. E. Hoff. Adaptive switching circuits. WESCON Convention Record, pages 96–104, 1960.

[48] S. Wozniak, A.-D. Almási, V. Cristea, Y. Leblebici, and T. Engbersen. Review of advances in neural networks: Neural design technology stack. In Proceedings of ELM-2014, Volume 1, pages 367–376. Springer, 2015.