
RAPID-PROTOTYPING OF ARTIFICIAL NEURAL NETWORKS

BY

ROGER K. W. NG

A Thesis Submitted to the Faculty of Graduate Studies in Partial Fulfillment of the Requirements

for the Degree of

MASTER OF SCIENCE

Department of Electrical and Computer Engineering, University of Manitoba

Winnipeg, Canada


National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.



A Thesis submitted to the Faculty of Graduate Studies of the University of Manitoba in partial fulfillment of the requirements of the degree of MASTER OF SCIENCE.

Permission has been granted to the LIBRARY OF THE UNIVERSITY OF MANITOBA to lend or sell copies of this thesis, to the NATIONAL LIBRARY OF CANADA to microfilm this thesis and to lend or sell copies of the film, and UNIVERSITY MICROFILMS to publish an abstract of this thesis.

The author reserves other publication rights, and neither the thesis nor extensive extracts from it may be printed or otherwise reproduced without the author's written permission.


I hereby declare that I am the sole author of this thesis.

I authorize the University of Manitoba to lend this thesis to other institutions or

individuals for the purpose of scholarly research.

Roger K. W. Ng

I furthermore authorize the University of Manitoba to reproduce this thesis by

photocopying or by other means, in total or in part, at the request of other

institutions or individuals for the purpose of scholarly research.

Roger K. W. Ng


The University of Manitoba requires the signatures of all persons using or photocopying this thesis. Please sign, and give address and date.


ABSTRACT

This thesis explores Field Programmable Gate Array (FPGA) implementations of artificial neural networks employing pulse-code arithmetic. Pulse-code arithmetic uses values encoded as probabilistic pulse streams. Artificial neural networks employing pulse-code arithmetic require only simple digital logic gates to perform multiplication and addition, which are the essential operations for these networks. As such, pulse-code techniques offer considerable potential to construct very large neural networks using FPGA technology. The implementation results presented in this thesis show that each neuron and synapse element uses an average of 23 CLBs on Xilinx XC4000 series FPGAs for the XOR problem. One of the advantages of using FPGAs for implementing neural networks is that they allow the overall network to be easily modified or replaced, by simply downloading new circuitry. In addition, this thesis describes a top-down design flow methodology from abstracted simulation through to implementation in Xilinx FPGAs. The use of top-down design and FPGA technologies shortens the overall development cycle. In addition, as these networks are extremely compact there is the potential for experimentation during prototyping.


ACKNOWLEDGEMENTS

When everything in your life seems to be leading you towards some unknown yet strangely familiar conclusion, it is not until you reach the end that you realize that coincidence has been conspiring to guide you into this particular window of the eternal now. My three years of graduate studies were somewhat like that. So thanks to all the guides and earth angels who generously contributed and supported me throughout my graduate studies.

I would especially like to thank my advisor, Professor Robert McLeod, for his advice, encouragement, and assistance throughout this thesis.

I would also like to acknowledge all the members of the VLSI Laboratory at the University of Manitoba, specifically Hart Poskar, Dean McNeill, Richard Wieler and Ken Ferens. In addition, I would like to acknowledge the support of Computer Services at the University of Manitoba.

Support provided by the Natural Sciences and Engineering Research Council of Canada, the Federal Government's Centres of Excellence Micronet Program and the Canadian Microelectronics Corporation is highly appreciated.


TABLE OF CONTENTS

CHAPTER 1
Introduction
    Artificial Neural Network
    Summary
    Organization of the Thesis

CHAPTER 2
Pulse-Code Neural Networks
    Fundamentals of Pulse-Code Arithmetic
        Pulse Stream Addition
        Pulse Stream Multiplication
        Pulse Stream Generation
    Pulse-Code Neural Network
        Pulse-Code Feedforward Neural Network
        Training Pulse-Code Neural Networks

CHAPTER 3
Simulation of Pulse-Code Neural Networks
    Simulation Environment
    XOR Problem
    Parity Problem
    Encoder Problem
    Cheque Character Recognition

CHAPTER 4
Implementation of Pulse-Code Neural Networks in Xilinx FPGAs
    FPGAs Implementation
        Modular Design of Pulse Stream Neural Networks
        Neuron/Synapse Units
        Re-randomizer
        Weight Resolution
    Overview of Xilinx Field Programmable Gate Arrays
    Design Flow of Pulse-Code Neural Network Hardware
        Database Structure
        Xerion Neural Network Simulator
        VHDL Code
        Synthesis and Optimization
        Design Verification
        NeoCAD FPGA Foundry
    FPGA Design Examples
        XOR Problem
        Encoder Problem
        Cheque Character Recognition

CHAPTER 5
Conclusions and Future Work

APPENDIX A
Targeting VHDL Designs to Xilinx FPGAs
    Wait for XX ns Statement
    After XX ns Statement
    Initial Values
    Order and Group Arithmetic Functions
    Xilinx Name Conventions
    Latches and Registers
    Implementing Multiplexers with Tristate Buffers

APPENDIX B
Xilinx Device Quick References


LIST OF FIGURES

Figure 1.1: Multi-layer feedforward network
Figure 2.1: Random pulse sequence
Figure 2.2: Unipolar pulse stream representation
Figure 2.3: OR gate addition
Figure 2.4: Example of the OR-gate addition and AND-gate multiplication
Figure 2.5: Block diagram of a pulse stream generation
Figure 2.6: Rate multiplier schematic
Figure 2.7: Negative and positive weight
Figure 3.1: The user interface of the pulse-code neural network simulator
Figure 3.2: Training evolution of the XOR problem
Figure 3.3: The impact of division on network training
Figure 3.4: Training for the four bit parity problem
Figure 3.5: Training for the 8-6-8 encoder
Figure 3.6: Sample cheque
Figure 3.7: Training for the cheque character recognition
Figure 3.8: Input activation for the cheque character recognition problem


Figure 4.1: The top-level of pulse stream neural networks
Figure 4.2: Neuron Synapse Unit
Figure 4.3: Block diagram of the re-randomizer circuit
Figure 4.4: Xilinx FPGAs architecture
Figure 4.5: XC4000 CLB
Figure 4.6: Pulse-code neural network design process
Figure 4.7: Database organization
Figure 4.8: VHDL testbenches
Figure 4.9: XOR simulation results
Figure 4.10: The FPGA layout of the XOR problem
Figure 4.11: The timing report of the XOR FPGA design
Figure 4.12: The FPGA layout of the 8-6-8 encoder
Figure 4.13: The FPGA layout of cheque character recognition
Figure A.1: Latch inference
Figure A.2: Latch implemented with gates
Figure A.3: Implementing 5-to-1 MUX with gates
Figure A.4: 5-to-1 MUX implemented with gates
Figure A.5: Implementing 5-to-1 MUX with BUFTs
Figure A.6: 5-to-1 MUX implemented with BUFTs


LIST OF TABLES

Table 2.1: Construction of CA with maximal cycle length
Table A.1: D latch implementation comparison
Table B.1: Xilinx Devices, Packages and Speed Grades


Chapter 1

Introduction

Almost everything in the field of neural networks has been done by simulating the networks on serial computers. There has been comparatively little study of hardware implementations. General purpose computers are not optimized for neural network calculations: they require specialized hardware in order to utilize the inherent parallelism. The alternative approach is to build special hardware for neural networks on a single chip or multi-chip system. A neural network architecture can be implemented as an integrated circuit using analog, digital, or mixed analog/digital structures. The analog circuits permit high density implementation, as the multiplication is based on modified Gilbert multipliers [1] and summation on Kirchhoff's current law. However, analog hardware does not produce high accuracy arithmetic, and the storage of the analog weight values required for the synapse is difficult. On the other hand, digital hardware can perform arithmetic operations with a high degree of accuracy and the storage of the weight values is easy in digital form. Also, digital hardware


can take advantage of some of the benefits of current VLSI technology, such as well understood and advanced design techniques, as well as prototyping in Field Programmable Gate Array (FPGA) technologies. However, one of the major constraints of digital implementations of neural networks is the amount of circuitry required to perform the multiplication. This problem is especially acute in high speed digital designs, where parallel multipliers are extremely expensive in terms of circuitry. Adopting an equivalent bit serial architecture significantly reduces this complexity, but still tends to result in large and complex designs. In addition, a single multiplier would consume a significant proportion of a current state of the art FPGA, thus making the use of such devices impractical for this approach.

This thesis describes an alternative neural network architecture which may be implemented using standard VLSI technology, but also maps extremely efficiently to FPGAs such as those of Xilinx [2]. The central idea is to represent the real-valued signals passing between neurons using encoded binary pulse streams. Pulse stream arithmetic requires only simple digital logic gates to perform multiplication and addition. The main advantage of such an approach is the structural simplicity of the artificial synapse and neuron, comparable to analog implementations, thus allowing very efficient space usage of the fine grained FPGAs.

The work presented in this thesis examines pulse-code neural network implementations on FPGAs using a top-down design methodology. In the remainder of this chapter, background material on neural networks is presented which will serve to familiarize the reader with some of the general concepts. This will help establish a common reference from which to base the discussions in the later chapters.

1.1 Artificial Neural Network

Modern neural network theories can be traced back to ideas first introduced in the 1940s and 1950s. In 1943, McCulloch and Pitts proposed a simple model of neuron operation [3]. This model attracted much interest because of its simplicity. In the late 1950s, Rosenblatt developed networks that could learn to recognize simple patterns [4]. The perceptron, as it was called, could decide whether an input belonged to one of two classes. A single neuron would compute the weighted sum of binary-type inputs, subtract a threshold and pass the result through a non-linear hard limiting threshold that classified the input. In 1969, Minsky and Papert [5] showed that a simple class of perceptrons could not perform certain tasks in pattern recognition. The simple example is the exclusive or (XOR) problem: a single output neuron is required to turn on if one or the other of two input lines is on, but not when neither or both inputs are on. They believed that structures with more layers of neurons could solve the problem, but they could not find a learning rule to train a multi-layer network. With this roadblock, researchers left the neural network paradigm for almost 20 years. It was not until 1986, when Rumelhart, Hinton and Williams introduced a new learning algorithm known as backpropagation [6] to the problem of the networks discussed in Perceptrons [5], that neural networks regained their popularity.

Neural networks are usually characterized by the way in which neurons are interconnected. There are two major classes of neural network topologies: multi-layer feedforward networks and feedback networks. Feedback networks are beyond the scope of this thesis, which will focus on multi-layer feedforward networks only. The general form of the multi-layer feedforward network consists of an input layer, one or more hidden layers, and an output layer of neurons (see Figure 1.1).


Figure 1.1: Multi-layer feedforward network.

Complete bipartite graphs are constructed between each adjacent layer of neurons using weighted connections. Data is presented to the input layer; each value is multiplied by a weight and summed at the neurons of the connecting layer. These weighted sums are then passed through a nonlinear transfer function, forming the input to the following layer. This procedure is repeated until reaching the output layer, thereby completing a forward pass.

Without a program of instructions a computer is a useless machine. The program usually instructs the computer to perform specific tasks on a set of input data to create some sort of output. Therefore, the program is an essential part of a computer environment. Neural networks are not programmed in the conventional sense; they are taught. Teaching a neural network cognitive knowledge is basically a modification of the synaptic weights according to some learning algorithm or rule. Therefore, the knowledge or "program" of a neural network is in the weights. There are two main types of learning rule: supervised learning and unsupervised learning. In supervised learning, an example set of input/output pairs is necessary, and the error between the actual response and the target response is used to correct or modify the network. In contrast to supervised learning, unsupervised learning is not given any information about whether its outputs are right or wrong. Instead, the network must decide what characteristics of the training set are relevant, and modify the weights in order to extract those features. This thesis will focus on the supervised learning neural network.

A popular supervised learning algorithm is the backpropagation learning algorithm, which was introduced by Rumelhart, Hinton and Williams in 1986 [6].

The backpropagation learning algorithm involves the presentation of a training set of input/output pattern pairs. The objective is to find a set of weights that ensures that the output produced by the network is the same as, or close to, the target output pattern for each of the input patterns. During the backward pass (learning phase), the actual output is compared to the target output and an error vector is created. These errors are then backpropagated through the network, modifying the connection weights according to an iterative gradient descent algorithm. After many iterations of the training set the connection weights settle to a local minimum of the output error over the training set. Better minima may be found by repeated training with randomly selected initial connection weights.

1.2 Summary

This chapter has provided a quick overview of the advantages and disadvantages of analog and digital implementations of neural networks. In order to implement these networks in FPGAs, the area of the circuitry is the most important criterion. The use of pulse stream arithmetic was proposed as it allows low area implementation of the hardware required to perform the arithmetic for neural networks. A brief history of neural networks, neural network topologies and learning rules was overviewed. Multi-layer feedforward networks and the learning algorithm known as backpropagation were also discussed.


1.3 Organization of the Thesis

The next chapter discusses the implementation of pulse-coded neural networks. It begins with the fundamentals of pulse stream arithmetic, followed by a discussion of applying pulse stream arithmetic to neural networks. Chapter 3 presents the software simulation of these networks. Chapter 4 describes the design process of the pulse-coded neural networks onto Xilinx FPGAs. Finally, conclusions are drawn, and proposals for future work are presented.


Chapter 2

Pulse-Code Neural Networks

Pulse-code neural networks use pulse streams to perform the network calculations. The idea of using pulse streams to communicate information between neurons is motivated by biological models, although biological pulse streams are much more complex than the simple pulse representation considered in this thesis. The main motivation here for applying pulse-code arithmetic to neural networks is the ability to implement high density arithmetic operations using digital circuitry. Specifically, pulse-code representations are able to use simple digital hardware to perform addition and multiplication, which are the two most important operations in a neural network. Also, pulse-code implementations offer the advantages of both analog and digital computation. Like analog, pulse representation requires only one line to carry the values, and the size of the hardware (digital gates) needed to perform arithmetic computations is comparable to analog hardware. Like digital, the design methods and implementations are well established.


In this chapter the fundamentals of pulse-code arithmetic are presented, and this discussion leads to the application of pulse-code arithmetic to neural networks.

2.1 Fundamentals of Pulse-Code Arithmetic

The fundamental idea of pulse-code arithmetic is to use probabilities to carry information [7, 8]. Here the probability p is defined experimentally by considering the frequency of the occurrence of an event (a pulse in a time slot). A small number of time slots results in an erroneous assessment of the probability and the number which it represents. In the limiting case of an infinite number of time slots: if there are n pulses in N slots for a given time, and if n/N tends towards a limit as N → ∞, we set

$$p = \lim_{N \to \infty} \frac{n}{N} \qquad (2.1)$$

Figure 2.1 shows a synchronous random pulse sequence. At the top of Figure 2.1, 3 pulses are in 10 time slots, leading to the conclusion that the number transmitted is 0.3. At the bottom, 3 pulses are arranged in different time slots, leading again to 0.3. The order of the pulses in the time slots does not affect the outcome of the representation. The probability can be transformed into some physical quantity by an appropriate mapping. This thesis only considers a linear mapping, although it should be noted that nonlinear mappings exist which permit computations with numbers in an infinite range with logarithmic error characteristics [8]. There are two kinds of linear mapping: unipolar mapping and bipolar mapping.

Figure 2.1: Random pulse sequence. (Top: 3 pulses in 10 slots, giving 0.3; bottom: a different arrangement of 3 in 10, also giving 0.3.)

Unipolar linear mapping is used as the implementation method in this thesis. In unipolar mapping, the values are encoded between 0 and 1. An example of unipolar pulse representation is shown in Figure 2.2. The value of the unipolar pulse stream is represented by the number of pulses, n, being ON within the time interval divided by the length of the time interval, N, or

$$p = \frac{n}{N} \qquad (2.2)$$

A unipolar pulse stream with N bits can represent N+1 unique values. For example, a 10-bit pulse stream can represent 11 values from 0.0 to 1.0, in increments of 0.1. The resolution of the value depends on the length of the pulse stream. The longer the pulse stream, the higher the resolution that can be achieved.
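As a minimal software sketch of this encoding (a hypothetical C illustration, not part of the hardware described in this thesis), the value 0.3 can be drawn as a 10-slot Bernoulli pulse stream and recovered by counting pulses:

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: encode the value 0.3 as a 10-bit unipolar pulse
 * stream and recover it by counting pulses.  Pulse positions are drawn
 * at random; only the pulse count matters. */
int main(void)
{
    const int N = 10;      /* length of the pulse stream (time slots) */
    double value = 0.3;    /* unipolar value to encode, 0 <= value <= 1 */
    int stream[10];
    int i, count = 0;

    srand(1);
    for (i = 0; i < N; i++)              /* one Bernoulli trial per slot */
        stream[i] = ((double)rand() / RAND_MAX) < value ? 1 : 0;

    for (i = 0; i < N; i++)
        count += stream[i];

    /* the represented value is n/N, the pulse density */
    printf("estimated value = %.2f\n", (double)count / N);
    return 0;
}
```

With a short stream such as this, the estimate can be off by increments of 1/N, which is the erroneous-assessment effect noted above.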


Figure 2.2: Unipolar pulse stream representation (10-bit pulse streams and their unipolar pulse values).

2.1.1 Pulse Stream Addition

The addition of two unipolar pulse streams can be performed by using an OR gate. However, OR gate addition does not perform exact addition with pulse streams because of limitations imposed by the representation of the pulse streams; furthermore, it cannot handle a sum greater than 1. The output of the OR gate is given by

$$A \cup B = A + B - AB \qquad (2.3)$$

Thus for A ≪ 1 and B ≪ 1 the AB term is small and the output of the OR gate is approximately A + B. For large A and B, the output of the OR gate saturates to 1. This result of a saturating nonlinearity will be useful to the implementation of neural networks. The output for an OR gate with n inputs is given by

$$\text{Output} = 1 - \prod_{i=1}^{n} (1 - I_i)$$

The n-input OR gate can be easily implemented in hardware by using wired-OR logic. Figure 2.3 shows the output probability of the OR gate addition. As the number of inputs increases, the output of the OR gate addition saturates for a greater range of inputs. Also, as the value of the average input increases, the output of the OR gate addition saturates. Therefore, it is desirable to keep the fan-in of the inputs as well as the value of the inputs small.

Figure 2.3: OR gate addition.


2.1.2 Pulse Stream Multiplication

To multiply two unipolar pulse streams, an AND gate may be used if the two pulse sequences are statistically uncorrelated [8]. Since the unipolar value is always less than or equal to one, the product of two numbers is guaranteed to be at most one, and the result will not saturate as it did with OR-gate addition. Assuming the pulse sequences A and B are statistically uncorrelated, the output sequence of an AND gate multiplication is given by

$$A \cap B = A \cdot B \qquad (2.4)$$

Figure 2.4 shows an example of the OR-gate addition and AND-gate multiplication.

Figure 2.4: Example of the OR-gate addition and AND-gate multiplication.
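The behaviour of these two gates on long streams can be checked with a short C sketch (an illustration at the probability level, assuming independent software Bernoulli pulse generators rather than the hardware of this chapter):

```c
#include <stdio.h>
#include <stdlib.h>

#define SLOTS 100000   /* long streams give accurate probability estimates */

/* one Bernoulli pulse with probability p */
static int pulse(double p) { return ((double)rand() / RAND_MAX) < p; }

int main(void)
{
    double a = 0.2, b = 0.3;      /* values carried by the two streams */
    long or_cnt = 0, and_cnt = 0;

    srand(1);
    for (long i = 0; i < SLOTS; i++) {
        int A = pulse(a), B = pulse(b);   /* independent pulse streams */
        or_cnt  += (A | B);               /* OR-gate "addition" */
        and_cnt += (A & B);               /* AND-gate multiplication */
    }
    printf("OR : %.4f (expected A+B-AB = %.4f)\n",
           (double)or_cnt / SLOTS, a + b - a * b);
    printf("AND: %.4f (expected A*B    = %.4f)\n",
           (double)and_cnt / SLOTS, a * b);
    return 0;
}
```

For small inputs the OR count sits close to A + B; raising a and b towards 1 makes the saturation of Equation 2.3 visible.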

2.1.3 Pulse Stream Generation

An essential component of pulse-code arithmetic is the generation of the pulse streams for use in the arithmetic operations. As mentioned earlier, the information is carried by the probability of occurrence of an ON logic level within a time slot. Each logic level is generated from a random variable, and the statistically independent results form a pulse sequence whose average pulse rate is determined by the variable to be represented. It should be noted that the validity of the pulse stream arithmetic relies heavily on the assumed property of statistical independence between operating variables. Hence, of vital importance are generators for the provision of independent, uniformly distributed random numbers.

In general, a random pulse stream is generated with a uniform random number generator and a digital comparator. Figure 2.5 shows a block diagram of a rate multiplier to generate weighted pulse streams. The following procedure can be used to produce a random pulse stream with probability P(ON) = W:

1. Generate a random number R such that 0 ≤ R < 1.
2. If W > R, output a 1; else output a 0.

In digital hardware R and W are usually represented as binary integers. If the maximum possible weight value is M, and the value stored in the weight register is W, then the probability of a pulse should be P(ON) = W/M.
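A minimal C sketch of this comparator scheme (illustrative only; the 8-bit width and the use of the C library rand() are assumptions, not part of the hardware design):

```c
#include <stdio.h>
#include <stdlib.h>

/* Weighted pulse-stream generation with a digital comparator:
 * emit a 1 whenever the stored weight exceeds a fresh uniform random
 * number.  W and R are 8-bit integers here, so P(ON) = W/M with M = 256. */
int main(void)
{
    const unsigned M = 256;   /* maximum weight value (8-bit register) */
    unsigned W = 77;          /* weight register, target P(ON) = 77/256 */
    long ones = 0, slots = 100000;

    srand(1);
    for (long t = 0; t < slots; t++) {
        unsigned R = rand() % M;   /* uniform random number, 0..M-1 */
        if (W > R)                 /* comparator: output 1 if W > R */
            ones++;
    }
    printf("P(ON) = %.4f (target %.4f)\n",
           (double)ones / slots, (double)W / M);
    return 0;
}
```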


Figure 2.5: Block diagram of a pulse stream generation (a weight register and a digital comparator produce a pulse stream output with P(ON) = weight).

A common technique to generate a pseudorandom number in digital hardware is to use a linear feedback shift register (LFSR) [9]. Another method to produce a digital random number is to employ a particular configuration of a one-dimensional Cellular Automata (CA) array [10]. Hortensius [11] has shown that certain arrangements of CAs possess maximal length sequences with superior random number properties compared to the LFSR. A CA is a set of registers whose next state is governed by nearest neighbour connections. A CA can yield a maximal-length binary sequence from each site (i.e. $2^n - 1$), like the maximal-length LFSR, by combining rule 90,

$$a_i(t+1) = a_{i-1}(t) \oplus a_{i+1}(t)$$

and rule 150,

$$a_i(t+1) = a_{i-1}(t) \oplus a_i(t) \oplus a_{i+1}(t)$$

where $a_i(t)$ is the value of the register at position i at time t.

The ordering of the rules for the construction of a maximal-length binary sequence is irregular, with complexity similar to that involved in determining the polynomial for a maximal-length LFSR. Table 2.1 gives a sample of possible constructions for producing CAs with maximal cycle length up to length 15. Here, '1' refers to CA rule 150 and '0' refers to CA rule 90. Hence, a length-5 maximal-length CA would be constructed by using rules 90 and 150 in the following order: 150, 150, 90, 90, 150.
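The following C sketch (illustrative, not the thesis hardware) steps a length-5 hybrid CA through its cycle using the rule ordering quoted above, with null boundary conditions:

```c
#include <stdio.h>

/* Hybrid cellular-automaton random bit generator (rules 90 and 150,
 * null boundaries).  The rule vector below is the length-5 maximal-length
 * construction quoted in the text: 150, 150, 90, 90, 150.
 * rule[i] = 1 selects rule 150 at site i, rule[i] = 0 selects rule 90. */
#define NCELL 5

int main(void)
{
    int cell[NCELL] = {1, 0, 0, 0, 0};       /* any nonzero seed */
    const int rule[NCELL] = {1, 1, 0, 0, 1}; /* 150,150,90,90,150 */

    for (int t = 0; t < 31; t++) {           /* cycle length is 2^5 - 1 */
        int next[NCELL];
        for (int i = 0; i < NCELL; i++) {
            int left  = (i > 0) ? cell[i - 1] : 0;         /* null boundary */
            int right = (i < NCELL - 1) ? cell[i + 1] : 0;
            /* rule 90: left XOR right; rule 150: left XOR self XOR right */
            next[i] = left ^ right ^ (rule[i] ? cell[i] : 0);
        }
        for (int i = 0; i < NCELL; i++) cell[i] = next[i];
        for (int i = 0; i < NCELL; i++) putchar('0' + cell[i]);
        putchar('\n');
    }
    return 0;
}
```

Each column of the printed states is one site's bit stream; the nonzero states repeat only after 31 steps, matching the maximal-length claim.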

Table 2.1: Construction of CA with maximal cycle length

Length n | Construction | Cycle Length


Figure 2.6: Rate multiplier schematic

A rate multiplier can be used to compare a binary set of weights and a set of random bit streams with P(ON) = 0.5, and produce a weighted bit stream with $P(ON) = w/2^{N-1}$. The rate multiplier, as shown in Figure 2.6, originated out of research in VLSI pseudo-random test pattern generation [12].

2.2 Pulse-Code Neural Network

The idea of using pulse streams has been tried by Tomberg and his co-workers, who published two papers on neural networks using pulse-density modulation [13, 14]. The implementation was a Hopfield type fully connected neural network architecture based on bipolar pulse density modulation. Tomlinson [15] and Dickson [16] also studied in-situ learning neural networks using unipolar pulse streams. The work presented in this section is based on the research from Tomlinson and Dickson, extending the design methodology for these implementations.

2.2.1 Pulse-Code Feedforward Neural Network

The basic computational operations required in feedforward neural networks are multiplication, summation, and a non-linearity function. Each neuron computes a weighted sum of its inputs from other neurons, and passes this summation through a nonlinear function to produce an output. This output forms the input to the following layer. Tomlinson [15] has proposed a new neural activation function where the summation and the nonlinear activation are performed simultaneously using the OR logic. As mentioned earlier, the OR gate addition saturates to 1 with either an increase in the number of inputs or an increase in the value of those inputs. The saturating effect of the OR gate addition requires no extra hardware to implement a nonlinear activation function. Also, the logical OR can be easily implemented in hardware using wired-OR logic. The multiplication of the weight ($w_{ij}$) and the input ($o_i$) is performed with a simple AND gate as previously described, assuming that these pulse sequences are statistically uncorrelated.

Let $n_j$ be the probability of a pulse occurrence in the output sequence of an n-input OR gate. The inputs of the OR gate are the products of $w_{ij}$ and $o_i$ produced from the AND gates. This is represented mathematically by the following equation:

$$n_j = 1 - \prod_i (1 - w_{ij} o_i) \qquad (2.5)$$

Since the unipolar nature of the pulse stream representation does not support negative values, each synaptic weight is separated into two distinct nets: the excitatory and the inhibitory nets. Therefore, there are two dedicated wired-OR lines per neuron, ANDed together to form the activation function, and the net input variables are defined as

$$n_j^+ = 1 - \prod_i (1 - w_{ij}^+ o_i) \qquad (2.6)$$

$$n_j^- = 1 - \prod_i (1 - w_{ij}^- o_i) \qquad (2.7)$$

Each neuron j combines the excitatory net input $n_j^+$ and the inhibitory net input $n_j^-$ to determine the neuron output $o_j$. Since there is no means to perform subtraction in pulse-code arithmetic, the net output of the neuron is not simply $o_j = n_j^+ - n_j^-$. Moreover, negative and positive nets would require the accommodation of negative neuron outputs. If the excitatory net input $n_j^+$ and inhibitory net input $n_j^-$ are statistically uncorrelated, the probability of output pulse occurrence $o_j$ is

$$o_j = n_j^+ (1 - n_j^-) \qquad (2.8)$$

While the mathematics implies that a weight can have a positive ($w_{ij}^+$) component and a negative ($w_{ij}^-$) component, it is not necessary to accommodate both simultaneously. Therefore, it only requires one register to store the weight value and one bit to indicate the sign of the weight. The hardware required for this computation is shown in Figure 2.7.

computation is shown in Figure 2.7.

Wbgbt Siga Bit (positive = 1)

Figure 2.7: Negafive and positive weight.
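At the probability level these definitions translate directly into code. The C sketch below is an illustration of Equations 2.5-2.8 as reconstructed above (the function and variable names are assumptions, not part of the thesis design), computing one neuron's output from its inputs and signed weights:

```c
#include <stdio.h>

/* Probability-level model of one pulse-code neuron: signed weights are
 * split into an excitatory (w >= 0) and an inhibitory (w < 0) wired-OR
 * net, and the two nets are combined as o_j = n+ * (1 - n-). */
double neuron_output(const double *o, const double *w, int n)
{
    double prod_pos = 1.0, prod_neg = 1.0;

    for (int i = 0; i < n; i++) {
        if (w[i] >= 0.0)
            prod_pos *= 1.0 - w[i] * o[i];       /* excitatory net term */
        else
            prod_neg *= 1.0 - (-w[i]) * o[i];    /* inhibitory net term */
    }
    /* OR-gate sums: n+ = 1 - prod_pos, and (1 - n-) = prod_neg */
    return (1.0 - prod_pos) * prod_neg;          /* n+ AND (NOT n-) */
}

int main(void)
{
    double o[3] = {0.9, 0.5, 0.7};               /* input activations */
    double w[3] = {0.4, -0.3, 0.2};              /* signed weights in [-1, 1] */
    printf("o_j = %.4f\n", neuron_output(o, w, 3));
    return 0;
}
```

Note that each weight contributes to only one of the two products, mirroring the single register-plus-sign-bit storage described above.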

2.2.2 Training Pulse-Code Neural Networks

Because a pulse-code neural network uses a non-traditional activation function, it is instructive to consider a modified learning algorithm where this activation function is incorporated into a popular learning algorithm. Equations 2.5 and 2.8 are continuous and differentiable, indicating that the backpropagation learning algorithm can be used for training. Backpropagation is an iterative technique that performs a gradient descent, typically over a sum squared error measure:

$$E = \frac{1}{2} \sum_j (t_j - o_j)^2$$

where $t_j$ is the desired output and $o_j$ is the actual output of the neuron j. The weights should be modified along the negative gradient of this error with respect to each weight:

$$\Delta w_{ij} \propto -\frac{\partial E}{\partial w_{ij}}$$

The goal of the backpropagation learning algorithm is to reduce the total error by adjusting the weights. Since the outputs of the neurons are computed from the excitatory nets and inhibitory nets, the derivative must be considered separately for positive and negative weights. Using the chain rule, the positive and negative equations governing the change of weights are as follows:

$$\frac{\partial E}{\partial w_{ij}^+} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial n_j^+} \frac{\partial n_j^+}{\partial w_{ij}^+} = \frac{\partial E}{\partial o_j} \, (1 - n_j^-) \, \frac{o_i (1 - n_j^+)}{1 - w_{ij}^+ o_i}$$

$$\frac{\partial E}{\partial w_{ij}^-} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial n_j^-} \frac{\partial n_j^-}{\partial w_{ij}^-} = -\frac{\partial E}{\partial o_j} \, n_j^+ \, \frac{o_i (1 - n_j^-)}{1 - w_{ij}^- o_i}$$


Let us define

$$\delta_j = -\frac{\partial E}{\partial o_j}$$

The results of the positive and negative equations become:

$$\Delta w_{ij}^+ = \eta \, \delta_j \, (1 - n_j^-) \, \frac{o_i (1 - n_j^+)}{1 - w_{ij}^+ o_i} \qquad \Delta w_{ij}^- = -\eta \, \delta_j \, n_j^+ \, \frac{o_i (1 - n_j^-)}{1 - w_{ij}^- o_i}$$

Equation 2.9 shows that the error at the output neurons is simply the difference between the training data and the network output:

$$\delta_j = t_j - o_j \qquad (2.9)$$
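A probability-level C sketch of the output-layer update implied by these reconstructed equations follows; the learning rate eta, the helper name dw_positive and the division-free switch are illustrative assumptions, the latter anticipating the division-free variant discussed in Chapter 3:

```c
#include <stdio.h>

/* Output-layer weight update for a pulse-code network at the probability
 * level.  For a positive weight w+ on synapse i->j:
 *   dw+ = eta * delta_j * (1 - n-_j) * o_i * (1 - n+_j) / (1 - w+ * o_i)
 * Omitting the denominator gives the division-free variant. */
double dw_positive(double eta, double delta, double np, double nm,
                   double oi, double w, int use_division)
{
    double grad = delta * (1.0 - nm) * oi * (1.0 - np);
    if (use_division)
        grad /= (1.0 - w * oi);   /* exact gradient; risks divide-by-zero */
    return eta * grad;            /* weight change for this synapse */
}

int main(void)
{
    /* one output neuron: target t = 0.9, actual o = n+ * (1 - n-) */
    double np = 0.6, nm = 0.2, o = np * (1.0 - nm);
    double delta = 0.9 - o;       /* delta_j = t_j - o_j (Equation 2.9) */
    printf("dw+ = %.5f (with division), %.5f (without)\n",
           dw_positive(0.5, delta, np, nm, 0.8, 0.3, 1),
           dw_positive(0.5, delta, np, nm, 0.8, 0.3, 0));
    return 0;
}
```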

For the hidden layers, the error is propagated back through the network. Each of the K output neurons is connected to hidden neuron j, and will contribute to its error. This error has two components, from the excitatory and inhibitory net inputs to each output neuron.


Therefore, the result of the error for the hidden neuron becomes

$$\delta_j = \sum_{k=1}^{K} \delta_k \left[ (1 - n_k^-) \, \frac{w_{jk}^+ (1 - n_k^+)}{1 - w_{jk}^+ o_j} - n_k^+ \, \frac{w_{jk}^- (1 - n_k^-)}{1 - w_{jk}^- o_j} \right]$$

2.3 Summary

This chapter has presented a method of arithmetic using pulse streams. As discussed, pulse streams allow computation using only simple gates. In addition, this chapter has discussed the suitability of pulse stream implementations for neural networks. The theoretical analysis showed that the backpropagation learning algorithm can be used for training this network, using the OR-gate activation function as a continuous and differentiable non-linear function. It may be possible to implement neural networks of high density due to the use of simple digital gates for performing arithmetic.


Chapter 3

Simulation of Pulse-Code Neural Networks

This chapter examines the simulation of pulse stream neural networks. Four examples will be discussed in this section. The first three examples are "toy problems" which are often used for testing and benchmarking neural networks. Typically the training set contains all possible input patterns, so there is no question of generalization. The last example is a more real-world classification problem in which a network is trained to recognize the digits of a cheque book.

3.1 Simulation Environment

The networks can be simulated on two levels: probability and pulse stream. The probability level models the network activations as probabilities; the pulse level models all activations as pulses and directly emulates a hardware implementation of this network. In our simulator the probability model is chosen for the training of the networks, since it produces more accurate results without worrying about the hardware limitations of the network. A backpropagation algorithm is employed for training, as described in the previous section.

Figure 3.1: The user interface of the pulse-code neural network simulator.

The training of the pulse-code neural network has been performed off-line using C code interfaced with the Xerion neural network simulator library [17]. Xerion's libraries contain routines for constructing, displaying and training networks. The software also contains routines to graphically display neuron outputs and weights using Hinton diagrams. The pulse-code neural network simulator written using the Xerion libraries has access to a command line interface with built-in commands for creating and training networks, examining and modifying data structures, as well as miscellaneous utilities. Once the training is completed, the simulator can generate a network description file which includes the network topology and the weights of the synapses for hardware implementation.

3.2 XOR Problem

The XOR problem [6] is a frequently applied test of neural networks, since it cannot be solved by a single layer network because it is not a linearly separable problem. A network consisting of two input neurons, two hidden neurons, and one output neuron was applied to the pulse-code neural network.

Figure 3.2: Training evolution of the XOR problem.

Figure 3.2 shows the learning curve of the XOR problem trained with the regular and pulse-code backpropagation algorithms. It is an interesting result that the pulse-code network learns the XOR problem almost 10 times faster than the normal backpropagation network. The normal backpropagation network uses the sigmoid function as the nonlinear transfer function, while the pulse-code network uses the wired-OR gate addition saturation to obtain the nonlinearity. In addition, the initial weights of the pulse-code network must be very small to prevent premature saturation. In general the wired-OR gate addition saturation is not uniform throughout the network, depending on the number of inputs to the neuron (see Figure 2.3).

The learning equations for pulse-code networks contain division. Although division can be performed using pulse-code arithmetic [8], it is undesirable for a number of reasons. For example, there are problems associated with division by zero, in addition to it being a time consuming operation and requiring more hardware. Dickson [18] has proposed omitting division when training pulse-code networks in order to reduce the hardware overhead of implementing on-chip learning capability.

To determine whether division is necessary, the networks were trained with the division operation omitted. The results of the simulation are shown in Figure 3.3. Although the XOR problem trained in more cycles without the division, in contrast again to regular backpropagation it still required only half the time. The results show that eliminating division does not impair the ability of the network to minimize the error.

Figure 3.3: The impact of division on network training. (a) Training for the XOR problem with the denominator; (b) training for the XOR problem without the denominator.

3.3 Parity Problem

The parity problem is well known to researchers doing performance evaluation of neural networks. This problem is essentially a generalization of the XOR problem to N inputs [5]. Here the network is presented with binary inputs, 0's and 1's, and the single output neuron must output a high value if the input pattern has odd parity (an odd number of 1's), and a low value if the inputs have even parity. The parity problem is one of the most difficult problems for an artificial neural network to solve, because the network must look at all the input signals in order to determine whether the pattern has even or odd parity. This is untypical of most real-world classification problems, which usually have much more regularity and allow generalization within classes of similar input patterns. Four inputs were used in this study. It is known that a network with four hidden neurons [19] can solve this problem using the standard backpropagation algorithm, but the pulse-code network failed to do so because of the limitations of the pulse stream representation. By trying different network topologies, the final network topology for the four bit parity problem consisted of two hidden layers of 12 and 8 units respectively. Figure 3.4 shows the error during training.

Figure 3.4: Training for the four bit parity problem

This problem has shown a limitation of the pulse-code representation, specifically the limited range of the synaptic weights. The other potential problem is using the OR gate for addition. As the number of inputs to a neuron (the OR gate) increases, the probability of a 1 output for a given set of inputs rises. This result suggests that the weights in a neural network must be very small to prevent the neurons from constantly saturating. The accommodation of the limited synaptic connections and premature neuron saturation is crucial for the success of pulse-code networks. While the complexity of the network is greater for these networks, the hardware complexity is still significantly less than that of conventional digital networks. Since the hardware requirement is significantly less, the cost of adding more computing elements is not severe.

3.4 Encoder Problem

The general encoding problem involves finding an efficient set of hidden neuron patterns to encode a large number of input/output patterns. The number of hidden neurons is intentionally made small to force an efficient encoding. The specific problem usually considered involves auto-association, using identical unary input and output patterns. A three-layer network consists of N inputs, N outputs and M hidden neurons, with M < N. This is often referred to as an N-M-N encoder. There are exactly N members of the training set, each having one input on and the corresponding target on, and the rest off.

An 8-input encoder was used in this study. Conventional backpropagation could solve this problem with an 8-3-8 encoder network [19]. The activation patterns of the hidden neurons give the binary representation of the training pattern number. This can be achieved with the connection strengths patterned after the binary numbers. Clearly pulse-code networks will fail, as the weight connection is bounded by [-1, +1]. In order to solve this problem, a more complex pulse-code network is needed. The final network uses 6 hidden neurons, which is an 8-6-8 encoder. Figure 3.5 shows the learning of the 8-6-8 encoder. Once again, the limited range of the synaptic weights is the roadblock to the learning capability of the pulse-code network.

Figure 3.5: Training for the 8-6-8 encoder.

3.5 Cheque Character Recognition

In the preceding three examples the training set normally includes all possible input patterns, so no generalization issues arise. Cheque character recognition is a more real-life problem in which noisy patterns can be used for testing the generalization ability. Most cheques have some strange characters indicating the account number, as shown in Figure 3.6. These characters are mapped to a 5x5 matrix as the input to this network, with 10 output neurons corresponding to each of the possible characters (0, 1, ..., 9). One hidden layer of six neurons is used to train the pulse-code network. The training error is shown in Figure 3.7 and the input activation is shown in Figure 3.8.

Figure 3.6: Sample cheque

Figure 3.7: Training for the cheque character recognition.

Figure 3.8: Input activation for the cheque character recognition problem.


3.6 Summary

This chapter presented a number of simulations contrasting pulse-code neural network implementations to more traditional networks. Two potential problems arose using pulse-code representations. The first concerns the use of the OR gate for addition. As the nonlinearity of the OR gate depends on the number of inputs, some of the neurons could prematurely saturate. This problem can be solved using a multi-layered architecture and limiting the number of inputs to each neuron. Rumelhart et al. [19] stated that "A simple method for overcoming the fan-out limitation is simply to use multiple layers of units." The second potential problem with pulse-code neural networks is the limited range of the weight connections. Increasing network topology or complexity could accommodate these limitations.


Chapter 4

Implementation of Pulse-Code Neural Networks in Xilinx FPGAs

This chapter describes the FPGA implementation of the pulse stream neural networks. It starts off by describing the overall hardware architecture of these networks, followed by a brief overview of Xilinx XC4000 series FPGAs. The design process of the network implementation is then described in detail. Finally, several examples of the networks implemented in Xilinx FPGAs are examined.

4.1 FPGAs Implementation

This section discusses the implementation of pulse stream neural networks in FPGAs.

4.1.1 Modular Design of Pulse Stream Neural Networks

There are two main components required to construct a pulse stream neural network: a random number generator and a neuron/synapse element. Figure 4.1 shows the block diagram of the network structure. Each neuron/synapse layer has one CA based random number generator. This random number is used to generate a weighted pulse stream for each synapse. The outputs of the neuron/synapse elements are passed to the inputs of the next layer.

Figure 4.1: The top-level of pulse stream neural networks.


4.1.2 Neuron/Synapse Units

Each synapse converts its stored weight into a weighted pulse stream. This weighted pulse stream is ANDed with an input pulse stream from the previous layer to produce the synaptic multiplication. The product is transmitted to an excitatory net-input line or an inhibitory net-input line: if the sign bit of the weight is '0', the product pulse stream is transmitted to the excitatory net input line; otherwise, it is transmitted to the inhibitory net input line.

The neuron performs the addition of all the net input signals from the synapses through an OR gate. The outcomes of these excitatory and inhibitory net input signals are ANDed to form the activation signal. This signal passes through a re-randomizer circuit to generate the output pulse stream.

Figure 4.3: Block diagram of the re-randomizer circuit.

4.1.3 Re-randomizer

As mentioned in the earlier chapter, the pulse stream arithmetic relies heavily on the assumed property of statistical independence between pulse streams. The re-randomizer re-orders the neuron output in order to prevent correlation between the output pulse streams from the earlier layers.

The re-randomizer (181 consfsts of an up-down couder and a rate multiplier. as

shown in Figure 4.3. The up-dom counter controis the density of the output

stream. If the output is high when the input is low, the counter is decremented.

If the output is low and the input is high. then the counter is incremented. tf the

input and the output are equfvalent then there is no change. The re-randomizer

uses the random number fi-om the neuron/synapse shifted one bit to the left.

The rate multiplier compares the counter vafue and the random numüer to form

the re-randomized output pulse stream.
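A hedged RTL sketch of this re-randomizer follows. It is illustrative only: the entity and port names are invented, IEEE types stand in for the Mentor qsim types, and the 8-bit counter width assumes the weight magnitude chosen in the next section.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rerandomizer is
  port ( clk       : in  std_logic;
         precharge : in  std_logic;             -- pulsed at each new input vector
         act_in    : in  std_logic;             -- activation pulse stream
         rand      : in  unsigned(7 downto 0);  -- CA random number, shifted one bit
         pulse_out : out std_logic );
end rerandomizer;

architecture rtl of rerandomizer is
  signal count : unsigned(7 downto 0);
  signal q     : std_logic;
begin
  -- up-down counter tracking the input pulse density
  process (clk)
  begin
    if rising_edge(clk) then
      if precharge = '1' then
        count <= to_unsigned(128, 8);  -- half of the maximum counter value
      elsif act_in = '1' and q = '0' then
        count <= count + 1;            -- input high, output low: increment
      elsif act_in = '0' and q = '1' then
        count <= count - 1;            -- input low, output high: decrement
      end if;                          -- otherwise: no change
    end if;
  end process;

  -- rate multiplier: compare the counter value with the random number
  q <= '1' when rand < count else '0';
  pulse_out <= q;
end rtl;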

4.1.4 Weight Resolution

The weights are stored in the form of an n-bit fractional sign-magnitude number. Each pulse represents a magnitude of 1/2^(n-1). A 9-bit weight resolution is chosen, which has 8 magnitude bits and 1 sign bit; each pulse therefore represents a magnitude of 1/2^8 = 1/256. Thus there are 2^8 - 2 possible positive values, 2^8 - 2 negative values, and zero. Increased resolution requires more hardware due to a larger random number generator, neuron re-randomizer, and rate multiplier. Kim et al. [20] have shown that a 9-bit weight provides an acceptable result for neural network classification and generalization.


4.2 Overview of Xilinx Field Programmable Gate Arrays

In 1985, a new technology for implementing digital logic was introduced: Field Programmable Gate Arrays (FPGAs). These devices can be viewed as a cross between Mask-Programmable Gate Arrays (MPGAs) and Programmable Logic Devices (PLDs). FPGAs are capable of implementing significantly more logic than PLDs because they can implement multiple levels of logic, while most PLDs are optimized for two-level logic. While FPGAs do not have the capacity of MPGAs, they also do not have to be custom fabricated, greatly lowering the cost of low-volume parts and avoiding long fabrication delays. One of the best known FPGAs is the Xilinx Logic Cell Array (LCA) [2]. In this section, Xilinx's third-generation FPGA, the Xilinx 4000 series, is discussed.

Xilinx FPGAs consist of an array of uncommitted logic elements that can be interconnected in a general way, like MPGAs. They use static RAM (SRAM) cells as the programmable element, so a device can be re-programmed as many times as the designer wishes. Figure 4.4 shows a typical architecture of a Xilinx FPGA. It is a symmetrical array architecture, consisting of a two-dimensional array of Configurable Logic Blocks (CLBs) that can be connected by programmable interconnection resources. The interconnect comprises segments of wire, where the segments may be of various lengths. Present in the interconnect are programmable switches that serve to connect the CLBs to the wire segments, or one wire segment to another. Logic circuits are implemented in the FPGA by partitioning the logic into individual CLBs and then interconnecting the blocks as required


via the switches. The I/O Blocks (IOBs) surround the boundary of the FPGA, providing the interface between the package pins and the internal signal lines.

Figure 4.4: Xilinx FPGA architecture.

A Xilinx 4000 series CLB, as shown in Figure 4.5, is made up of three Lookup Tables (LUTs), two programmable flip-flops, and multiple programmable multiplexers. The LUTs allow arbitrary combinational functions of their inputs to be created. Thus, the structure can perform any function of five inputs (using all three LUTs, with the F and G inputs identical), any two functions of four inputs (the two 4-input LUTs used independently), or some functions of up to nine inputs (using all three LUTs, with the F and G inputs different). SRAM-controlled


multiplexers then can route these signals out the X and Y outputs, as well as to the two flip-flops. The inputs at the top (C1-C4) provide the third input to the 3-input LUT, enable and set or reset signals to the flip-flops, and a direct connection to the flip-flop inputs. This structure yields a very powerful method of implementing arbitrary, complex digital logic. Note that there are several additional features of the Xilinx FPGA not shown in this figure, including support for embedded memories and carry chains.

Figure 4.5: XC4000 CLB

The CLBs are surrounded by horizontal and vertical routing channels that permit arbitrary point-to-point communication. All internal connections are composed of metal segments with programmable switching points to implement the desired routing. There are three main types of interconnect, distinguished by the relative length of their segments: single-length lines, double-length lines, and


longlines. Single-length lines travel the height of a single CLB, where they then enter a switch matrix. The switch matrix allows the signal to travel out vertically and/or horizontally from the switch matrix. Thus, multiple single-length lines can be cascaded together to travel longer distances. Double-length lines are similar, except that they travel the height of two CLBs before entering a switch matrix; thus double-length lines are useful for longer-distance routing, traversing two CLB heights without the extra delay and the wasted configuration sites of an intermediate switch matrix. Finally, longlines are lines that span half the chip height and do not enter the switch matrices. In this way, very long-distance routes can be accommodated efficiently. With this rich sea of routing resources, the Xilinx 4000 series is able to handle fairly arbitrary routing demands, though mappings that emphasize local communication will still be handled more efficiently.

4.3 Design Flow of Pulse Code Neural Network Hardware

Although the Xerion neural network simulator is a valuable tool for simulating pulse-code networks, another goal of this thesis is to find a design flow that generates FPGA hardware from a high-level network description. Ideally, the design flow progresses from the Xerion neural network simulation to the generation of a Xilinx bit file for programming the FPGA device. Xerion is used to simulate and train the pulse-code neural network, iterate the design, and then generate a network description file including the network topology and final


Figure 4.6: Pulse-code neural network design process


weights. Using this description, a custom 'C' program converts it to a VHDL description. Mentor Graphics Top-Down Tools [21] are used for VHDL¹ compilation, syntax verification, synthesis, optimization, and simulation. NeoCad FPGA Foundry tools [22] are used to map the design to a physical FPGA device and to create the bit file for programming the Xilinx chip. Static timing analysis of the placed and routed design is also done within the NeoCad tools. Finally, the design information is back-annotated to a Mentor Graphics database and functionally tested against the top-level VHDL testbench. The complete design flow is shown in Figure 4.6.

4.3.1 Database Structure

As the design goes through various tools, it is very important to organize the database properly. Many procedural problems can be avoided by planning the directory structure. Figure 4.7 illustrates one example of how to organize the design database.

1. VHDL, the VHSIC Hardware Description Language, is a language for designing integrated circuits.


The neural network design directory is divided into five sub-directories: Xerion simulation, VHDL source, synthesis and optimization, gate-level schematic, and the physical FPGA layout.

4.3.2 Xerion Neural Network Simulator

Xerion is used to train and simulate the pulse code neural networks, as mentioned in the previous chapter. The input of the simulator is a text file with a description of the network topology and example training data. The networks are trained with the backpropagation algorithm. Once the training is completed, Xerion generates a network specification for hardware implementation. This specification includes the topology and the weights of the synapses.

4.3.3 VHDL Code

The network specification generated from Xerion is converted into VHDL code for hardware implementation. VHDL is a language for designing integrated circuits, which can describe circuits at the behavioural and/or structural level. In order to ensure that the VHDL code is synthesizable, the designs must be described at the Register Transfer Level (RTL). In addition, there are certain coding styles to use when targeting Xilinx FPGAs. Appendix A provides design hints for writing VHDL for Xilinx FPGA designs. A custom 'C' program is used to generate the synthesizable VHDL code for Xilinx FPGAs from the network specification.

The VHDL description of the network is hierarchically organized. The top-level


circuit represents the connections between the neuron/synapse units and the CA random number generators. It is described in structural VHDL. The neuron/synapse units and CAs are described in RTL descriptions.

Each VHDL component must have an entity which defines the I/O of the model. Each VHDL component in the design also has several architectures. For the neuron/synapse units and CAs, the RTL description of the circuit is in an architecture called RTL. The top-level circuit uses a structural VHDL model, which should be in an architecture called struct.

Other VHDL architectures are also necessary. In order to facilitate the creation of a hierarchical schematic for the top-level circuit, a dummy architecture has to be created for each low-level circuit (neuron/synapse elements and CAs). These dummy architectures have nothing between the BEGIN and END statements in the VHDL; they are just placeholders for the schematic generation process. A schem architecture is required for the top-level circuit. This architecture is the same as the struct architecture except that it calls only the dummy VHDL architectures for the circuits beneath it. A minimal sketch of this organization follows.
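The following fragment is a hypothetical illustration of this convention for one low-level component; the entity name and ports are invented, and IEEE types are used rather than the Mentor qsim types of the actual design.

library ieee;
use ieee.std_logic_1164.all;

entity neuron_syn is
  port ( clk       : in  std_logic;
         pulse_in  : in  std_logic;
         pulse_out : out std_logic );
end neuron_syn;

-- synthesizable register-transfer description
architecture RTL of neuron_syn is
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      pulse_out <= pulse_in;  -- placeholder for the real neuron/synapse logic
    end if;
  end process;
end RTL;

-- empty placeholder architecture used only for schematic generation
architecture dummy of neuron_syn is
begin
end dummy;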

Mentor Graphics' System-1076 compiler performs the syntax checking and database generation for the VHDL design. Once the code is syntactically correct, the compiler creates a Mentor EDDM database from the VHDL, which can be simulated in QuickSim and read into Autologic for synthesis.


4.3.4 Synthesis and Optimization

After compiling the VHDL files, the Mentor EDDM database is read into Autologic and synthesized to the Xilinx XC4000 FPGAs. The NeoCad Xilinx XC4000 library is used, and the target environment variables are set to commercial derating factors.

The neuron/synapse elements and CA circuits in the low-level hierarchy are written in RTL-level VHDL. This code is synthesized directly to XC4000 gates. These circuits are synthesized separately. All hierarchy implied in the VHDL code at this level is flattened to improve the area optimization. The optimization recipe used in Autologic is AREA(LOW) with an AREA REPORT. Since timing optimization is not available, this is all that is required at this level.

Symbols must be created for each low-level circuit so that they can be referenced by the top-level hierarchical schematic. These symbols are automatically generated when the VHDL entities are compiled. To save these symbols, they must be opened within Design Architect and saved.

The top level of this design is only the connection between the neuron/synapse units and CAs, which is defined as a structural VHDL netlist. In order to generate a hierarchical schematic for the top-level circuit in Autologic, the following multiple-step process is used:


1. All neuron/synapse units and CA circuits must have dummy architectures.

2. The low-level circuits must have been previously synthesized to gates, and symbols created to represent the circuits.

3. The top-level circuit should have a schem architecture which calls the dummy architectures of the low-level circuits, as previously mentioned.

4. The schem architecture of the top-level circuit is synthesized and optimized into a schematic.

5. On the resulting schematic, the dummy components are replaced with the symbols for the real circuits that have been previously synthesized.

6. Back in Autologic, the resulting schematic with the real components is re-optimized. The I/O ports and buffers are added to the final schematic.

4.3.5 Design Verification

Figure 4.8: VHDL testbenches (behavioural VHDL test driver and test monitor)

QuickSim is used to simulate the operation of the circuits in both the VHDL and XC4000 gate representations. To functionally verify the design, VHDL testbenches are created for the top-level circuits. After synthesis, the same testbenches are used to simulate the gate-level circuit. The VHDL and gate-level operation are compared to make sure they match.

The VHDL testbenches are only a piece of behavioural VHDL that drives the inputs. If the design is complex, then a piece of VHDL can also be written to monitor the outputs and report if an error is seen. The advantage of the VHDL driver/monitor testbench is that it can be used to test the gate-level models as well as the VHDL. Figure 4.8 shows the connections of a VHDL testbench and how it can be used to drive and monitor the operation of a circuit block. For the pulse stream neural network design, the test driver provides the input examples to the network, and the test monitor uses a number of up-counters to monitor the pulse density of the output neurons.
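Such a monitor can be as simple as one up-counter per output neuron. The fragment below is a hypothetical sketch only; the entity name, ports, and types are not from the original design.

library ieee;
use ieee.std_logic_1164.all;

entity pulse_monitor is
  port ( clk   : in std_logic;
         pulse : in std_logic );  -- output pulse stream of one neuron
end pulse_monitor;

architecture behav of pulse_monitor is
begin
  count_pulses : process (clk)
    variable count : natural := 0;
  begin
    if clk'event and clk = '1' then
      if pulse = '1' then
        count := count + 1;  -- pulse density = count / elapsed clock cycles
      end if;
    end if;
  end process;
end behav;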

4.3.6 NeoCad FPGA Foundry

The NeoCad FPGA Foundry tools are used to map the Xilinx XC4000 gates into a physical array. The NeoCad tools accept data from Mentor Graphics in the form of an EDIF 2 0 0 netlist. Mapsh, the mapping tool, maps the EDIF file onto the specific Xilinx part and package, performs the design rule checking, and generates a NeoCad database file for place and route. Parsh, the place and route tool, performs the place and route of the FPGA. Trcesh is then used to analyze the timing of the design. The layout-related timing information is back-annotated to the Mentor Graphics design verification tools. Finally, NeoCad can generate a bitstream file which is used to physically program the Xilinx FPGA chip.


4.4 FPGA Design Examples

Three example neural network problems are implemented in Xilinx FPGAs: the XOR, the encoder, and the cheque character recognition. The first two examples are implemented on Xilinx XC4010 FPGAs (PG package, speed grade -6). Appendix B provides a quick reference for the different Xilinx 4000 series FPGAs that are available. The XC4010 part has 400 CLBs, equivalent to 10,000 gates; therefore, it is able to accommodate a significantly large design in a single FPGA. The speed grade of this part is -6, which means there is a 6 ns delay for each CLB. The last problem is implemented on a Xilinx XC4013 FPGA. In this section, the simulation of the XOR problem is discussed, and the area and timing of all three problems are examined.

4.4.7 XOR Problem

The XOR problem, as mentioned in Chapter 3, is implemented on Xilinx XC4010 FPGAs. The network that solves this problem consists of two input neurons, two hidden neurons, and one output neuron.

A top-level testbench was created to verify the VHDL model and the gate-level representation. This testbench presented inputs to the network and monitored the output pulse density using an up-counter. Figure 4.9 shows the simulation result of the network. The curve in the figure represents the value of the counter in the re-randomizer of the output neuron. The re-randomizers are initialized to a 'precharging' value, 1/2 of the maximum counter value. With each presentation of an input vector, the value of the re-randomizer is reset to the 'precharging' value. The simulation results proved that the design is functioning as expected.

Figure 4.9: XOR simulation results

The complete layout of the design is shown in Figure 4.10. The design required 69 CLBs and 5 IOBs. An interesting result is that there are two clusters of CLBs in the layout: the small cluster of CLBs is the output neuron, and the larger one is the two hidden neurons. The design takes only 23 CLBs per neuron.

NeoCad's timing analysis tool, Trcesh, is used to perform the static timing analysis. Figure 4.11 shows the timing report of the XOR design. It details the maximum delay path of the design. This report identifies both logic delays and route delays. The 'R' next to a delay entry indicates a delay based on rising-edge timing of signals through an IOB or CLB. The maximum delay path of this design is 171.585 ns. The report also shows a breakdown of the percentage of delay attributed to logic vs. routing delays; in this case, 53.6% is logic delay and 46.4% is routing delay. The total delay of the design is the 171.585 ns logic and routing delay plus the 8.0 ns setup time, which gives 179.585 ns. Therefore the maximum operating frequency of the XOR design is 1/179.585 ns = 5.568 MHz.

Figure 4.10: The FPGA layout of the XOR problem


Figure 4.11: The timing report of the XOR FPGA design


4.4.8 Encoder Problem

Figure 4.12: The FPGA layout of the 5-4-5 encoder

The other design example is an encoder. The network has five input, four hidden, and five output neurons. This network is smaller than the one discussed in Chapter 3, but it is the same class of problem. The FPGA layout of the encoder design is shown in Figure 4.12. There are 215 CLBs and 12 IOBs used. The total logic and routing delay is 249.346 ns, with 44.9% from logic and 55.1% from routing. The routing delay in this design has a higher percentage than in the XOR, as the design has more synaptic connections between the layers. The maximum operating frequency is 3.886 MHz. The average number of CLBs per neuron is 23.89. The design has more synapses per neuron than the XOR, but this average is still very close to that of the XOR. As all the synapses are combinational logic, the place and route is able to efficiently pack them into the CLBs.

4.4.9 Cheque Character Recognition

This problem was discussed in Chapter 3; the network has 25 inputs, 6 hidden, and 10 output neurons. This design is implemented in a Xilinx XC4013 FPGA. It used 506 CLBs and 37 IOBs, and the average number of CLBs per neuron is 31.63. The maximum delay path of this design is 340.09 ns, with 32.6% logic and 67.4% routing. The maximum operating frequency is 2.873 MHz.


Figure 4.13: The FPGA layout of the cheque character recognition


4.5 Summary

This chapter has presented a hardware implementation of multi-layer neural networks using pulse-code arithmetic. The design of the networks is hierarchically organized so that it results in more optimized circuitry during logic synthesis and optimization. The networks are divided into neuron/synapse units and random number generators. This chapter also discussed a re-randomizer for the neuron output, used to prevent correlation between the neuron pulse streams.

A top-down design flow for constructing these networks has been discussed. This flow progresses from a high-level network description to the generation of a Xilinx bit stream for programming the Xilinx FPGA device. The use of a VHDL testbench for design verification was also discussed. Following this, three example problems were implemented on Xilinx 4000 series FPGAs. The implementation results showed that the networks are extremely compact, using only 23 CLBs per neuron/synapse unit for the XOR problem.


Chapter 5

Conclusions and Future Work

This thesis has demonstrated the implementation of pulse-code neural networks in Xilinx FPGAs. The hardware requirements of these networks were shown to be minimal; only simple digital gates were required to perform the arithmetic. The use of the backpropagation learning algorithm for training, as well as the simulation of these networks, was also discussed. The simulation results suggested that the weights in a neural network must be very small to prevent the neuron from constantly saturating, and that multi-layer networks should be used in order to overcome the fan-in limitation of the neuron.

The hardware architecture of these networks was described, as well as the top-down design flow for implementing these networks in Xilinx FPGAs. In addition, three design examples (the XOR, the encoder, and the cheque character recognition) were implemented and examined. The implementation results have shown that the average number of CLBs per neuron/synapse unit was only 23 for the XOR problem. The increase in the number of


the synapse connections of the neuron did not significantly contribute to this hardware cost. As a result, a significantly large network can be implemented on a single Xilinx FPGA. For example, approximately 44 neuron/synapse units could be implemented on a Xilinx XC4025 part, the largest Xilinx 4000 series part, which has 1024 CLBs on a single chip. With the aid of current state-of-the-art CAD tools, the design cycle took only days to complete, as opposed to the weeks or months of a traditional design methodology.

Continued work in this area should investigate the use of time multiplexing to further increase the number of neurons per device. The idea is to have one single physical layer of neurons and re-use it for different layers, emulating multiple layers of neurons.

The use of multiple-FPGA environments for implementing these networks should also be investigated; in this case, very large scale neural networks can be implemented for prototyping. As well, other FPGA devices should be considered. One potential candidate is the new Xilinx 8000 series, which is a sea-of-gates architecture. Since neural network structures are highly regular with little global wiring, their basic architecture is similar to the architecture of the XC8000 series FPGAs; therefore, better utilization of the FPGA can be achieved.

A sophisticated high-level interface that compiles a given neural architecture directly to a single-FPGA or multi-FPGA based hardware system should be developed. However, the logic synthesis tools could be a problem for such systems, as current logic synthesis tools do not work well when the design exceeds 3000 gates. Further work should be done on partitioning the networks into small pieces for logic synthesis, as well as on the use of Mentor Graphics' design management software, WorkXpert, to automate the design flow and design capture.


Appendix A

Targeting VHDL Design to Xilinx FPGAs

As the density and complexity of Xilinx FPGA designs increase to 20,000 gates and beyond, traditional schematic-capture design entry is often cumbersome. The use of hardware description languages (HDLs), such as VHDL and Verilog HDL, can raise designer productivity. High-level languages combined with logic synthesis can provide a consistent design methodology across a range of technologies. By raising the level of design abstraction, synthesis tools can increase productivity, ensuring error-free gate-level realizations and freeing designers for more creative tasks. However, the designer should not ease up on hardware implementation considerations when synthesis tool aids are available.

The methods for designing ASICs do not always apply to designs with Xilinx FPGAs. ASICs have more gates and routing resources than Xilinx FPGAs. Since ASICs have a large number of available resources, the designer can easily create inefficient code that results in a large number of gates. When designing with Xilinx FPGAs, the designer must create efficient code.


The VHSIC Hardware Description Language (VHDL) is a language for designing integrated circuits (ICs), which can describe designs at the behavioural and/or structural level. VHDL designs can be behaviourally simulated and tested to be functionally correct before synthesis. However, many VHDL constructs are not supported by synthesis tools. In general, only a subset of VHDL constructs, called Register Transfer Level (RTL) constructs, is accepted by the synthesis tools. In addition, synthesis tools interpret the VHDL code differently when targeting different technologies. The following guidelines ensure VHDL code that takes best advantage of Xilinx's resources and produces the same functionality after synthesis.

A.1 Wait for XX ns Statement

A wait for XX ns statement specifies the number of nanoseconds that must pass before a condition is executed. This statement does not synthesize to a component. In designs that include this statement, the functionality of the simulated design does not match the functionality of the synthesized design.
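For example, a statement of the following form delays the simulation but has no hardware realization:

wait for 100 ns;  -- simulation-only delay; never becomes a component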

A.2 After XX ns Statement

An after XX ns clause is usually used as part of a signal assignment to model delay. This clause is usually ignored by the synthesis tool. An example of this statement is:

Q <= '0' after XX ns;


A.3 Initial Values

Initial values assigned to signals and variables are ignored by most synthesis tools, so the functionality of the simulated design may not match the functionality of the synthesized design. For example, do not use initialization statements such as the following:

variable SUM : INTEGER := 0;
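A common alternative is to load the starting value through an explicit reset, which does synthesize. The following is a sketch only: CLK, RST, DIN, and SUM are assumed to be declared elsewhere, and SUM is a signal here rather than a variable.

process (CLK)
begin
  if CLK'event and CLK = '1' then
    if RST = '1' then
      SUM <= 0;          -- initialization performed in hardware
    else
      SUM <= SUM + DIN;  -- normal operation
    end if;
  end if;
end process;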

A.4 Order and Group Arithmetic Functions

The ordering and grouping of arithmetic functions influence design performance. For example, the following two statements are not equivalent:

ADD <= A1 + A2 + A3 + A4;
ADD <= (A1 + A2) + (A3 + A4);

The first statement cascades three adders in series. The second statement creates two adders in parallel: A1 + A2 and A3 + A4. In the second statement, the two additions are evaluated in parallel and the results are combined with a third adder. RTL simulation results are the same for both statements; however, the second statement results in a faster circuit after synthesis, since its critical path passes through two adders instead of three.

A.5 Xilinx Name Conventions

Xilinx has reserved names for their FPGAs. The following FPGA resource names are reserved and should not be used to name nets or components:

- Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), clock buffers, tristate buffers (BUFTs), oscillators, package pin names, CCLK, DP, GND, VCC, and RST

- CLB names such as AA, AB, and RE2

- Primitive names such as TDO, BSCAN, M0, M1, M2, or STARTUP

- Pin names such as P1 and P2 (do not use for component names)

- Pad names such as PAD1 (do not use for component names)

For further Xilinx naming conventions, the Xilinx Data Book [2] provides more detailed references.

A.6 Latches and Registers

VHDL compilers infer latches from incomplete specifications of conditional expressions. Latch primitives are not available in XC4000 CLBs; however, the IOBs contain input latches. Latches described in VHDL are implemented with gates in the CLB function generators. For example, the D latch shown in Figure A.1 is implemented with one function generator. The D latch implemented with gates is shown in Figure A.2.


LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;

ENTITY d_latch IS
  PORT ( GATE, DATA : in  qsim_state;
         Q          : out qsim_state );
END d_latch;

ARCHITECTURE BEHAV OF d_latch IS
BEGIN
  LATCH : process (GATE, DATA)
  begin
    if (GATE = '1') then
      Q <= DATA;
    end if;
  end process;  -- End LATCH
END BEHAV;

Figure A.1: Latch inference

Figure A.2: Latch implemented with gates

In this example, the VHDL code contains an IF statement without an ELSE, which always implies a latch in the gate-level representation. The drawback of a latch is that it is implemented as a combinatorial feedback loop in a CLB, and synthesis tools do not process hold-time requirements because of the uncertainty of routing delays. In order to eliminate unnecessary latches, it is desirable to replace them with D registers, as each CLB has two D flip-flops. To convert a latch to a D register, use an ELSE clause in the IF statement or a WAIT UNTIL statement.
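As a sketch of the second option, the process of Figure A.1 can be rewritten with a WAIT UNTIL statement so that a D flip-flop is inferred; a clock signal CLK is assumed to be available.

REG : process
begin
  wait until CLK'event and CLK = '1';  -- rising clock edge
  Q <= DATA;                           -- edge-triggered: infers a D flip-flop
end process;  -- End REG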

In all other cases (such as latches with reset/set or enable), use a D flip-flop instead of a latch. This rule also applies to JK and SR flip-flops. Table A.1 provides a comparison of area and speed for a D latch implemented with gates and a D flip-flop.

Table A.1: D latch implementation comparison

                           D Latch                               D Flip-Flop
Area                       1 function generator                  1 register
Advantages/Disadvantages   VHDL infers a D latch implemented     Requires a change to the VHDL to
                           with gates; the combinatorial         convert D latches to D flip-flops;
                           feedback loop results in a            1 logic level with no hold time
                           hold-time requirement                 or combinatorial loop

A.7 Implementing Multiplexers with Tristate Buffers

A 4-to-1 multiplexer is efficiently implemented in a single XC4000 CLB. The six input signals (four inputs, two select lines) use the F, G, and H function generators. Multiplexers larger than 4-to-1 exceed the capacity of one CLB. For example, a 16-to-1 multiplexer requires five CLBs and has two logic levels. These additional CLBs increase area and delay. In order to best utilize XC4000 resources, using tristate buffers (BUFTs) is recommended to implement multiplexers larger than 4-to-1.

A VHDL design of a 5-to-1 multiplexer built with gates is shown in Figure A.3. Typically, the gate version of this multiplexer has binary-encoded selector inputs and requires three select inputs (SEL<2:0>). The schematic representation of this design is shown in Figure A.4.

LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;

ENTITY mux_gate IS
  PORT ( SEL           : in  qsim_state_vector(2 downto 0);
         A, B, C, D, E : in  qsim_state;
         MUX_OUT       : out qsim_state );
END mux_gate;

ARCHITECTURE BEHAV OF mux_gate IS
BEGIN
  SEL_PROCESS : process (SEL, A, B, C, D, E)
  begin
    case SEL is
      when "000"  => MUX_OUT <= A;
      when "001"  => MUX_OUT <= B;
      when "010"  => MUX_OUT <= C;
      when "011"  => MUX_OUT <= D;
      when others => MUX_OUT <= E;
    end case;
  end process;  -- End SEL_PROCESS
END BEHAV;

Figure A.3: Implementing 5-to-1 MUX with gates


Figure A.4: 5-to-1 MUX implemented with gates

LIBRARY mgc_portable;
USE mgc_portable.qsim_logic.all;

ENTITY mux_tbuf IS
  PORT ( SEL           : in  qsim_state_vector(4 downto 0);
         A, B, C, D, E : in  qsim_state;
         MUX_OUT       : out qsim_state_resolved );
END mux_tbuf;

ARCHITECTURE BEHAV OF mux_tbuf IS
BEGIN
  MUX_OUT <= A when (SEL(0) = '0') else 'Z';
  MUX_OUT <= B when (SEL(1) = '0') else 'Z';
  MUX_OUT <= C when (SEL(2) = '0') else 'Z';
  MUX_OUT <= D when (SEL(3) = '0') else 'Z';
  MUX_OUT <= E when (SEL(4) = '0') else 'Z';
END BEHAV;

Figure A.5: Implementing 5-to-1 MUX with BUFTs


The VHDL design shown in Figure A.5 is a 5-to-1 multiplexer built with tristate buffers. The tristate buffer version of the multiplexer has one-hot encoded selector inputs and requires five select inputs (SEL<4:0>). The schematic representation of this design is shown in Figure A.6.

Figure A.6: 5-to-1 MUX implemented with BUFTs


Appendix B

Xilinx Device Quick References

Table B.1: Xilinx Devices, Packages and Speed Grades

Device     Packages                Speed Grades
XC4002A    PC84, PQ100, PG120      -5, -6


References

[1] B. Gilbert, "A High-Performance Monolithic Multiplier Using Active Feedback," IEEE J. Solid-State Circuits, vol. SC-9, pp. 364-373, 1974.

[2] Xilinx Inc., The Xilinx Data Book, 1994.

[3] W. S. McCulloch and W. Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics 5, pp. 115-133, 1943.

[4] F. Rosenblatt, "Principles of Neurodynamics," New York: Spartan Books, 1959.

[5] M. Minsky and S. Papert, "Perceptrons: An Introduction to Computational Geometry," Cambridge, MA: The MIT Press, 1969.

[6] D. Rumelhart, G. Hinton and R. Williams, "Learning Internal Representations by Backpropagating Errors," Nature 323, pp. 533-536, 1986.

[7] B. R. Gaines, "Stochastic Computing Systems," Advances in Information Systems Science, volume 2, Julius T. Tou, editor, Plenum Press, 1969.

[8] P. Mars, Stochastic and Deterministic Averaging Processors, The Institution of Electrical Engineers, London and New York, 1981.

[9] S. W. Golomb, "Shift Register Sequences," Holden-Day Publishing Co., San Francisco, 1982.

[10] P. Hortensius, R. McLeod, B. Podaima, "Cellular Automata Circuits for Built-In Self-Test," IBM Journal of Research and Development, vol. 34, March 1990.

[11] P. Hortensius, "Parallel Computation of Non-deterministic Algorithms in VLSI," Ph.D. thesis, Department of Electrical and Computer Engineering, University of Manitoba, 1987.

[12] F. Brglez, C. Gloster, and G. Kedem, "Hardware-based Weighted Random Pattern Generation for Boundary Scan," IEEE International Test Conference, Aug. 1989.

[13] J. Tomberg, T. Ritoniemi, K. Kaski and H. Tenhunen, "Full Digital Neural Network Implementation Based on Pulse Density Modulation," Proc. IEEE Custom Integrated Circuits Conf. (San Diego, CA: May 15-17), pp. 12.7.1-12.7.4, 1989.

[14] J. Tomberg and K. Kaski, "Pulse-density Modulation Technique in VLSI Implementation of Neural Network Algorithms," IEEE Journal of Solid-State Circuits, 25(2), pp. 1277-1286, Oct. 1990.

[15] M. Tomlinson Jr., M. Walker and M. Sivilotti, "A Digital Neural Network Architecture for VLSI," Proc. IJCNN-90, pp. 545-550, San Diego, CA, 1990.

[16] J. Dickson, R. McLeod and H. Card, "Stochastic Arithmetic Implementations of Neural Networks with In Situ Learning," IEEE International Conference on Neural Networks (San Francisco, CA: Mar. 28-Apr. 1), pp. 711-716, 1993.

[17] Drew van Camp, Evan E. Steeg, and Tony Plate, XERION Neural Network Simulator, Computer Science Department, University of Toronto, 1991.

[18] J. Dickson, "Stochastic Arithmetic Implementation of Artificial Neural Networks," M.Sc. thesis, Department of Electrical Engineering, University of Manitoba, 1992.

[19] D. Rumelhart, J. McClelland and the PDP Research Group, Parallel Distributed Processing, Volume 1, The MIT Press, 1986.

[20] Y. C. Kim and M. Shanblatt, "Random Noise Effects in Pulse-Mode Digital Multilayer Neural Networks," IEEE Transactions on Neural Networks, vol. 6, no. 1, January 1995.

[21] Mentor Graphics, Bold Browser, 1995.

[22] NeoCad Inc., NeoCad FPGA Foundry Tutorial, 1994.