
Int. J. Electron. Commun. (AEÜ) 63 (2009) 810–820

www.elsevier.de/aeue

Hardware implementation of a pulse mode neural network-based edge detection system

Mohamed Krid∗, Alima Damak, Dorra Sellami Masmoudi

Computer Imaging and Electronics Systems Group from Research Unit on Intelligent Control, Design & Optimization of Complex Systems (ICOS), University of Sfax, Sfax Engineering School, BP W, 3038 Sfax, Tunisia

Received 9 April 2008; accepted 13 June 2008

Abstract

In this paper, we exploit the powerful means of neural networks with respect to function approximation. Once they are implemented on-chip, they can be reconfigured to adjust their input–output relation in order to achieve clustering for decision making, but also various image processing tasks such as filtering, edge detection, etc. As an illustrative example, we propose here a neural network-based edge detection system. Edge detection significantly reduces the amount of data and filters out information that may be regarded as irrelevant; it is becoming an important step in segmentation for many image processing applications. The proposed network achieves Canny operator edge detection based on pulse mode operations. Indeed, pulse mode neural networks are becoming an attractive solution in neural network implementation because of the advantages they provide over the continuous mode, such as the compactness of the pulse multiplier and the flexibility of most of the blocks. Such simplicity offers the possibility of on-chip learning. In this work, the proposed edge detection network uses new extended-range synapse multipliers operating in a fixed-point format with a very simple architecture, together with adjustable activation functions. To provide the best edge detection, the back-propagation algorithm is modified to operate in pulse mode. Simulation results show efficient learning and good generalization results. The corresponding design was implemented on a Virtex II Pro FPGA platform. Synthesis results prove that the implemented neural network is more compact in terms of size than conventional implementations of a Canny edge detector.
© 2008 Elsevier GmbH. All rights reserved.

Keywords: Canny edge detector; FPGA implementation; Pulse mode; Neural network; Synapse multiplier

1. Introduction

Artificial neural networks (ANNs) are among the best systems for performing intelligent operations in pattern recognition applications. ANNs owe their performance to mimicking the ability of the biological system to cluster data, based on a learning process. In a pattern recognition system, neural network-based clustering is generally applied after a series

∗Corresponding author. Tel.: +216 22764977; fax: +216 74848115.
E-mail addresses: [email protected] (M. Krid), [email protected] (A. Damak), [email protected] (D.S. Masmoudi).

1434-8411/$ - see front matter © 2008 Elsevier GmbH. All rights reserved. doi:10.1016/j.aeue.2008.06.011

of preprocessing steps aiming at reducing noise and other irrelevant information, so that only discriminating information is preserved. Hence, the idea is to apply an ANN in these early steps as well as for clustering. Indeed, an ANN involves a lower operation load and offers more advantages for reducing the effect of noise when compared with other conventional methods [1]. In that case, the same ANN can be parameterized successively for different operations such as filtering, segmentation, classification, etc.

In this paper, we consider the Canny edge detection operator for modeling a neural network. Edge representation of an image drastically reduces the amount of data to be processed, and retains important information about the shapes


of objects in the scene. For most high-level machine vision tasks, such as motion analysis and object recognition, an edge map is sufficient to carry out further analysis.

The Canny edge detector algorithm [2] is based on several steps. Firstly, it smoothes the image to eliminate noise. Secondly, it computes the image gradient to highlight regions with high spatial derivatives. The algorithm then tracks along these regions and suppresses any pixel that is not at the maximum (non-maximum suppression). The gradient array is further reduced by hysteresis, which is used to track along the remaining pixels that have not been suppressed. Hysteresis uses two thresholds: a high one and a low one. Any pixel in the image that has a value greater than the high threshold is presumed to be an edge pixel, and any pixels that are connected to this edge pixel and have a value greater than the low threshold are also selected as edge pixels.
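For reference, the pipeline just described can be sketched in a few lines of NumPy/SciPy. This is an illustrative approximation, not the implementation used in this work; the smoothing scale and the two thresholds are arbitrary example values.

```python
import numpy as np
from scipy import ndimage

def canny_sketch(img, sigma=1.0, t_low=0.1, t_high=0.3):
    img = img.astype(float)
    # 1. Smooth the image to suppress noise.
    smooth = ndimage.gaussian_filter(img, sigma)
    # 2. Image gradient (Sobel approximation of the Gaussian derivative).
    gx = ndimage.sobel(smooth, axis=1)
    gy = ndimage.sobel(smooth, axis=0)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # 3. Non-maximum suppression along the quantized gradient direction.
    nms = np.zeros_like(mag)
    for i in range(1, mag.shape[0] - 1):
        for j in range(1, mag.shape[1] - 1):
            a = ang[i, j]
            if a < 22.5 or a >= 157.5:
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:
                n1, n2 = mag[i + 1, j + 1], mag[i - 1, j - 1]
            elif a < 112.5:
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:
                n1, n2 = mag[i + 1, j - 1], mag[i - 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                nms[i, j] = mag[i, j]
    # 4. Hysteresis: strong pixels seed edges; weak pixels survive only if
    #    they belong to a connected component containing a strong pixel.
    strong = nms >= t_high
    weak = nms >= t_low
    labels, n = ndimage.label(weak)
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True
    keep[0] = False
    return keep[labels]
```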

In this work, we try to include all of these complex steps in a neural network-based approach. A training step is therefore required for the network to learn how to map database images into their transform by a Canny edge detector.

Although most neural networks are realized by software simulation, many applications require the implementation of fast and large neural networks on an efficient custom device, which can benefit from the inherent parallelism embedded in neural network dynamics. One effective approach to the hardware implementation of neural networks is an architecture based on pulse mode operations, because pulse mode neural networks (PNNs) have attractive properties. For example, pulse systems are robust to noisy conditions. Moreover, pulse density systems can handle quantized analog values with a digital circuit [3,4].

A pulse mode digital architecture using stochastic computing has been proposed in [3,5]. In this architecture, signals are represented by probabilities and encoded in random pulse streams. Stochastic computing is performed with basic logic gates using random pulse sequences as inputs. This architecture was later revised in [6] by introducing a direct digital frequency synthesizer (DDFS) in the synapse unit, which is much simpler than a numerical multiplier. One limitation of this approach is that the weight values of the synapse multiplier are restricted to the range [−1, 1], which makes learning difficult. Another limitation is that the activation function is almost fixed, which affects the accuracy of the whole network [7]. Later on, a PNN with a piecewise linear activation function was proposed in [8]. The activation function of the neuron is adjustable; however, its circuit size is rather large because it uses three accumulators and a multiplexer to generate the activation function. The large circuit size also slows down the neuron operation.

In this paper, we propose an efficient PNN-based edge detection system acting as a Canny operator. The proposed PNN uses a new synapse multiplier operating in a fixed-point format and an adjustable activation function, and it is more compact than earlier implementations [8]. Besides, it can operate on weight values without any range limitation. In the learning step, we use a series of heterogeneous image databases of three classes, with 30 samples each. Once low training errors and good generalization rates are obtained, the PNN is implemented on an FPGA platform.

Fig. 1. Generic structure of an artificial neural network.

This paper is organized as follows. In Section 2, we introduce the feedforward multilayer neural network in both continuous and pulse mode. In Section 3, we describe the general architecture of the forward path of the proposed PNN. In Section 4, we propose a modified on-chip learning configuration. Finally, in Section 5, we present simulation and hardware synthesis results, followed by a conclusion in Section 6.

2. Multilayer neural network

2.1. Feedforward multilayer neural network

ANNs can be classified into two general types according to how they learn: supervised or unsupervised. The back-propagation algorithm is a supervised learning algorithm, which requires training data consisting not only of the inputs but also of the expected outputs. According to Rumelhart et al. [9], the operation of an ANN using the back-propagation algorithm is divided into two phases, i.e., a learning phase and a retrieving phase. During the learning phase, the weights are adjusted to perform a particular input–output relation; this phase consists of forward and backward operations. In the forward path, the output of the network is calculated from the input data. An error is then measured with respect to the expected output. Accordingly, network weight adjustments are performed during the backward path. In the retrieving phase, only the forward operation is executed.

2.1.1. Forward computation
During the forward computation, data from the neurons of a lower layer (i.e., the (s − 1)th layer) are propagated forward to the neurons of the upper layer (i.e., the sth layer) via a feedforward connection network. The structure of the ANN is shown in Fig. 1, with M layers and N neurons. The computation performed by each neuron during a forward operation is as follows:


Fig. 2. Architecture of the frequency synthesizer.

Fig. 3. Transition of the register value and active signals in the frequency synthesizer.

Fig. 4. Block diagram of the forward path in a PNN.

Fig. 5. Block diagram of the proposed synapse multiplier.

\[
H_k^{(s)} = \sum_{j=1}^{N_{s-1}} w_{kj}^{(s)}\, o_j^{(s-1)} + \theta_k^{(s)} \tag{1}
\]

where $s = 1, \ldots, M$, $H_k^{(s)}$ is the weighted sum of the $k$th neuron in the $s$th layer, $w_{kj}^{(s)}$ is the synaptic weight corresponding to the connection from neuron unit $j$ in the $(s-1)$th layer to neuron unit $k$ in the $s$th layer of the neural network, $o_j^{(s-1)}$ is the output of the $j$th neuron in the $(s-1)$th layer and $\theta_k^{(s)}$ is the bias of the $k$th neuron in the $s$th layer.

The output of the $k$th neuron in the $s$th layer is given by (2), where $k = 1, \ldots, N$, $s = 1, \ldots, M$ and $f(H_k^{(s)})$ is the activation function computed on the weighted sum $H_k^{(s)}$:

\[
o_k^{(s)} = f\bigl(H_k^{(s)}\bigr) \tag{2}
\]
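As an illustration of Eqs. (1)–(2), a minimal NumPy sketch of the forward computation of one layer is given below; the layer sizes, random weights and the logistic activation are assumptions chosen only for the example.

```python
import numpy as np

def layer_forward(o_prev, W, theta, f=lambda H: 1.0 / (1.0 + np.exp(-H))):
    """One layer of Eqs. (1)-(2): o_prev holds the outputs o_j of layer s-1,
    W[k, j] is the weight w_kj and theta[k] is the bias of neuron k."""
    H = W @ o_prev + theta    # Eq. (1): weighted sums H_k
    return f(H)               # Eq. (2): neuron outputs o_k

# Example: a 9-4-1 network (the topology used later for edge detection).
rng = np.random.default_rng(0)
o0 = rng.random(9)                                        # nine inputs
o1 = layer_forward(o0, rng.standard_normal((4, 9)), np.zeros(4))
o2 = layer_forward(o1, rng.standard_normal((1, 4)), np.zeros(1))
```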

2.1.2. Backward computation
The back-propagation algorithm is executed in the backward computation, although a number of other ANN training algorithms could be substituted here. The criterion of the learning algorithm is to minimize the error between the expected (or teacher) value and the actual output value determined in the forward computation. The back-propagation algorithm [10] is defined as follows:

• Starting with the output layer, move back towards the input layer and calculate the local gradients, as shown in (3) and (4).
• Calculate the weight (and bias) changes for all the weights using (5).
• Update all the weights (and biases) via (6).


Fig. 6. Ramp function neuron.

Fig. 7. Ramp function signals: (a) register value and (b) output.

\[
e_k^{(s)} =
\begin{cases}
t_k - o_k^{(s)}, & s = M\\[4pt]
\displaystyle\sum_{j=1}^{N_{s+1}} w_{kj}^{(s+1)}\, \delta_j^{(s+1)}, & s = 1, \ldots, M-1
\end{cases} \tag{3}
\]

where $e_k^{(s)}$ is the error term for the $k$th neuron in the $s$th layer, i.e., the difference between the teaching signal $t_k$ and the neuron output $o_k^{(s)}$, and $\delta_j^{(s+1)}$ is the local gradient of the $j$th neuron in the $(s+1)$th layer:

\[
\delta_k^{(s)} = e_k^{(s)} f'\bigl(H_k^{(s)}\bigr), \quad s = 1, \ldots, M \tag{4}
\]

where $f'(H_k^{(s)})$ is the derivative of the activation function.

\[
\Delta w_{kj}^{(s)} = \eta\, \delta_k^{(s)} o_j^{(s-1)}, \quad k = 1, \ldots, N_s, \; j = 1, \ldots, N_{s-1} \tag{5}
\]

where $\eta$ is the learning rate and $\Delta w_{kj}^{(s)}$ is the change in synaptic weight (or bias) corresponding to the gradient of the error for the connection from neuron unit $j$ in the $(s-1)$th layer to neuron $k$ in the $s$th layer:

\[
w_{kj}^{(s)}(n+1) = \Delta w_{kj}^{(s)}(n) + w_{kj}^{(s)}(n) \tag{6}
\]

where $k = 1, \ldots, N_s$, $j = 1, \ldots, N_{s-1}$, $n$ is the current iteration, $w_{kj}^{(s)}(n+1)$ is the updated synaptic weight (or bias) to be used in the $(n+1)$th iteration of the forward computation, $\Delta w_{kj}^{(s)}(n)$ is the change in synaptic weight (or bias) calculated in the $n$th iteration of the backward computation and $w_{kj}^{(s)}(n)$ is the synaptic weight (or bias) used in the $n$th iteration of the forward and backward computations.
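The update rules of Eqs. (3)–(6) can be summarized by the following NumPy sketch for a two-layer network with a logistic activation (so that $f'(H) = o(1-o)$); the learning rate and the array shapes are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def backprop_step(x, t, W1, b1, W2, b2, eta=0.1):
    sig = lambda H: 1.0 / (1.0 + np.exp(-H))
    # Forward pass, Eqs. (1)-(2)
    o1 = sig(W1 @ x + b1)
    o2 = sig(W2 @ o1 + b2)
    # Output layer: error term and local gradient, Eqs. (3)-(4)
    e2 = t - o2
    d2 = e2 * o2 * (1.0 - o2)
    # Hidden layer: back-propagated error term and local gradient
    e1 = W2.T @ d2
    d1 = e1 * o1 * (1.0 - o1)
    # Weight/bias changes and updates, Eqs. (5)-(6)
    W2 += eta * np.outer(d2, o1); b2 += eta * d2
    W1 += eta * np.outer(d1, x);  b1 += eta * d1
    return W1, b1, W2, b2
```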

2.2. Pulse mode multilayer neural network

Unlike a continuous multilayer neural network, the PNN uses frequency to represent signal levels. For this reason, continuous signals must be converted into pulsed ones, which are characterized by their normalized frequency. Fig. 2 shows the block diagram of a frequency synthesizer. At each rising edge of a period Tf, the input value I is fed to the register. Then a decay of 1 is subtracted from the register at each clock period.

The output pulse is generated from the complement of the most significant bit (MSB) of the register, multiplied by the clock signal using an AND gate. The transition of the register value and the generation of the output signal are depicted in Fig. 3.

The proposed frequency synthesizer uses only natural input values. For this reason, a quantization algorithm [11] must be used to convert fixed-point values to unsigned natural ones. The period in which we can process the maximum of input values is Tf. This value is equal to 2^n, where n is the bit number of the digital quantized inputs. The normalized frequency is expressed by

\[
f_n(I) = \frac{\displaystyle\sum_{0}^{T_f} (\text{output pulses})}{N_p} \tag{7}
\]

where $\sum_{0}^{T_f}(\text{output pulses})$ is the number of output pulses during the period $T_f$ and $N_p$ is the maximum number of pulses that can be presented in $T_f$.
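The behaviour of the frequency synthesizer of Figs. 2–3 and the normalized frequency of Eq. (7) can be illustrated with the following sketch; the function names are ours and the register model is a simplified behavioural assumption.

```python
def synthesize_pulses(I, n=8, periods=1):
    """Behavioural model of the frequency synthesizer: the register is loaded
    with the quantized input I every Tf clocks, decremented by 1 each clock,
    and a pulse is emitted while its MSB (sign bit) is 0."""
    Tf = 2 ** n                      # processing period, in clock cycles
    pulses = []
    for _ in range(periods):
        reg = I                      # load the input at the rising edge of Tf
        for _ in range(Tf):
            pulses.append(1 if reg >= 0 else 0)   # complement of MSB gates the clock
            reg -= 1                 # decay of 1 per clock period
    return pulses

def normalized_frequency(pulses, Np):
    """Eq. (7): ratio of emitted pulses to the maximum pulse count Np."""
    return sum(pulses) / Np

train = synthesize_pulses(I=100, n=8)
print(normalized_frequency(train, Np=2 ** 8))    # roughly I / Np = 100 / 256
```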

3. Architecture of the forward path in a PNN

The block diagram of the forward path in the PNN, shown in Fig. 4, consists of two computational elements: a synapse unit, which performs the weight multiplication, and a neuron unit, which computes the nonlinear activation function.

3.1. Synapse unit

Fig. 5 shows the block diagram of the proposed synapse multiplier. At each input pulse, the weight value is added to the content of a register, which is reset at each period Tf. Hence, the output of the register, $H_k^{(s)}$, is equal to the product of the number of input pulses and the weight.


Fig. 8. Neuron unit with a smoothing circuit.

Fig. 9. Signals in the neuron unit: (a) pulse generator output and (b) content of the register.

A major advantage of this configuration of the synapse multiplier is that the weights have no range limitations. These weights are represented in a fixed-point format; thus, the precision of the weights can be modified by varying the bit-length nw. Moreover, the synapse multiplier gives an advantage in terms of circuit size.
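A behavioural sketch of this accumulate-on-pulse synapse multiplier is given below; the fixed-point scaling (number of fractional bits) and the helper names are assumptions introduced for illustration.

```python
def synapse_accumulate(input_pulses, weight_fixed, frac_bits=8):
    """input_pulses: 0/1 samples over one period Tf;
    weight_fixed: signed integer weight in Q(frac_bits) fixed point.
    The register is cleared at the start of Tf and the weight is added on
    every input pulse, so the final content is (pulse count) x (weight)."""
    reg = 0                                   # reset at the start of Tf
    for p in input_pulses:
        if p:
            reg += weight_fixed               # add the weight on each input pulse
    return reg / (1 << frac_bits)             # back to a real-valued product

pulses = [1, 0, 1, 1, 0, 1] * 10              # 40 input pulses in this period
print(synapse_accumulate(pulses, weight_fixed=-384))  # -1.5 * 40 = -60.0
```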

3.2. Neuron unit

The nonlinear activation function is approximated by a ramp function, which is given by the following equation:

\[
f_p(H) =
\begin{cases}
1 & \text{if } H > p\\[2pt]
\dfrac{H}{2p} + 0.5 & \text{if } -p \leq H \leq p\\[2pt]
0 & \text{otherwise}
\end{cases} \tag{8}
\]

where p is the control parameter of the slope of the ramp function and H is the internal potential of the neuron, which is the product of the sum h of the weight values and the average frequency F, as given by the following equation:

\[
H = h F \tag{9}
\]

Fig. 10. Block diagram of one neuron unit connected with 6 synapse units to characterize the activation function.

The block diagram of a single ramp function neuron is depicted in Fig. 6. Fig. 7 shows the transition of the register value and the neuron output. At each period Tf, the sum of the input weight values from the synapse multipliers, $H_k^{(s)}$, is fed to the register, where it is accumulated. When the content of the register R is positive (MSB = 1), p is subtracted from it at each clock sample and the output pulse of the ramp function is generated with a frequency equal to that of the clock (fclk). Afterwards, the content of the register alternates between positive and negative values and the gradient of the change is p. The time spent by the register to approach zero is Tp; beyond that point, the output frequency becomes 0.5 × fclk. If R is negative


Fig. 11. Characterization of the neuron with S = 4 and different values of Ts (Ts = 0.2, 0.4, 0.6 and 0.8 × Tf): f(Hk) versus Hk.

Fig. 12. Characterization of the neuron with Ts = 0.6 × Tf and different values of S (S = 2, 4, 6 and 8): f(Hk) versus Hk.

(MSB = 0), the ramp function output is equal to zero. Hence, the average frequency of the ramp function neuron is given as follows:

\[
f_p(H) =
\begin{cases}
\dfrac{T_p + (T_f - T_p) \times 0.5}{T_f} & \text{if } H \geq 0\\[6pt]
\dfrac{(T_f - T_p) \times 0.5}{T_f} & \text{if } H < 0
\end{cases} \tag{10}
\]

where $T_f$ and $T_p$ are given by

\[
T_f = \frac{1}{F} \tag{11}
\]

\[
T_p = \frac{|h|}{p} \tag{12}
\]
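The following sketch evaluates the ramp activation of Eq. (8) and the average pulse frequency of Eqs. (9)–(12) side by side; the numerical values and the clipping of Tp to one period are illustrative assumptions.

```python
def ramp(H, p):
    """Eq. (8): piecewise-linear approximation of the activation function."""
    if H > p:
        return 1.0
    if H >= -p:
        return H / (2.0 * p) + 0.5
    return 0.0

def average_frequency(h, F, p):
    """Eqs. (9)-(12): average output frequency of the ramp-function neuron,
    normalized by the clock frequency."""
    Tf = 1.0 / F                       # Eq. (11)
    Tp = min(abs(h) / p, Tf)           # Eq. (12), clipped to one period
    if h >= 0:
        return (Tp + (Tf - Tp) * 0.5) / Tf   # Eq. (10), H >= 0
    return ((Tf - Tp) * 0.5) / Tf            # Eq. (10), H < 0

# With F = 1 the internal potential is H = h*F = h, and both expressions agree.
print(ramp(4.0, p=8), average_frequency(h=4.0, F=1.0, p=8))   # 0.75 0.75
```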

Fig. 13. Characterization of the neuron with p = 2, Ts = 0.6 × Tf and S = 4, compared with the logsig activation function: f(Hk) versus Hk.

By substituting (9), (11) and (12) in (10), we obtain an activation function with the shape required by (8).

To obtain an activation function close to a sigmoid function, a smoothing circuit is added to the neuron unit. It consists of a pulse generator, a shift register and a multiplexer. Fig. 8 shows the block diagram of the modified neuron unit. The pulse generator gives S pulses during an interval Ts inside Tf. The shift register divides the content of the register by two, S times per period Tf. Fig. 9 shows the two periods Tf and Ts, the pulse generator output and the register transition.

To simulate the neuron's activation function, we have implemented one neuron unit connected to 6 synapse units, as shown in Fig. 10. The weight values are expressed in a 12-bit signed fixed-point format and the ramp function parameter is equal to 8 (p = 8). We use frequency synthesizers to generate pulsed input signals from quantized digital values expressed in an eight-bit unsigned fixed-point format (n = 8). Both the weight values and the input frequencies are randomly set.

Fig. 11 shows the slopes of the activation function generated for different values of Ts with S = 4, while Fig. 12 shows the variation of the activation function when adjusting the number of shift operations S made in the same interval Ts. The activation function curves are very close to sigmoid functions with different slopes.

This approximation of the activation function brings several advantages to the hardware implementation of neural networks. Firstly, the architecture of this function is not cumbersome and is easy to realize. Secondly, the proposed activation function is differentiable and usable in the back-propagation algorithm for training. Thirdly, the activation function is adjustable and can be approximated by the following sigmoid function:

\[
f(x) = \frac{1}{1 + e^{-ax+b}} \tag{13}
\]

Fig. 14. On-chip learning configuration of the PNN.

Thus, using p = 2, Ts = 0.6 × Tf and S = 4 as the neuron unit parameters, the activation function depicted in Fig. 13 can be approximated by the logsig function given by

\[
f(x) = \frac{1}{1 + e^{-x}} \tag{14}
\]

Therefore, the network configurations and weights can be calculated on the host computer by using the back-propagation algorithm; the weights can then be used in the PNN without on-chip learning.

4. Architecture of the PNN with on-chip learning

The general architecture of the pulse neural network with on-chip learning is described in Fig. 14. The upper half of the block diagram is the forward path and the lower half is the back-propagation path, which performs the on-chip learning.

The back-propagation algorithm uses the derivative of the neuron's activation function. A pulse differentiator is employed to generate the derivative f′(Hk). The differentiator gives an output pulse when it detects the beginning and the end of the pulse stream. It has been reported that the learning performance of the multilayer neural network with the back-propagation algorithm can be improved by adding a pseudo-random sequence, generated by a random pulse generator, to f′(Hk) [12]. The block diagrams of the pulse differentiator and the random pulse generator signals are shown in Figs. 15 and 16, respectively.

Fig. 15. Pulse differentiator.

The error is propagated through two signals: the absolute error and the sign. The error pulse is generated when the teaching signal and the output signal differ. The sign signal indicates the sign of the error.

To calculate the quantity $w_{kj}^{(s+1)} \delta_j^{(s+1)}$ in (3), the sign bit of the weight (MSB), instead of the actual weight value, is multiplied by the error term $\delta_j^{(s+1)}$ [13,14]. To perform this operation, an exclusive-OR gate is used to invert the sign signal when the MSB of the weight is "1". The products $e_k^{(s)} f'(H_k^{(s)})$ and $\eta\, \delta_k^{(s)} o_j^{(s-1)}$ are realized by logical AND gates. These signals are used, respectively, to enable and update (up or down) the up-down counters that contain the synaptic weights.
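A minimal sketch of this sign-only simplification is given below; the array shapes and function names are assumptions, and the XOR view mirrors the hardware description above.

```python
import numpy as np

def backpropagate_error_sign(W_next, delta_next):
    """Propagate local gradients through layer s+1 using only sign(w),
    i.e. the weight's MSB, instead of the full weight value."""
    return np.sign(W_next).T @ delta_next

def propagated_sign(weight_msb, error_sign_bit):
    """Bit-level view for a single connection: the sign of the propagated
    term is the XOR of the weight's MSB and the error's sign bit."""
    return weight_msb ^ error_sign_bit
```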


Fig. 16. Signals in a random pulse generator.

5. Simulation results and hardware synthesis

5.1. Performance test of the PNN with on-chip learning

The multilayer neural network architecture with on-chip learning presented in Fig. 14 is used to test the learning of three-input logic functions. The network consists of three input nodes, one hidden layer with four neurons and a single output neuron. Both the input and the hidden layer include an offset block. The synaptic weights are randomly chosen at the beginning of the training process.

Fig. 17 shows the multilayer neural network signals after 100 iterations of the learning process. The output becomes closer to the target, with a small shift between the two signals. The error between the output signal and the teaching signal decreases as the learning process progresses. In fact, Fig. 18 shows that the learning settles at an error of 0.4% after 1300 iterations.

The same network is trained to learn other logic functions such as OR, AND, NAND and NOR. For all these functions, the PNN converges after approximately 1300 iterations with an error of less than 0.5%.

5.2. Simulation results for neural network-based edge detection

The proposed PNN is used as an alternative to classic edge detection methods. According to Canny,

Fig. 17. PNN training the three-input exclusive-OR logic function: signals after 100 iterations.

Fig. 18. PNN training the three-input exclusive-OR logic function: signals after 1300 iterations.


Fig. 19. Training results: (a) output image and (b) target image.

Fig. 20. Generalization result: (a) input image, (b) target image and (c) output image.

Fig. 21. Generalization result: (a) input image, (b) target image and (c) output image.

the optimal filter that meets all three of Canny's criteria can be efficiently approximated by the first derivative of a Gaussian function. The multilayer neural network was trained to learn the Canny operator. A series of images from a database was used. Each database is formed by a particular set, such as hands

Fig. 22. Generalization result: (a) input image, (b) target image and (c) output image.

Table 1. Relative errors for the PNN generalization test

Image set          Relative error
Dinosaur images    4.69%
Fusil images       3.91%
Hand images        6.79%

(for biometry applications), or mammography images (for breast cancer classification), or simple images such as dinosaur images, etc. Each set is divided into two classes: one for network training and the other for a network generalization test.

The neural network consists of nine inputs, four neurons in the hidden layer and one output. The training inputs are obtained by sweeping a 3×3 mask over the images. The network target output is the value of the central pixel of the mask in the image produced by the Canny operator.

In the learning step, we used the back-propagation algorithm to adjust the network parameters. In this step, we used a database containing 30 different images of dinosaurs to learn the Canny operator. After learning, the weights and biases are subsequently used in the PNN in forward operation. This training is accomplished in 4000 iterations with an error of 3.59%; an example of a training result is illustrated in Fig. 19.
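A minimal sketch of how such training pairs can be assembled is shown below; canny_reference stands for any Canny implementation (for instance skimage.feature.canny) and is an assumption, not the routine used by the authors.

```python
import numpy as np

def make_training_pairs(image, canny_reference):
    """Sweep a 3x3 mask over the image: the nine pixels form the network
    input and the Canny response at the mask centre is the target."""
    target = canny_reference(image).astype(float)
    X, y = [], []
    rows, cols = image.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            X.append(image[i - 1:i + 2, j - 1:j + 2].ravel())   # nine inputs
            y.append(target[i, j])                              # centre pixel of the mask
    return np.array(X), np.array(y)
```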

The performance of the PNN architecture and the generalization capabilities of the neural network are tested on-chip by implementing the forward mode on an FPGA platform and introducing images from categories not used in the learning phase. Figs. 20–22 show the generalization results of the PNN working in forward mode to approximate the edge detection function.

To evaluate the effectiveness of the proposed network, one can measure the relative error as follows:

\[
E = \frac{\sum (I_N - I_{\mathrm{Canny}})}{\sum I_{\mathrm{Canny}}} \tag{15}
\]


Table 2. Device utilization summary

PNN-based edge detection
Number of external IOBs    56 out of 396: 14%
Number of RAM blocks       15 out of 44: 34%
Number of slices           2100 out of 4928: 42%

Table 3. Timing summary

PNN-based edge detection
Minimum period                              9.771 ns
Maximum frequency                           102.344 MHz
Minimum input arrival time                  4.139 ns
Maximum output required time after clock    9.094 ns

where $I_N$ is the image resulting from the network output and $I_{\mathrm{Canny}}$ is the image processed by the Canny operator. Table 1 shows the relative errors of the network generalization using three image categories. These results show good generalization rates for the PNN implemented on FPGA.
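Eq. (15) translates directly into the following sketch, applied to two images of identical shape (assumed here to be NumPy arrays).

```python
import numpy as np

def relative_error(I_net, I_canny):
    """Eq. (15): relative error between the network output image I_N and
    the Canny reference image I_Canny."""
    return np.sum(I_net - I_canny) / np.sum(I_canny)
```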

5.3. Hardware synthesis

The proposed architecture of the PNN is implemented on an FPGA Virtex II Pro platform. The precision of the synapse weight values is expressed in a 12-bit signed fixed-point format. The activation function parameters are chosen as follows: p = 2, Tf = 256, Ts = 0.4 × Tf and S = 4. Table 2 summarizes the device utilization of the PNN configuration. We note that the proposed

architecture can be applied to a larger network, since the target FPGA allows it. Compared with conventional architectures, much more compactness is obtained [15].

Table 3 summarizes the device timing. The maximum frequency that the implemented device can reach is 102.344 MHz. Since the proposed architecture works in a fully parallel fashion, this value decreases in proportion to the network scale.

6. Conclusion

In this paper, we consider a neural network-based edge detection application, which is an important step in image processing. We include the different complex steps of the Canny edge detection operator in a pulse mode neural network, based on a learning/training phase. A series of heterogeneous image databases, composed of three classes, is used for that purpose. A hardware implementation of the corresponding network, making use of an improved PNN architecture based on a simple synapse multiplier and an adjustable activation function, has been proposed. Besides the compactness of this solution, it allows the use of any signed weight values presented in a fixed-point format. Experimental results show that the characteristic of the neuron unit is programmable and very close to the sigmoid activation function. The PNN learned the Canny operator with good generalization rates. The whole system is implemented on a field-programmable gate array (FPGA). Implementation results show that the circuit size of the PNN can be reduced by using the proposed synapse multiplier and neuron unit while providing the same performance as conventional architectures.

References

[1] Pinho AJ, Almeida LB. Edge detection filters based on artificial neural networks. In: Proceedings of ICIAP. IEEE Computer Society Press; 1995. p. 159–64.

[2] Canny J. A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 1986;8:679–98.

[3] Maeda Y, Nakazawa A, Kanata Y. Hardware implementation of a pulse density neural network using simultaneous perturbation learning rule. Analog Integrated Circuits Signal Process 1999;18:1–10.

[4] Moon G, Zaghloul ME, Newcomb RW. VLSI implementation of synaptic weighting and summing in pulse coded neural-type cells. IEEE Trans Neural Networks 1992;3:394–403.

[5] Van Den Bout DE, Miller III TK. A digital architecture employing stochasticism for the simulation of Hopfield neural nets. IEEE Trans Circuits Syst 1989;36:732–8.

[6] Hikawa H. Frequency-based multilayer neural network with on-chip learning and enhanced neuron characteristics. IEEE Trans Neural Networks 1999;10:545–53.

[7] Reyneri LM. A performance analysis of pulse stream neural and fuzzy computing systems. IEEE Trans Circuits Syst 1995;42:624–60.

[8] Hikawa H. A digital hardware pulse-mode neuron with piecewise linear activation function. IEEE Trans Neural Networks 2003;14:1028–37.

[9] Rumelhart DE, McClelland JL, and the PDP Research Group. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: Foundations. Cambridge, MA: MIT Press; 1986.

[10] Lippmann RP. An introduction to computing with neural nets. IEEE Acoust Speech Signal Process Mag 1987;688–95.

[11] Krid M, Masmoudi DS. FPGA implementation of a feedforward neural network. In: International conference on systems, signals & devices, vol. 4, 2005.

[12] Hikawa H. Improvement on the learning performance of multiplierless multilayer neural network. In: International symposium on circuits & systems, vol. 1, 1997. p. 641–4.

[13] Baker T, Hammerstrom D. Characterization of artificial neural network algorithms. In: International symposium on circuits & systems; 1989. p. 78–81.

[14] Hikawa H. Implementation of simplified multilayer neural networks with on-chip learning. In: International conference on neural networks, vol. 4, 1995. p. 1633–7.

[15] Stephan H. A high-speed subpixel edge detector implementation inside a FPGA. Real-Time Imaging 2003;9:361–8.


Krid Mohamed was born in Sfax, Tunisia, in 1979. He received his B.S. degree in Electrical Engineering from the National Engineering School of Gabes, Tunisia, in 2002, and his M.Sc. in Electrical Engineering from the National Engineering School of Sfax, Tunisia, in 2005. He is currently working towards a Ph.D. degree on the hardware implementation of pulse mode neural networks. He has been an assistant of electronics at the Higher Institute of Industrial Systems, University of Gabes, since 2005. Mr Krid is a member of the research group on Computer Imaging and Electronics Systems of the research unit on Intelligent Control, Design and Optimization of Complex Systems of the University of Sfax.

Damak Alima was born in Sfax, Tunisia, in 1981. She received her B.S. degree in Electrical Engineering from the National Engineering School of Sfax, Tunisia, in 2005, and her M.Sc. in Electrical Engineering from the National Engineering School of Sfax, Tunisia, in 2006. She is currently working towards a Ph.D. degree on the hardware implementation of pulse mode neural networks. She has been an assistant of electronics at the National Engineering School of Sfax since 2006. She is a member of the research group on Computer Imaging and Electronics Systems of the research unit on Intelligent Control, Design and Optimization of Complex Systems of the University of Sfax.

Dorra Sellami Masmoudi was born in Sfax, Tunisia, in 1969. She received her engineering degree in Electrical Engineering from the National Engineering School of Sfax, Tunisia, in 1994, and received the prize for the best engineering student of the Republic of Tunisia. Subsequently she joined the IXL Microelectronics Laboratory in Bordeaux to work on a thesis, and received her Ph.D. in 1998. Her research interests include analog high-frequency circuit design, neural network implementations and analog/digital mixed design.