RESEARCH PAPERS

DESIGN ENHANCEMENT OF COMBINATIONAL NEURAL NETWORKS USING HDL BASED FPGA FRAMEWORK FOR PATTERN RECOGNITION

By
PRIYANKA MEKALA *
JEFFREY FAN **

* Research Assistant and PhD Candidate, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA.
** Assistant Professor, Department of Electrical and Computer Engineering, Florida International University, Miami, FL, USA.

ABSTRACT

Fast-emerging, highly-integrated multimedia devices require complex video/image processing tasks, leading to a very challenging design process that demands more efficient, higher-throughput processing systems. Neural networks are used in many of these imaging applications to represent complex input-output relationships. Software implementations of these networks attain accuracy with tradeoffs between processing performance (to achieve specified frame rates while working on large image data sets), power and cost constraints. The current trend is for conventional processors to be replaced by Field Programmable Gate Array (FPGA) systems due to their high performance when processing huge amounts of data. The goal is to design Combinational Neural Networks (CNN) for pattern recognition on an FPGA-based platform for accelerated performance. The enhancement in speed and computation from the hardware is compared to the software model (in MATLAB). The use of HDL on the FPGA enables operations to be performed in parallel, allowing the exploitation of the vast parallelism found in many real-world applications such as robotics, controller-free gaming and sign/gesture recognition. As a validation of the CNN hardware model, a case study in pattern recognition is explored and implemented on a Xilinx Spartan 3E FPGA board. To measure the quality of learning in the trained network, mean squared error is used. The processing performance of this non-linear stochastic tool is determined by comparing the HDL (parallel model) simulations with the MATLAB design (sequential model). The gain in training time and memory used for processing is also derived.

Keywords: VHDL, Combinatorial Neural Networks, Back Propagation, Pattern Recognition.

INTRODUCTION

Neuroscience, the study of the human brain, is thousands of years old. This fascination with the human brain has led to the development of Artificial Neural Networks (ANNs), which have been made possible by advances in electronics. ANNs have been used successfully in a broad spectrum of applications such as pattern recognition, data classification, control systems, signal processing and functional approximation. Much work in these fields relies on software simulations, and the capabilities of ANN models have been investigated using both analog and digital implementations (Torres-Huitzil, Girau, & Gauffriau, 2007). Digital implementations are more popular for their basic advantages of higher accuracy, less noise sensitivity, more flexibility and compatibility with different types of processors. These digital implementations can be realized with a digital signal processor, an FPGA or a programmable logic design. An FPGA-based implementation would be the best choice among the previously mentioned platforms since it can work in parallel, matching the behavior of ANNs (Cantrell & Wurtz, 1993)(Baker & Hammerstrom, 1989)(Blais & Mertz, 2001)(Vargas, Barba, Torres & Mattos, 2011). Previous research on implementing various kinds of neural networks on the HDL platform in (Ali & Mohammed, 2010)(Omondi & Rajapakse, 2006)(Izeboudjen, Farah, Bessalah, Bouridane & Chikhi, 2008)(Schemmel, Meier & Schurmann, 2001) has focused on developing the neuron models and their connections, concurrent operations, propagation delay and timing information.

6  i-manager's Journal of Electronics Engineering, Vol. 2, No. 1, September - November 2011


Some interesting features displayed by the network engine are adaptation, parallelism, classification, optimization and generalization. The debate over whether to build a generic system that can be reprogrammed on user demand for different applications, or a single specialized system dedicated to one application with high-speed performance, still prevails (Omondi & Rajapakse, 2006).

The researchers propose an HDL-based design methodology to trade off the high-level application requirements against the low-level FPGA hardware for pattern recognition. The HDL description has the advantages of being generic, flexible and dynamically reconfigurable on user demand, and is useful for gaining more control of parallel processes. Efficient reusability and performance are derived by collecting the characteristics of entities into a model library (Izeboudjen, Farah, Bessalah, Bouridane & Chikhi, 2008). Table 1 shows the comparison of VHDL with the procedural languages and outlines the advantages of characterizing digital hardware using a hardware description language based on entity validation. The computations involve mostly fixed-point integers rather than floating point, which results in some false outputs. This can be fixed by introducing libraries defining float-type variables and vectors. Pattern recognition using neural networks is addressed recently in (Vargas, Barba, Torres & Mottos, 2011). There, the pixel values of an image frame were used directly as inputs for recognition, causing increased memory usage and computation. We choose to perform the recognition on the bitmapped (depth 4) image rather than the grayscale (8-bit) image, so a gain in bandwidth is achieved in terms of memory storage. Also, once the image is preprocessed, the features are extracted and used as inputs in the proposed architecture. In this paper, the authors present a new generic design of the Combinational Neural Network (CNN) proposed in earlier research (Mekala, Erdogan & Fan, 2010) for pattern recognition on a Xilinx Spartan 3E board using a VHDL model, called HDL-CNN. The simulation of the VHDL models is facilitated by the use of stimulus sequences and checkers (e.g., VHDL test benches, mean square error). The variations in training time and computation (which depend on global parameters defined by the user) are analyzed and displayed in later sections of this paper. A comparison is made in order to establish the speed performance of the proposed model.

The rest of the paper is organized as follows: section 1 presents the design progression using HDL and FPGA logistics, section 2 explains the Combinational Neural Networks (CNN) and the HDL-CNN, section 3 explains the sign/gesture recognition model, section 4 presents the results, and the final section concludes the paper.
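The fixed-point issue noted above can be sketched in a few lines. This is an illustrative Python model, not the paper's code: a Q-format fixed-point multiply loses precision relative to floating point, which is the kind of "false output" that the float-type libraries mentioned above are meant to avoid.

```python
# Illustrative sketch (not from the paper): fixed-point integer arithmetic
# introduces small errors relative to floating point; the Q8 format below
# (8 fractional bits) is an assumed example.

def to_fixed(x, frac_bits=8):
    """Quantize a real number to a signed fixed-point integer (Q-format)."""
    return int(round(x * (1 << frac_bits)))

def from_fixed(x, frac_bits=8):
    """Convert a fixed-point integer back to a real number."""
    return x / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=8):
    """Multiply two fixed-point numbers, rescaling the double-width product."""
    return (a * b) >> frac_bits

w, x = 0.37, -1.25
exact = w * x                                              # floating-point reference
approx = from_fixed(fixed_mul(to_fixed(w), to_fixed(x)))   # fixed-point result
error = abs(exact - approx)                                # bounded by the quantization step
```

A wider fractional field (or a float library, as the text suggests) shrinks the error at the cost of hardware resources.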

1. Design progression using HDL

The first question that comes to mind is: why use a high-level design methodology (such as HDL) for CNN implementation, as opposed to other object-oriented simulations? The answer is that high-speed processing can be achieved through dedicated hardware working in parallel, which can be implemented on FPGAs using HDL. ANNs are powerful systems capable of modeling complex input-output relationships. Information is processed via mathematical models using the interconnections of neurons.

Table 1. VHDL vs. Procedural languages


VHDL (hardware descriptive):
- VHDL contains components that are concurrent, i.e., they run in parallel/simultaneously.
- VHDL provides ways to describe the propagation of time and signal dependencies. It is hardware oriented: digital logic design, with the operations and structure described at gate level and RT level (hierarchical design).
- VHDL supports unsynthesizable constructs that are useful in writing high-level models, test benches and other non-hardware artifacts needed in hardware logic design.
- VHDL has static type checking: many errors can be caught before synthesis and/or simulation.
- VHDL has a rich collection of data types and a well-defined standard, with a full-featured language and module system (libraries and packages).

Procedural languages (C, C++, MATLAB):
- Traditional software languages like C, C++ and MATLAB are sequential.
- There is no way to describe time and signal dependency. They are software oriented: binary executables (dataflow language and non-hierarchical design).
- Explicit concurrent constructs and assignments are not supported by the procedural languages.
- Errors can be analyzed only after debugging; synthesis errors are hard to debug.
- Object-oriented programs are written with purely logical or algorithmic thinking; they are inherently procedural (single-threaded), with limited syntactic and semantic support to handle concurrency.
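The concurrency contrast in Table 1 can be made concrete with a minimal sketch (hypothetical, not from the paper). In VHDL, concurrent signal assignments inside a process sample every right-hand side before any signal updates, so `a <= b; b <= a;` swaps two signals; the same two statements executed sequentially in a procedural language do not.

```python
# Sketch (illustrative): VHDL-style concurrent signal assignment vs.
# procedural sequential assignment, modeled in Python.

def sequential_swap(a, b):
    """Procedural semantics: statements run in order."""
    a = b          # 'a' is overwritten first...
    b = a          # ...so 'b' receives the already-updated value: no swap
    return a, b

def concurrent_swap(a, b):
    """VHDL-style semantics: all right-hand sides are sampled before
    any signal updates, as in  a <= b;  b <= a;  inside a process."""
    next_a, next_b = b, a
    return next_a, next_b

print(sequential_swap(1, 2))   # (2, 2)
print(concurrent_swap(1, 2))   # (2, 1)
```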


These designs capture the neuron connections, concurrent operations, propagation delay and timing information (Omondi & Rajapakse, 2006)(Berry, 2002)(Schemmel, Meier & Schurmann, 2001)(Short, 2009)(Ashenden, 1995).

2. Hardware Descriptive Language - Combinational Neural Networks (HDL-CNN)

The CNN is a special class of ANN, described as follows. The design resembles a tree structure in addition to the generic architecture of a neural network. The previous research on the software CNN design proposed in (Mekala, Erdogan & Fan, 2010) is based on the address search of the virtual memory of a CPU. This paper examines an alternative implementation of the CNN on the hardware platform, called the HDL-CNN, which modifies the architecture with the help of a VHDL design on an FPGA to improve performance. A basic neural network engine and the extension of the back-propagation network onto the HDL-CNN model are described below.

2.1 Generic Neuron Model in HDL

In order to model an artificial neuron from a biological neuron, three basic components are used: inputs to the neuron, synaptic weights and an activation threshold function. The synapses of the biological neuron (i.e., the connections which interconnect the neural network and give the strength of each connection) are modeled as synaptic weights. Mathematically, a neuron can be considered as three functions: two linear and one non-linear. All inputs are modified by the weights and summed together; this activity is referred to as a linear combination. The linear combination of the input stage and the aggregation is modeled as a simple MAC (multiply and accumulate) function. The output of the MAC is passed through a non-linear activation threshold to determine the output. The activation function considered could be a step function (the simplest non-linear function), a ramp function or a sigmoid function (Mehrotra, Chilukuri & Ranka, 1997)(Caudill & Butler, 1992)(Stergiou & Siganos, 1996)(Dreyfus, 2005).

Figure 1 shows the neural network engine with the three-layer structure (Fausett, 1994). Each neuron receives several inputs x_i and generates the pre-output v_k (k representing the neuron generating the output) through the linear function of multiply and accumulate as follows:

    v_k = Σ_{i=0}^{p} w_{ki} · x_i    (1)

The synaptic weight of the connection is given by w_{ki}, where 'p' is the number of incoming inputs to the neuron. The output of the model, y_k, is given by the pre-output passed through the activation function φ(.) (the sigmoid function defined in Eq. 3), as shown below:

    y_k = φ(v_k)    (2)

Figure 1. Generic Neural Network Model
[Figure: input signals x_0, x_1, ..., x_p; synaptic weights w_k0, ..., w_kp with w_k0 = b_k (bias); summing junction Σ producing v_k; threshold θ_k; activation function φ(·); output y_k. The MAC layer implements the linear functions, followed by the non-linear activation function.]

2.2 HDL-CNN Architecture Model

The CNN is built on the basic network of back propagation described in previous research (Mekala, Erdogan & Fan, 2010). The features extracted from the prior module block are divided into classes or stages, where a set of features describes some information on the probability of the recognition decision. Hence, from a set of M actual features to be extracted, the probabilistic decision is made on three levels, with the first level monitoring the other two levels. Each level is fed with a vector V of variable length; hence there exist vectors V of size L1, L2 and L3. Since the platform is designed to serve as a generic model, flexible to user demands, the value of M varies from application to application, depending on the linearity of the output classes. A parallel communication bus is provided between the feature extraction layer and the three-level CNN recognition model, in order to allow the flow of the three level sets of the vector data, as well as an initialization clock signal to choose either level 2 or level 3 once the decision on level 1 is made (Mekala, Erdogan & Fan, 2010). The time and
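The generic neuron of Eq. 1-2 can be modeled in a few lines of software. The sketch below is illustrative, not the paper's VHDL: it combines the MAC stage with a piecewise second-order sigmoid approximation of the kind the HDL-CNN uses in place of a look-up table (the coefficients c0, c1, c2 and the [-4, 4] saturation range are assumptions, not values from the paper).

```python
# Illustrative model of the generic neuron: MAC stage plus a quadratic
# sigmoid approximation. Coefficients and saturation range are assumed.
import math

def sigmoid(v):
    """Exact sigmoid (Eq. 3)."""
    return 1.0 / (1.0 + math.exp(-v))

def sigmoid_quad(v, c0=0.5, c1=0.25, c2=-0.03125):
    """Piecewise second-order approximation c2*|v|^2 + c1*|v| + c0,
    saturated outside [-4, 4]; in hardware the square can be formed
    with a MAC and a shift."""
    if v <= -4.0:
        return 0.0
    if v >= 4.0:
        return 1.0
    a = abs(v)
    y = c2 * a * a + c1 * a + c0       # quadratic segment on |v|
    return y if v >= 0 else 1.0 - y    # symmetry about (0, 0.5)

def neuron(x, w, b):
    """MAC stage (Eq. 1): v_k = b + sum(w_ki * x_i), then activation (Eq. 2)."""
    v = b + sum(wi * xi for wi, xi in zip(w, x))
    return sigmoid_quad(v)
```

With these assumed coefficients, the approximation stays within a few percent of the exact sigmoid over the whole input range, which is the tradeoff the hardware design exploits.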


memory consumption involved in computing the feature vector depend on its length. Thus, instead of deriving all the values of the vector at once, three levels are involved, in order to improve the speed and performance of the HDL-CNN model.

The activation function used in the modeling of the CNN architecture is the sigmoid function. Each node in the network receives several input values and combines them to produce an output value. The node's activation function determines the manner in which these values are combined. It is necessary for the activation function to combine the input values in a non-linear manner, so as to fit a wider range of task applications. Since each stage of the CNN is constructed using a back-propagation network (three layers: input, hidden and output), it is important that the activation function be continuously differentiable. Several functions meet this criterion, but the most commonly used activation function is the sigmoid function, as described by Equation 3 below (Kwan, 1992):

    φ(v) = 1 / (1 + e^(-v))    (3)

It is not easy to represent the sigmoid function in digital design, since it contains the exponential series. In object-oriented-programming-based models it is defined with the help of a look-up table, consuming more memory resources. In the HDL-CNN design, a piecewise second-order approximation of the function using quadratic segments of the form c_2·v^2 + c_1·v + c_0 is implemented, where c_0, c_1 and c_2 are the coefficients of the quadratic (Tommiska, 2003). This requires two adders and three multiplier operators, redesigned as two MAC operations and a shift register for calculating the square.

Each level's vector-based back-propagation module is evaluated as described below. Assume H is the vector of hidden-layer neurons, I is the vector of input-layer neurons, W1 is the weight matrix between the input and hidden layers, W2 is the matrix of synapses connecting the hidden and output layers, th1 and th2 are the effective biases on the computed activations (set to the value 1 for this design), T is the target activation of the output layer, μ is the momentum factor used to allow the previous weight change to influence the current weight change, and α is the learning rate adopted for the training. The mathematical equations for the design of each level of the CNN are given in Table 2 (Fausett, 1994).

Table 2. Equations governing each level of CNN (back-propagation neural network)

    Hidden-layer neuron activations:      H = φ(I·W1 + th1)
    Output-layer neuron activations:      O = φ(H·W2 + th2)
    Output-layer error:                   D = O(1 - O)(O - T)
    Hidden-layer error:                   E = H(1 - H)·(W2·D)
    Weight adjustment of second layer:    W2 = W2 + ΔW2_t, where ΔW2_t = α·H·D + μ·ΔW2_{t-1}
    Weight adjustment of first layer:     W1 = W1 + ΔW1_t, where ΔW1_t = α·I·E + μ·ΔW1_{t-1}

Figure 2. HDL-CNN recognition model - Block diagram

Generally, error-threshold adjustment and learning-rate (generally between 0 and 0.5) variations add little to the process, so the idea of momentum is used to boost performance. On each pass through the layers, the weight change of a matrix of synapses is influenced by the previous pass's weight change. The degree to which it is influenced is determined by the momentum term (which generally varies between 0 and 1). The weight adjustments are made as epoch-based training, where at the end of each epoch the cumulative error is also tracked. All the factors, such as the sizes of the three level vectors L1, L2 and L3, each BP input, hidden and output layer, and the learning-rate parameters, are user defined and are set in a configuration file. Figure 2 shows the block diagram of the HDL-CNN model. In order to generate the
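One Table 2 training pass can be modeled in a few lines of Python. This is an illustrative sketch, not the paper's VHDL: the biases th1 = th2 are fixed at 1 as in the design, momentum follows the ΔW_t = α·(step) + μ·ΔW_{t-1} shape of Table 2, and the output-error sign is arranged as (T - O) so that the squared error decreases under the additive weight update.

```python
# Minimal back-propagation pass with momentum, following the shape of the
# Table 2 equations (illustrative model; sign convention chosen so that the
# squared error decreases under W += dW).
import math

def phi(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(I, T, W1, W2, dW1, dW2, alpha=0.5, mu=0.4):
    n, h, l = len(I), len(W1[0]), len(W2[0])
    H = [phi(sum(I[i] * W1[i][j] for i in range(n)) + 1.0)     # th1 = 1
         for j in range(h)]
    O = [phi(sum(H[j] * W2[j][k] for j in range(h)) + 1.0)     # th2 = 1
         for k in range(l)]
    D = [O[k] * (1 - O[k]) * (T[k] - O[k]) for k in range(l)]  # output error
    E = [H[j] * (1 - H[j]) * sum(W2[j][k] * D[k] for k in range(l))
         for j in range(h)]                                    # hidden error
    for j in range(h):                                         # second layer
        for k in range(l):
            dW2[j][k] = alpha * H[j] * D[k] + mu * dW2[j][k]   # momentum term
            W2[j][k] += dW2[j][k]
    for i in range(n):                                         # first layer
        for j in range(h):
            dW1[i][j] = alpha * I[i] * E[j] + mu * dW1[i][j]
            W1[i][j] += dW1[i][j]
    return sum((T[k] - O[k]) ** 2 for k in range(l))           # squared error
```

Repeating `train_step` over the whole training set once corresponds to one epoch in the paper's terminology.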


random weights, a linear shift-register module is used (weights between -1 and 1 are generated). When the asynchronous RESET is set high, the internal finite state machine of the CNN is reset to the initial state. During the initialization phase, the CNN randomizes all connection weights using the shift-register module and, when complete, enters the idle state. Training and testing are done in two different modes, called TRAIN and TEST. When the mode is set to TRAIN, the CNN enters the training state from idle; during TEST, the CNN enters the run state, with the corresponding flags being set.

2.2.1 Computational Analysis

The number of computations involved, and the gain acquired by shifting the architecture to the HDL platform, are modeled below. The feature vector V of variable length is a function of the number of patterns 'p' considered for recognition and the number of features extracted 'n':

(4)

The general configuration of a CNN level is n input neurons, h hidden neurons and l output neurons, where n < h < 2n-1; MAC represents a multiply-and-accumulate, A an adder, M a multiplier and S a shifter operation. For a sigmoid operation, the software solution uses a look-up table (LUT), where the time taken for one LUT access depends on the speed of the processor. In the HDL-CNN, the quadratic equation depicted uses one MAC and one shifter (MAC + S) for the calculation of a single neuron activation function.

In the HDL-CNN, the matrix operations involved in the weight-layer updates and error calculations are performed by dedicated adders, multipliers, shifters and MAC units, and hence are done concurrently (in one complete clock cycle) for all neurons, rather than with the for-loop control used in software models. Each level of the CNN has a different number of computations involved, listed in Table 3, where the values of n, h and l (the input, hidden and output layer neurons) vary between level 1, level 2 and level 3. The average gain in training time, plotted and discussed in the results section, supports this analysis.

Table 3. Comparison of the number of computations involved in each level between the MATLAB CNN and the HDL-CNN

    Operation                           | CNN (MATLAB)                    | HDL-CNN
    Hidden-layer neuron activations     | {nhM + (1 + h(n-1))A} + h(LUT)  | h(MAC) + h(MAC + S)
    Output-layer neuron activations     | {hlM + (1 + l(h-1))A} + l(LUT)  | l(MAC) + l(MAC + S)
    Output-layer error                  | 2lA + 2lM                       | (l + 1)MAC
    Hidden-layer error                  | (h(l-1))A + (lh + 2h)M          | h(MAC + M)
    Weight adjustment of second layer   | (2hl)A + (2lh)M                 | hMAC + 2A
    Weight adjustment of first layer    | (nh)A + (3nh)M                  | nMAC + 2(A + M)

Figure 3. System Overview of the sign recognition model

3. Sign/Gesture Recognition Model

A recognition model is shown in Figure 3. Sign recognition using neural networks is based on training the network using a database of signs/gestures (Vargas, Barba, Torres & Mottos, 2011). The architecture is designed around a camera-based recognition methodology. Once the video/image is obtained from the acquisition unit, the image (256x256 pixels) is processed in various stages and data is extracted to implement the recognition model. The first step of the model after image acquisition is pre-processing. In general, raw image data processing consumes much memory and other resources due to redundancy in the spatial and temporal domains. Pre-processing involves filtering and background subtraction, done using MATLAB, in order to account for various environmental factors such as illumination, unwanted noise and other scene conditions. These pre-processed frames are taken as input (bitmapped images) into the FPGA for LoG (Laplacian of
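The Table 3 counts can be tabulated programmatically for a given level configuration. The formulas below are reconstructed from a partly garbled scan of Table 3 and are illustrative only; they simply transliterate the table's per-operation expressions.

```python
# Sketch: per-operation computation counts for one CNN level with n inputs,
# h hidden and l output neurons, as reconstructed from Table 3.
# A = adds, M = multiplies, LUT = table look-ups, MAC = multiply-accumulates,
# S = shifts. Counts are illustrative; the scanned table is partly garbled.

def software_counts(n, h, l):
    """MATLAB-style CNN: sequential adds, multiplies and sigmoid look-ups."""
    return {
        "hidden activations":  {"M": n * h, "A": 1 + h * (n - 1), "LUT": h},
        "output activations":  {"M": h * l, "A": 1 + l * (h - 1), "LUT": l},
        "output error":        {"A": 2 * l, "M": 2 * l},
        "hidden error":        {"A": h * (l - 1), "M": l * h + 2 * h},
        "second-layer update": {"A": 2 * h * l, "M": 2 * l * h},
        "first-layer update":  {"A": n * h, "M": 3 * n * h},
    }

def hdl_counts(n, h, l):
    """HDL-CNN: dedicated MAC/shifter units working concurrently per neuron."""
    return {
        "hidden activations":  {"MAC": h, "MAC+S": h},
        "output activations":  {"MAC": l, "MAC+S": l},
        "output error":        {"MAC": l + 1},
        "hidden error":        {"MAC+M": h},
        "second-layer update": {"MAC": h, "A": 2},
        "first-layer update":  {"MAC": n, "A": 2, "M": 2},
    }
```

Because the HDL counts are exercised by parallel units, they translate into far fewer clock cycles than the software counts, which is the source of the training-time gain reported in the results.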


Gaussian) edge detection. The feature extraction block and the CNN block operate in parallel, with a dual-bus communication interface provided. The feature extraction layer extracts the necessary features of the size, shape and state attributes of the hand (described in detail in (Mekala, Gao, Fan & Davari, 2011)). Since it is time consuming for the processor to wait until all 55 features have been extracted, the CNN layer is initiated at the arrival of the first 15 feature elements to level 1, with the remaining 40 going to level 2 or level 3, chosen based on the decision of the level 1 network (Mekala, Gao, Fan & Davari, 2011). The training of the CNN is done using the sign-language patterns from A to Z (without the J and Z characters, which involve motion). In order to test the ability and performance of the network, a test set of independent examples is used, to check that the network generalizes to example sets which are not present in the training set.

The case study of American Sign Language (ASL) recognition is interpreted step by step as follows:

· Image acquisition via camera and generation of still image frame data (video-to-frames conversion with background subtraction), done using MATLAB and stored as ".coe" files.
· Transfer of the image data to a Xilinx Spartan 3E FPGA (Field Programmable Gate Array) board via USB 2.0 using a PC.
· Saving the data to the onboard SRAM (Static Random Access Memory) to allow image processing functions to be performed on the image.
· Implementing the edge detection and feature extraction algorithms on the image and storing the feature vector back in the SRAM.
· Recognition via the CNN model, with parallel interaction with the feature extraction unit.
· Display of the input frame and the processed frame on the PC, to be viewed by the user via the VGA controller, with the recognized sign displayed on the LCD segment.

Figure 4. System Overview of the sign recognition model - Preprocessing, Feature Extraction and CNN Engine.

Figure 5. Xilinx Spartan 3E kit hardware connections

The model schematic of the sign recognition is shown in Figure 4. It contains the SRAM module, the preprocessor module, the feature extraction module and the CNN recognition module. There are three main hardware components in use in this case-study realization of the CNN using HDL for sign-language recognition. Figure 5 shows the connections between the FPGA board - Xilinx Spartan
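The LoG edge-detection step in the pipeline above can be sketched in software. The minimal pure-Python model below is illustrative: the paper implements this stage on the FPGA, and the 5x5 kernel is a common discrete Laplacian-of-Gaussian approximation, not necessarily the one the authors used.

```python
# Sketch: LoG (Laplacian of Gaussian) edge detection on a small grayscale
# frame, in pure Python. Illustrative; the paper runs this on the FPGA.

LOG_KERNEL = [                 # common 5x5 discrete LoG approximation
    [ 0,  0, -1,  0,  0],
    [ 0, -1, -2, -1,  0],
    [-1, -2, 16, -2, -1],
    [ 0, -1, -2, -1,  0],
    [ 0,  0, -1,  0,  0],
]

def log_filter(img):
    """Convolve img (a list of pixel rows) with the LoG kernel.
    Border pixels are left at zero; the kernel sums to zero, so flat
    regions produce no response and edges produce strong responses."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            acc = 0
            for ky in range(5):
                for kx in range(5):
                    acc += LOG_KERNEL[ky][kx] * img[y + ky - 2][x + kx - 2]
            out[y][x] = acc
    return out
```

On the FPGA the same accumulation maps naturally onto MAC units, with one window position per clock rather than the nested Python loops.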


3E, the USB-to-peripheral communications module, and a monitor with a VGA connection in order to display the recognized output sign. The authors adopt the Xilinx Integrated Software Environment (ISE 10.1), a powerful, flexible integrated design environment that allows designing for Xilinx FPGA devices from basic modules to complete microprocessor architectures. Project Navigator is the user interface that manages the entire design process, including design entry, simulation, synthesis, implementation, and finally downloading the configuration to the FPGA device. PACE is responsible for placing and routing the code for optimization. IMPACT then generates the programming files and downloads the code to the hardware (Xilinx, 2009).

4. Results

Most components of the architecture perform in parallel, and hence the potentially very long training times are reduced considerably. The training time is the time taken to train the network for a given number of patterns 'p' without duplicate input frames. The training time is plotted against the number of patterns trained in Figure 6. It clearly shows that the HDL-CNN model saves on average 13x the time involved in training when compared to the software-based CNN model. The curve also shows that the time saved increases as the number of patterns increases, i.e., as the complexity of recognition becomes more non-linear, so an average is used for comparison. The adjustments of the weight matrices and the neuron activation vectors are all parallel procedures, hence the gain in speed. On average (the mean over different numbers of patterns), the time consumed for the design in MATLAB was around 381.36 seconds, while the time taken for the design in VHDL was 29.34 seconds of CPU time and 30 seconds of real time, as listed in Table 4. The proposed HDL solution proved to be about 13x better with respect to the time and speed of training the network.

The images after background subtraction are converted from grayscale to bitmap of depth 4 (sent as input in the form of '.coe' files to the ISE). Grayscale uses 8 bits to represent each pixel, whereas the bitmap image uses a depth of 4 bits. Thus, processing a 256x256 image saves 262144 (256x256x4) bits in representation, i.e., 256 Kb in bandwidth, as listed in Table 4.

Figure 6. HDL-CNN vs. CNN architecture training time (in seconds) variation based on the number of patterns to be trained.
[Plot: training time (0-600 s) vs. number of patterns (3-11); curves for CNN train time and HDL-CNN train time; annotated speedups of 9x, 13x and 15x.]

Table 4. HDL-CNN recognition model vs. CNN recognition model (hardware vs. software architecture)

    Metric                          | HDL-CNN (hardware)       | CNN (MATLAB)
    Average training time           | 29.34 s                  | 381.36 s
    Single-pattern recognition time | 43.45 ms                 | 0.52 s
    Average performance             | 95%                      | 92.8%
    Average noise immunity          | 51%                      | 48%
    Epochs (best case scenario)     | 1815                     | 1832
    Limitations                     | J, Z (signs with motion) | -
    Gain in bandwidth               | 256 Kb per frame         | -

To validate the performance of the HDL-CNN, the authors generate a mean-square-error test bench considering the actual operating conditions of the neural network. The test bench adopts the three-level feature vector as the input signal vectors, and the weight coefficients of the hidden and output layers are stored in RAMs. Both the mean square error and the evolution of the weights are transferred to text files and plotted using MATLAB. Epoch-based updating of the weights is performed, and the mean square error decreases at an exponential rate, settling down to an almost constant value, as shown in Figure 7.

An epoch is the presentation of the entire training set to the neural network once; for the network to reach the minimum threshold error, the training is done multiple times, counted as the number of epochs. The maximal weight change in each epoch decreases, finally reaching the least value possible. The best, intermediate and worst case scenarios are shown in Figure 7, where the evolution of the weights settles down to a constant value at the end of 1815 epochs for the best case, obtained by the global
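The two quantitative checks above are easy to reproduce. The short sketch below (illustrative, using only the figures quoted in the text) computes the per-frame bandwidth saving from the grayscale-to-bitmap conversion and the mean squared error used as the quality-of-learning measure.

```python
# Sketch: bandwidth saving per frame (Table 4) and the MSE quality measure.

def bandwidth_saving_bits(width, height, gray_bits=8, bitmap_bits=4):
    """Bits saved per frame when moving from grayscale to a depth-4 bitmap."""
    return width * height * (gray_bits - bitmap_bits)

def mean_squared_error(targets, outputs):
    """Quality-of-learning measure tracked per epoch by the test bench."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

saved = bandwidth_saving_bits(256, 256)   # 262144 bits = 256 Kbit per frame
```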

i-manager's Journal on Electronics Engineering, Vol. 2, No. 1, September - November 2011

Best case scenario: 11 hidden layer neurons, learning rate 0.1, momentum 0.4 and threshold 0.0001.
Intermediate case scenario: hidden layer units = 11, learning rate = 0.1, momentum = 0.7, threshold = 0.0001.
Worst case scenario: hidden layer units = 11, learning rate = 0.01, no momentum, threshold = 0.001.
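For reference, the three training configurations used in Figure 7 can be collected into a small lookup table (the values are the paper's; the dictionary layout itself is just an illustrative convention):

```python
# Hyper-parameter settings for the three HDL-CNN training scenarios
# shown in Figure 7 (values from the paper; "no momentum" encoded as 0.0).
scenarios = {
    "best":         {"hidden_units": 11, "learning_rate": 0.1,  "momentum": 0.4, "threshold": 0.0001},
    "intermediate": {"hidden_units": 11, "learning_rate": 0.1,  "momentum": 0.7, "threshold": 0.0001},
    "worst":        {"hidden_units": 11, "learning_rate": 0.01, "momentum": 0.0, "threshold": 0.001},
}

# All three cases share the same 11 hidden-layer units; they differ only
# in learning rate, momentum and error threshold.
assert all(s["hidden_units"] == 11 for s in scenarios.values())
```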


strange patterns.

Figure 8 summarizes the results for a few test patterns with the LoG edge operator and the sign recognized. A few noisy patterns are also tested in order to evaluate the accuracy of the architecture. Though the network is trained using different test patterns, the noise immunity levels vary for each sign involved. Noise immunity is the level of noise under which the pattern can still be recognized accurately. The correlation between the signs plays a key role in the inconsistency of the noise immunity observed. The performance is calculated as the ratio of correct patterns recognized to the total number of test patterns. On average, a performance of 95% is achieved in pattern identification, and it takes around 43.45 ms to retrieve one image pattern. Given an input frame for testing, the time taken by the network architecture to process and recognize the sign is the

parameters (learning rate, momentum and error threshold) optimization. Inclusion of the momentum proved to be useful with training sets that include a few patterns that are very different from the rest (as patterns B, W, Y are completely different from patterns A, C, O, where the finger tips are not present), demonstrated by the worst case, which has no momentum, compared to the best and intermediate cases. Normally, such patterns will upset the convergence towards the minimum defined by the majority of the patterns. To improve that, one could use a very small learning rate (<0.1), but then the convergence would be very slow. Instead, the study keeps a moderate learning rate (0.1) but involves the previous weight change, in addition to the current weight change, in defining the weight update.
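This update rule, combining the current gradient step with the previous weight change, can be sketched as follows (an illustrative Python sketch, not the paper's VHDL implementation; the learning-rate and momentum values follow the best-case scenario):

```python
def momentum_update(w, grad, prev_dw, lr=0.1, momentum=0.4):
    """One weight update: the current gradient step plus a fraction of the
    previous weight change, which gives the training inertia that damps
    the disruption caused by atypical patterns."""
    dw = -lr * grad + momentum * prev_dw
    return w + dw, dw

# Two consecutive steps on a single weight; the second step reuses the
# first step's weight change as its momentum term.
w, dw = momentum_update(0.5, grad=1.0, prev_dw=0.0)   # dw = -0.1
w, dw = momentum_update(w, grad=1.0, prev_dw=dw)      # dw = -0.1 + 0.4*(-0.1) = -0.14
```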

This provides a certain inertia to the training, which minimizes the disruption of the convergence caused by

Figure 7. Various simulations for the best, intermediate and worst case considerations for each level of HDL-CNN training acquired (15 input neurons)


updates, which are hence done concurrently for all neurons rather than through the for-loop control used in object-oriented languages. Arithmetic precision is also achieved through the use of floating-point libraries. Moving to this parallel hardware provided speedups of an order of magnitude (13x in this case). Many advanced families of FPGAs have been manufactured (Vargas, Barba, Torres & Mattos, 2011) that contain more logic blocks as well as video input controllers, which allows the design to be optimized for different goals of area, power and speed. The use of VHDL for the architectural design represents a very practical option when dealing with complex systems. Thus FPGAs constitute a very powerful option for implementing CNNs, since their parallel processing capabilities can genuinely be exploited to improve performance. To progress the research, the algorithm needs to be extended to recognize words or sentences, which involves processing a set of images (i.e., video frames) at a time with the help of a vector bank. Also, the HDL-CNN architecture is generic, as it could be used for other pattern recognition

single pattern recognition time. The signs involving motion (J, Z) are the limitations of the architecture as compared to the software solution. The epochs needed to reach the steady state and the noise immunity achieved are approximately equal in both cases. The inclusion of an SRAM vector bank to store the motion vectors of adjacent frames could be considered in future research in order to eliminate the above limitations.
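The proposed SRAM vector bank could, in software terms, behave like a short ring buffer of adjacent frames from which motion vectors are derived (a purely illustrative sketch; the buffer depth and the simple frame-difference motion measure are our assumptions, not the paper's design):

```python
from collections import deque

class VectorBank:
    """Illustrative sketch of a vector bank: keep the most recent frames
    so that signs with motion (J, Z) could be classified from a short
    frame history rather than a single static image."""
    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)

    def push(self, frame):
        """Store a frame (a flat list of pixel intensities) and return the
        element-wise difference from the previous frame as a crude motion
        vector, or None for the very first frame."""
        motion = None
        if self.frames:
            motion = [a - b for a, b in zip(frame, self.frames[-1])]
        self.frames.append(frame)
        return motion

bank = VectorBank(depth=2)
bank.push([0, 0, 1])            # first frame: no motion yet
mv = bank.push([0, 1, 1])       # motion between adjacent frames
print(mv)                       # [0, 1, 0]
```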

Conclusion

Combinational neural networks are among the most powerful tools for recognition/identification applications. The VHDL-based design of the sign recognition model performs well in identifying the static images of the American Sign Language alphabet, implemented on the Xilinx Spartan 3E FPGA. Performance is achieved because the expensive operations are optimized in VHDL through a matrix-vector multiplication performed during each layer and level of data flow. Dedicated adders and multipliers are used for performing the weight layer

[Figure 8 panels: for each example, the alphabet image, the LoG (Laplacian of Gaussian) edge-detected image and the sign recognized, for both clean and noisy inputs; the signs shown include B, Y, I, V, C and L.]

Figure 8. Sign language alphabets recognized by the HDL-CNN recognition model


October). A VLSI Implementation of an Analog Neural Network suitable for Genetic Algorithms. ICES '01: Proceedings of the 4th International Conference on Evolvable Systems: From Biology to Hardware, Springer-Verlag London, 50-61.

[11]. Short, Kenneth L., (2009). VHDL for Engineers. NJ: Pearson Prentice Hall.

[12]. Ashenden, Peter J., (1995). The Designer's Guide to VHDL. San Francisco: Morgan Kaufmann Publishers.

[13]. Mekala, P., Erdogan, S., and Fan, J., (2010, November). Automatic object recognition using combinational neural networks in surveillance networks. IEEE 3rd International Conference on Computer and Electrical Engineering (ICCEE'10), Chengdu, China, Vol. 8, pp. 387-391.

[14]. Mehrotra, K., Chilukuri, K.M., and Ranka, S., (1997). Elements of Artificial Neural Networks, The MIT Press, pp. 1-2.

[15]. Caudill, M., and Butler, C., (1992). Understanding Neural Networks: Computer Explorations, MIT Press.

[16]. Stergiou, C., and Siganos, D., (1996). Report: Neural Networks. Vol. 14. Retrieved from http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html.

[17]. Dreyfus, G., (2005).  Neural networks: methodology

and applications. Berlin, New York: Springer.

[18]. Kwan, H.K., (1992, July). Simple sigmoid-like activation function suitable for digital hardware implementation. Electronics Letters, 28(15), 1379-1380. doi: 10.1049/el:19920877.

[19]. Fausett, L., (1994). Fundamentals of Neural Networks: Architectures, Algorithms and Applications. Prentice Hall.

[20]. Mekala, P., Gao, Y., Fan, J., and Davari, A., (2011, March). Real-time sign language recognition based on neural network architecture. Joint IEEE International Conference on Industrial Technology & 43rd Southeastern Symposium on System Theory (SSST'11), Auburn, AL, pp. 197-201.

[21]. Xilinx (2009). XST User Guide, Xilinx Inc. Retrieved from

http://www.xilinx.com/support/documentation/sw_manu

(like objects, face), provided the training sequences are varied.

References

[1]. Torres-Huitzil, C., Girau, B., and Gauffriau, A., (2007). Hardware/Software Co-design for Embedded Implementation of Neural Networks. Reconfigurable Computing: Architectures, Tools and Applications - Lecture Notes in Computer Science, 4419, 167-178.

[2]. Cantrell, C., and Wurtz, L., (1993). A Parallel Bus Architecture for artificial neural networks. Southeastcon '93 Proceedings, IEEE (pp. 5). doi: 10.1109/SECON.1993.465674.

[3]. Baker, T., and Hammerstrom, D., (1989). Characterization of Artificial Neural Network Algorithms. Circuits and Systems - IEEE International Symposium, Vol. 1, 78-81. doi: 10.1109/ISCAS.1989.100296.

[4]. Blais, A., and Mertz, D., (2001, July). An Introduction to Neural Networks - Pattern Learning with Back Propagation Algorithm. Retrieved from http://www.ibm.com/developerworks/library/l-neural/.

[5]. Vargas, P. Lorena, Barba, L., Torres, C. O., and Mattos, L., (2011). Sign Language Recognition System using Neural Network for Digital Hardware Implementation. Journal of Physics: Conference Series, 274(1). doi: 10.1088/1742-6596/374/1/012051.

[6]. Ali, H. K., and Mohammed, E. Z., (2010, August). Design Artificial Neural Network using FPGA. International Journal of Computer Science and Network Security, 10(8), 88-92.

[7]. Omondi, A. R., and Rajapakse, J. C., (2006, July). FPGA Implementations of Neural Networks. Springer.

[8]. Izeboudjen, N., Farah, A., Bessalah, H., Bouridane, A., and Chikhi, N., (2008, July). High Level Design Approach for FPGA Implementation of ANNs. Encyclopedia of Artificial Intelligence, IGI Global Publishers. doi: 10.4018/978-1-59904-849-9.

[9]. Perry, D. L., (2002). VHDL Programming by Example. McGraw-Hill, fourth edition.

[10]. Schemmel, J., Meier, K. and Schurmann, F., (2001,


reprogrammable logic. IEE Proceedings - Computers and Digital Techniques, 150(6). doi: 10.1049/ip-cdt:20030965.

als/xilinx12_2/xst.pdf.

[22]. Tommiska, M.T., (2003, November). Efficient digital implementation of the sigmoid function for

Priyanka Mekala received her M.S. degree in Electrical Engineering from Arizona State University and her B.E. degree in Electronics and Communications from Osmania University, India, in May 2009 and June 2007, respectively. She started work on her Ph.D. degree in Electrical Engineering at FIU in fall 2009 and is currently a Ph.D. candidate. Her research interests include Signal Processing, Real-time Image/Video Processing and VLSI Design/Testing. She is also a student member of IEEE.

Dr. Fan is currently working as an Assistant Professor in Electrical and Computer Engineering at Florida International University. His research interests include very-large-scale-integration (VLSI) circuit simulation, modeling, optimization, bio-electronics, embedded real-time operating systems in application to robotic control, and wireless communications in sensor networks. Prior to his academic career, he served as Vice President of Vivavr Technology, Inc., and General Manager/co-founder of Musica Technologies, Inc. From 1988 to 2002, he held various senior technical positions in California at Western Digital, Emulex Corporation, Adaptec Inc., and Toshiba America. His product line of research and development includes Virtual Reality (VR) 3-D animation, MP3 players, hard drives, fiber channel adapters, SCSI/ATAPI adapters, RAID disk arrays, PCMCIA cards and laser printer controllers. He received his Ph.D. degree in Electrical Engineering from the University of California, Riverside in 2007, and his Master of Science degree in Electrical Engineering from the State University of New York at Buffalo in 1987. He also holds a Bachelor of Science degree in Electronics Engineering from National Chiao Tung University in Taiwan, R.O.C. He has served as a steering committee member of SSST, a technical program committee member for ICESS, CAMAD, ISQED and ISCAS, and an invited tutorial speaker for ASICON'07. He is a Senior Member of IEEE.

 ABOUT THE AUTHORS
