

Analog Integrated Circuits and Signal Processing, 15, 263–275 (1998) © 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

Focal-Plane and Multiple Chip VLSI Approaches to CNNs

M. ANGUITA, F. J. PELAYO, E. ROS, D. PALOMAR AND A. PRIETO

Departamento de Electrónica y Tecnología de Computadores, Facultad de Ciencias, Universidad de Granada, 18071-Granada, Spain

[email protected]

Received October 1, 1996; Accepted November 26, 1996

Abstract. In this paper, three alternative VLSI analog implementations of CNNs are described, which have been devised to perform image processing and vision tasks: a programmable low-power CNN with embedded photo-sensors, a compact fixed-template CNN based on unipolar current-mode signals, and basic CMOS circuits to implement an extended CNN model using spikes. The first two VLSI approaches are intended for focal-plane image processing applications. Because its dynamics are defined by process-independent local ratios and its inputs and outputs can be efficiently multiplexed in time, the third one allows the construction of very large multiple chip CNNs for more complex vision tasks.

1. Introduction

The CNN model introduced by L. Chua and L. Yang [1a, 1b] has been widely studied due to its interesting features, principally in performing image processing tasks. A CNN is basically an array of locally interconnected analog processing elements, or cells, operating in parallel, whose dynamic behaviour is determined by the cell connectivity pattern (neighbourhood extent) and a set of configurable parameters. The time evolution of the state of a cell $c$ in an $N \times M$-cell CNN is described by the differential equation:

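$$
\begin{aligned}
\tau \frac{dx_c(t)}{dt} &= -x_c(t) + g_c(t) \\
&= -x_c(t) + \sum_n A_{n-c}\, y_n(t) + \sum_n B_{n-c}\, u_n + I
\end{aligned}
$$

$$
1 \le n, c \le N \cdot M; \quad n \in N_R(c); \quad |x_n(0)| \le 1; \quad |u_n| \le 1;
$$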

where $n$ denotes a generic cell belonging to the neighbourhood of cell $c$, $N_R(c)$, with radius equal to $R$. $N_1(c)$ is the set of 3 × 3 cells centred in $c$ ($N_1(c) = \{c - N - 1, c - N, c - N + 1, c - 1, c, c + 1, c + N - 1, c + N, c + N + 1\}$), $N_2(c)$ the set of 5 × 5 cells centred in $c$, and so on. $x_c$ is the state of cell $c$, and $y_n$ is the output from each cell $n$, defined in terms of the nonlinear function:

$$
y_n = f(x_n) = \frac{1}{2}\left(|x_n + 1| - |x_n - 1|\right), \qquad y_n \in [-1, 1]
$$

$u_n$ is the input to the cell $n$, $I$ is an offset term, and the matrices $A$ and $B$ are called feedback and control templates respectively. Depending on the values of the cloning template components, the offset term, and the initial states, the resulting CNN is configured or “programmed” to perform a given processing task on the inputs.
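As an illustration of the model above, the following is a minimal simulation sketch and not part of the original paper. It assumes forward-Euler integration, a 3 × 3 neighbourhood ($R = 1$) and zero-valued cells outside the array; the names `cnn_output` and `cnn_step`, the step size and the boundary condition are assumptions made for this example.

```python
import numpy as np


def cnn_output(x):
    """Piecewise-linear output y = 0.5 * (|x + 1| - |x - 1|), bounded to [-1, 1]."""
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))


def cnn_step(x, u, A, B, I, dt=0.05, tau=1.0):
    """One forward-Euler step of tau * dx/dt = -x + sum(A * y) + sum(B * u) + I.

    x, u : 2-D float arrays holding the cell states and inputs.
    A, B : 3x3 feedback and control templates (radius R = 1).
    Cells outside the array are taken as zero (assumed boundary condition).
    """
    y = np.pad(cnn_output(x), 1)   # zero padding around the outputs
    up = np.pad(u, 1)              # zero padding around the inputs
    rows, cols = x.shape
    g = np.full_like(x, I, dtype=float)
    for di in range(3):
        for dj in range(3):
            # Template entry (di, dj) weights the neighbour at offset (di - 1, dj - 1).
            g += A[di, dj] * y[di:di + rows, dj:dj + cols]
            g += B[di, dj] * up[di:di + rows, dj:dj + cols]
    return x + (dt / tau) * (-x + g)


if __name__ == "__main__":
    A = np.zeros((3, 3))
    A[1, 1] = 2.0                  # self-feedback only (illustrative template)
    B = np.zeros((3, 3))
    x = np.full((4, 4), 0.1)
    u = np.zeros((4, 4))
    for _ in range(400):
        x = cnn_step(x, u, A, B, 0.0)
    print(x[0, 0], cnn_output(x)[0, 0])
```

With this illustrative template (centre feedback of 2, $B = 0$, $I = 0$), a cell started at a small positive state leaves the linear region and settles at the equilibrium $x = 2$, where its output is saturated at $y = 1$.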

Since the publication of the two papers of Chua and Yang, a number of VLSI approaches have been proposed to approximate the CNN model as well as other network models inspired by the original one. Various working chips have also been reported implementing CNNs with either fixed or programmable templates.

On looking at the published designs and experimental results [2–12] (considering primarily analog ones), we can find a wide variety of circuit approaches for CNN cells, depending on the basic circuit techniques used and the way the signals are processed in time (continuous or discrete). Most of the experimental results reported to date refer to the processing of optical information, although in practice, only a few CNN chips integrate on-chip photo-sensors (focal-plane solutions [7, 11, 12]).


The experimental density of programmable CNN chips with embedded photo-sensors ranges between a few and tens of cells per square millimeter, while the chips designed for specific image processing tasks may reach densities of hundreds of cells per square millimeter (if they carry out relatively simple tasks). These densities allow the integration, on a typical 1 cm × 1 cm chip, of CNNs to process images from about a thousand to a few tens of thousands of pixels, depending on the programming features of the networks. If the CNN circuit does not include the photo-sensors, the major bottleneck may lie in the way the inputs have to be supplied to the network, which greatly affects the overall speed of the CNN. This probably reduces interest in the CNN implementation itself, since its parallel processing capabilities cannot be fully exploited, and other alternative sequential approaches would probably provide similar performance at lower cost.
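The pixel counts quoted above follow directly from density times die area. The short sketch below reproduces that arithmetic for the densities reported later in this paper (10.7 cells per square millimeter for the programmable chip, 160 and 230 for the fixed-template cell with and without photo-sensors). It assumes, optimistically, that the whole 1 cm × 1 cm die is filled with cells; the function name `cells_on_chip` is hypothetical.

```python
CHIP_AREA_MM2 = 10.0 * 10.0  # 1 cm x 1 cm die expressed in square millimeters


def cells_on_chip(density_cells_per_mm2, area_mm2=CHIP_AREA_MM2):
    """Number of cells that fit on the die, ignoring pads and routing (assumption)."""
    return int(round(density_cells_per_mm2 * area_mm2))


if __name__ == "__main__":
    for label, density in [
        ("programmable, with photo-sensors", 10.7),
        ("fixed template, with photo-sensors", 160),
        ("fixed template, without photo-sensors", 230),
    ]:
        print(f"{label}: about {cells_on_chip(density)} cells")
```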

Focal-plane CNN chips can provide compact and fast solutions to applications, such as the processing of written characters, where large images and/or high pixel densities are not required. In order to implement large CNNs in VLSI, while maintaining their parallel processing features, alternative approaches capable of coping with the input (and output) problem must be considered. Multiple chip analog implementations would enable, with the present microelectronic technology and without the above integration density constraints, the building of very large CNNs, provided the following two topics are satisfactorily addressed:

— Efficient communication by analog signals (inputs, outputs and inter-chip signals).

— Analog computation independent of process-related variations and drifts of circuit characteristics, in order to obtain matched behaviour between cells in different chips.

In this paper, three alternative analog implementations of CNNs are described [13]:

— A low-power programmable CNN with embedded photo-sensors.

— A compact fixed-template approach based on single polarity current signals.

— A CNN circuit devised to implement very large CNNs, which uses time multiplexing of digital pulses to represent analog signals and whose dynamics depend only on local ratios of circuit parameters.

The first two approaches, devised for focal-plane chips implementing the basic CNN model, are briefly described, together with experimental results taken from integrated prototypes, in Sections 2 and 3 respectively. The third implementation is more appropriate for very large multiple chip CNNs and is described in Section 4. It allows robust CNN-type networks to be built for spatial and temporal processing of visual information with well controlled time constants at different time scales.

2. Low-Power Programmable CNN Chip with Embedded Photo-Sensors

Figure 1 shows schematically the circuit implementation of a programmable CNN cell. The state equation of cell $c$, belonging to a CNN with $M$ rows and $N$ columns, implemented by the circuit in Fig. 1 [12] is:

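$$
\begin{aligned}
C_X \frac{dV_{xc}(t)}{dt} &= -I_{yc}(t) + I_{gc} \\
&= -I_{yc}(t) + \sum_n S_{A_{n-c}} \frac{I_{A_{n-c}}}{I_L}\, I_{yn}(t) + \sum_n S_{B_{n-c}} \frac{I_{B_{n-c}}}{I_L} \left( I_{un} \frac{I_L}{I_{LB}} \right) + I
\end{aligned}
$$

$$
I_{yc} = I_L \tanh \frac{k\,(V_{xc} - V_R)}{2 U_T}; \quad 1 \le c \le MN; \quad n \in N_R(c); \quad I_{yn}(t),\; I_{un} \frac{I_L}{I_{LB}} \in [-I_L, +I_L] \qquad (1)
$$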

All the differential pairs work in weak inversion, and lateral bipolar transistors are used to distribute the programmable parameters and the current units. The programmable parameters are set by external voltages. For example, the current $I_{A_{N+1}}$, affected by the sign $S_{A_{N+1}}$, is used to set the component $A_{N+1} = S_{A_{N+1}}(I_{A_{N+1}}/I_L)$ of the feedback template $A$, obtained from the common external voltage $V_{A_{N+1}}$. The basic difference with respect to the original Chua-Yang model is in the loss term, which depends on the cell output $y_c$ instead of on the internal cell state $x_c$. This simplified model is a particularization of the ISR (Improved Signal Range) CNN model formalized in [4] and [11]. The VLSI implementations described in [3, 5, 7] can also be identified as particular cases of such a model. Moreover, a sigmoid-like function, instead of the conventional piecewise-linear function, is used to generate the cell output. The low-current operation of the circuits allows the use of small photo-sensors and reduces power consumption.
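The sigmoid-like output and the current-ratio definition of the template components in Eq. (1) can be sketched numerically as follows. This is only an illustration: the slope factor `k = 0.7` and the thermal voltage `u_t = 25.8 mV` are assumed typical weak-inversion values, not figures taken from the paper, and the function names are hypothetical.

```python
import math


def cell_output_current(v_x, v_r, i_l, k=0.7, u_t=0.0258):
    """Output current I_y = I_L * tanh(k * (V_x - V_R) / (2 * U_T)), as in Eq. (1).

    k and u_t are assumed typical values; the paper does not state them.
    """
    return i_l * math.tanh(k * (v_x - v_r) / (2.0 * u_t))


def template_component(sign, i_param, i_l):
    """Template value set by a current ratio: A = S_A * (I_A / I_L)."""
    return sign * i_param / i_l


if __name__ == "__main__":
    i_l = 150e-9  # unit current of 150 nA, within the range reported for the chip
    for v_x in (2.3, 2.45, 2.5, 2.55, 2.7):
        print(v_x, cell_output_current(v_x, v_r=2.5, i_l=i_l))
    print(template_component(-1, 300e-9, i_l))
```

Because the output saturates smoothly at $\pm I_L$, the loss term $-I_{yc}$ in Eq. (1) is bounded, which is the property that distinguishes this cell from the original model with a loss term proportional to the state.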

Fig. 1. CMOS implementation of the programmable CNN cell.

The layout of the fabricated 8 × 8 CNN chip based on the circuits in Fig. 1 is shown in Fig. 2. Using a 1.2 µm CMOS process, an integration density of 10.7 cells per square millimeter is obtained with 11 programmable parameters (9 feedback parameters, one control parameter and the offset term). The lateral bipolar transistors occupy about 25% of the cell area. Thus, density can be easily improved by implementing the same circuits with a BICMOS process (20% less area is required with the 1.2 µm BICMOS process of the same foundry). Only one metal layer has been used for interconnections, leaving the second metal layer to protect the circuitry from the incident light. Therefore, an additional metal layer would greatly reduce the cell area due to the large area devoted to interconnection in the present implementation, as can be seen in the layout in Fig. 2. The chip has been tested in a number of processing tasks with images directly projected onto it, performing either single or multiple sequential programming steps on the same input image: border detection, directional border detection, shadow creation in one, two or three directions, scratching of objects, replacing objects by the smallest rectangles covering them, etc. [16].

Figure 3 shows test results of multiple sequential programming step applications. The network configuration for the different tasks is defined by the parameters presented on the left of the figure. The test results of the implemented 8 × 8 CNN captured from the chip by a data acquisition board can also be seen on the left. The oscillograms show the cell outputs of a single column and the control signals of the network. $F_{u \to x}$ enables the CNN input, while the rising transitions of the signal $F_{start}$ activate the network operation for the different processing tasks.

An experimental power consumption of between a few and tens of microwatts per cell has been measured (depending on the network programming) for unit currents ranging from 50 nA to 200 nA (12 µW for a Connected Component Detection, CCD, with a current unit $I_L$ = 150 nA, 16 µW for a border extraction with $I_L$ = 200 nA). The time required by the network to reach its final state is also dependent on the specific processing task. Measured response times are between a few microseconds and a few hundreds of microseconds for the above mentioned unit current range. The worst-case times correspond to tasks in which several cell state transitions take place before the final output is obtained (as for example in the CCD, for which a time of about 150 µs is required using a unit current of 150 nA, while a border extraction takes 6 µs with $I_L$ = 200 nA).


Fig. 2. Layout of the fabricated 8 × 8 CNN chip.

3. Compact CMOS Unipolar Current-Mode CNN Implementation

The current-mode implementations of CNNs usually work with both positive and negative currents. The use of unipolar signals can provide a reduction in silicon area (see also [10] and [15]). In the current-mode implementation described here, besides the reduction in the number of transistors required for replication of the unipolar signals, an additional simplification is obtained by the use of simpler circuits to implement the output pseudo-linear function. In order to operate with only positive states, a reference shift current $I_R$ is added to the state current of each cell, $I_{xc}$, such that $I_R \ge |I_{RN}|$, with $I_{RN}$ being the negative limit of the state current. According to [1a], $|I_{RN}|$ has a known value for each template configuration:

$$
|I_{RN}| = I_L + \left( \sum_n |A_{n-c}| + \sum_n |B_{n-c}| \right) I_L - I \qquad (2)
$$

Moreover, in order to work with positive inputs and outputs, the current $I_L$ that limits these variables is added; that is: $0 \le I_L + I_{uc} \le 2I_L$ and $0 \le I_L + I_{yc} \le 2I_L$.

In terms of the above shifted currents, the state equation of the Chua-Yang CNN model can be written as:

$$
\begin{aligned}
\tau \frac{dI_{xc}}{dt} &= -(I_R + I_{xc}) + I_R + I_{gc}(t) \\
&= -(I_R + I_{xc}) + \sum_n \big(I_L + I_{yn}(t)\big) A_{n-c} + \sum_n (I_L + I_{un}) B_{n-c} + I_{off}
\end{aligned}
$$

where

$$
I_{off} = I_R + I - \left( \sum_n A_{n-c} + \sum_n B_{n-c} \right) I_L \qquad (3)
$$

Fig. 3. Test results of various sequential programming step applications.

Figure 4 shows schematically the proposed CMOS implementation of a CNN cell with the above state equation.

A CNN chip for CCD has been designed, fabricated and tested [16], based on the above circuits but with the loss term depending on the output instead of the state of the cell. Figures 5 and 6 show the circuit implementation and layout of a single cell, respectively. Each cell uses only 16 transistors (with cascoded current mirrors as replicators), and a density of 230 cells per square millimeter (without photo-sensors) is obtained. If photo-sensors are included, a cell density of 160 cells per square millimeter would be obtained. These densities are considerably better than those obtained in the previous CCD-CNN current-mode implementations [2, 7, 11]. Using a power-supply voltage of 3 V and a limit current of $2I_L$ = 1.2 µA, a time of 2.3 µs is needed to obtain the CCD computation in a row of 8 cells, and a worst-case power consumption of 13 µW per cell has been measured.

4. CMOS Circuits for Multiple Chip CNNs Using Spikes

As in vertebrate visual systems, where the pre-processed visual information coming from the retina is transferred to cortical areas by means of spikes, artificial VLSI vision systems might consist of a dense artificial retina (a dense array of specific photo-sensors performing a basic adaptive contrast of spatial and temporal visual information) whose outputs are transferred to multiple cellular-type layers. These networks, working in continuous time and in parallel, extract relevant spatial and temporal features that must be combined to carry out higher level form and motion processing tasks. To do this, an appropriate interchip communication scheme must be used. It has been demonstrated that non-arbitrated address-driven communication schemes are very efficient in this context [17, 18]. They use the rate of asynchronously produced short pulses (spikes) to represent analog signals, and the time multiplexing of addresses on a fast digital bus to transfer a high number of such signals along a small number of wires. In this way, in contrast with a sequential scanning of images, the relevant visual information is treated preserving its temporal integrity. Typical pulse widths of tens of nanoseconds and interspike times of milliseconds are used, obtaining a small collision probability. The circuits described here directly process the signals represented by spike rates, implementing the model in Fig. 7. It can be considered as an extension of the basic CNN model in which either linear terms (current sources representing template components) or shunting terms (conductances) depend on the instantaneous frequency of the respective input spikes. The CMOS circuit implementation of a single positive term or excitatory synapse is shown in Fig. 8 [19]. A dual version of this circuit implements an inhibitory one. Figure 9 is a layout of the integrated synapse including a section of the “membrane capacitance” ($C_X$ in Fig. 7) for a CMOS process of 1.2 µm. The same circuit can approximate either a frequency controlled current source (FCI) or a frequency controlled conductance (FCG), depending on the way the vertical dashed-line connection in Fig. 8 is made. For each input pulse, the capacitor $C_i^+$ is suddenly discharged and then a current is injected on the output node during the time required to charge $C_i^+$ up to either $V_X$ (shunting terms) or to the reference voltage $V_{r,i}$ (linear terms). For a given input spike frequency $F_i^+$, the contribution of the synapse to the “membrane potential” $V_X$ can be approximated by [19]:

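$$
\begin{aligned}
\text{FCG:} \quad \frac{dV_X}{dt} &= K_i^+ \cdot \frac{C_i^+}{C_X} \cdot (V^+ - V_X) \cdot F_i^+ ; \\
\text{FCI:} \quad \frac{dV_X}{dt} &= K_i^+ \cdot \frac{C_i^+}{C_X} \cdot (V^+ - V_{r,i}) \cdot F_i^+ \qquad (4)
\end{aligned}
$$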

Assuming that the passive decay conductances $G^+$ and $G^-$ (see Fig. 7) are also implemented by similar FCG excitatory and inhibitory circuits controlled by a constant frequency reference $F_R$, the following general equation is approximated when a set of linear and shunting synapses is considered:

$$
\begin{aligned}
\frac{dV_X}{dt} = {} & -\frac{G_R}{C_X}(V_X - V_R) + \sum_{j,\,exc} \frac{I_j^+(F_j^+)}{C_X} - \sum_{j,\,inh} \frac{I_j^-(F_j^-)}{C_X} \\
& + \sum_{i,\,exc} \frac{g_i^+(F_i^+)}{C_X}(V^+ - V_X) - \sum_{i,\,inh} \frac{g_i^-(F_i^-)}{C_X}(V_X - V^-) \qquad (5)
\end{aligned}
$$

where:

$$
\begin{aligned}
G_R &= G^+(F_R) + G^-(F_R); \qquad V_R = \frac{G^+ \cdot V^+ + G^- \cdot V^-}{G^+ + G^-}; \\
g_i^{+/-}(F_i^{+/-}) &= K_i^{+/-} \cdot C_i^{+/-} \cdot F_i^{+/-}; \\
I_j^+(F_j^+) &= K_j^+ \cdot C_j^+ \cdot (V^+ - V_{r,j}) \cdot F_j^+; \\
I_j^-(F_j^-) &= K_j^- \cdot C_j^- \cdot (V_{r,j} - V^-) \cdot F_j^-
\end{aligned}
$$

Fig. 4. Circuit implementation of the current-mode fixed-template CNN cell.

Fig. 5. Circuit implementation of a fixed template CNN cell for CCD using unipolar signals.

Thus, the circuits implement an extension of the CNN model by the inclusion of the shunting terms. The first three terms in eqn. (5) model the conventional decay and template terms in the CNN model. As can be seen, all temporal contributions to the state variable $V_X$ are defined by external references and local ratios of currents ($K$ terms) and capacitances, which are more controllable in VLSI than absolute values. The output module in Fig. 7 is implemented by a circuit [20] that performs a pseudo-linear conversion of $V_X$ to a spike rate, whose output frequency is also defined by external references and local ratios:

$$
F_Y(V_X) = \frac{V_X - V_{min}}{V_{DD}} \cdot \frac{C_0}{C_F} \cdot K \cdot F_0 \qquad (6)
$$

$V_{min}$ and $F_0$ are external voltage and frequency references; $C_0$ and $C_F$ are internal capacitances of the spike generator, and $K$ is a local current ratio.

Fig. 6. Layout of one cell of the integrated current-mode CNN for CCD (80 × 54 µm²) with the 1.2 µm CAE process of AMS.

Fig. 7. Extended CNN cell model in which inputs and output are defined by spike rates.
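To make the roles of the terms in eqns. (5) and (6) concrete, the sketch below evaluates the right-hand side of eqn. (5), the membrane voltage at which it vanishes, and the output spike rate of eqn. (6). The synaptic currents and conductances are passed in already evaluated at their input frequencies (with the linear currents taken as proportional to the spike frequency, consistent with the FCI form of eqn. 4). All numeric values and function names are illustrative assumptions, not measurements from the chips.

```python
def membrane_rate(v_x, c_x, g_r, v_r, v_plus, v_minus,
                  i_exc=(), i_inh=(), g_exc=(), g_inh=()):
    """Right-hand side of Eq. (5), in volts per second."""
    current = -g_r * (v_x - v_r)              # passive decay towards V_R
    current += sum(i_exc) - sum(i_inh)        # linear (template) terms
    current += sum(g_exc) * (v_plus - v_x)    # excitatory shunting terms
    current -= sum(g_inh) * (v_x - v_minus)   # inhibitory shunting terms
    return current / c_x


def steady_state(g_r, v_r, v_plus, v_minus,
                 i_exc=(), i_inh=(), g_exc=(), g_inh=()):
    """Membrane voltage at which Eq. (5) equals zero (constant input rates)."""
    num = (g_r * v_r + sum(i_exc) - sum(i_inh)
           + sum(g_exc) * v_plus + sum(g_inh) * v_minus)
    return num / (g_r + sum(g_exc) + sum(g_inh))


def spike_rate(v_x, v_min, v_dd, c0, c_f, k, f0):
    """Eq. (6): F_Y = (V_X - V_min) / V_DD * (C0 / C_F) * K * F0."""
    return (v_x - v_min) / v_dd * (c0 / c_f) * k * f0


if __name__ == "__main__":
    params = dict(g_r=1e-9, v_r=1.75, v_plus=3.5, v_minus=0.0,
                  i_exc=[2e-10], g_exc=[5e-10], g_inh=[1e-10])
    v_ss = steady_state(**params)
    print("steady state:", v_ss)
    print("residual rate:", membrane_rate(v_ss, c_x=1e-12, **params))
    print("output rate:", spike_rate(v_ss, v_min=1.0, v_dd=5.0,
                                     c0=1e-12, c_f=1e-12, k=1.0, f0=1e3))
```

The steady state is a conductance-weighted average of $V_R$, $V^+$ and $V^-$ shifted by the linear currents, which is why the shunting terms bound the membrane voltage between the reference rails while the linear terms do not.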


Fig. 8. CMOS circuit implementation of an excitatory synapse in the model of Fig. 7.

Figure 10 shows experimental measurements illustrating the good linear behaviour, the control of the time constant, and the variability of the template components. In this experiment $V^+$ is set to 3.5 V and the three traces $V_{X1}$, $V_{X2}$ and $V_{X3}$ are taken from different chips. They correspond, respectively, to reference voltages $V_{r,i}$ = 2 V, 2.5 V and 3 V (in eqn. 4), representing template values of 3, 2 and 1, when one linear synapse is excited with a constant input frequency. Additional experimental results illustrating features of the model resulting from the shunting terms can be found in [20]. In the present implementation, the circuits operate in the range of micro to milliseconds; thus, instead of being considered fast “accelerators” of traditional CNNs, they should be considered good approximators of the continuous dynamics of an extended computing model, by means of which many large processing layers of cellular networks can be built, with full parallel operation, while maintaining a good control of the time constants at different time scales.
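The correspondence between the reference voltages and the template values 3, 2 and 1 follows from the FCI form of eqn. 4, where the slope of $V_X$ is proportional to $V^+ - V_{r,i}$ for a fixed input frequency. A small check, assuming identical $K$, capacitance ratio and input frequency for the three synapses (the function name is hypothetical):

```python
def relative_template_values(v_plus, v_refs):
    """Slopes (V+ - V_r) of the FCI term, normalised to the smallest one."""
    drives = [v_plus - v_ref for v_ref in v_refs]
    smallest = min(drives)
    return [drive / smallest for drive in drives]


if __name__ == "__main__":
    print(relative_template_values(3.5, [2.0, 2.5, 3.0]))
```

The three drives are 1.5 V, 1.0 V and 0.5 V, giving the ratios 3 : 2 : 1 quoted for the traces.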

Figure 11 shows a multiple chip configuration where a dense artificial retina would perform parallel sensing and a spatio-temporal contrast. A non-arbitrated address-driven communication scheme is used to interconnect the different modules through fast digital buses. These interchip connections can be controlled by an optional digital block where the weights of programmable synapses can be modified by modulating the instantaneous frequencies of pulses that would arrive at the corresponding destination cells. In this case, the figure presents the possibility of having several cellular-type layers processing, in parallel, different features of the retinal output. Fig. 12 shows how a single large CNN layer can be split into several chips where local encoder/decoders are used to connect neighbour cells in different ICs while global encoder/decoders achieve the transmission of input/output signals. Both multiple chip architectures can be combined to perform complex analog processing tasks, maintaining parallelism and asynchrony in the whole network.
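A rough feel for why a non-arbitrated bus can work with the pulse widths (tens of nanoseconds) and interspike times (milliseconds) quoted in Section 4 can be obtained from a simple estimate. The sketch below assumes independent, Poisson-distributed spikes from every cell and counts a collision whenever another pulse starts within one pulse width before or after a given pulse. This is an illustrative model chosen for the example, not the analysis of [17, 18], and the number of cells sharing the bus is an assumed value.

```python
import math


def bus_occupancy(n_cells, mean_rate_hz, pulse_width_s):
    """Average fraction of time the shared bus carries a pulse."""
    return n_cells * mean_rate_hz * pulse_width_s


def collision_probability(n_cells, mean_rate_hz, pulse_width_s):
    """Probability that a given pulse overlaps at least one other pulse.

    Assumes independent Poisson spike trains; the vulnerable window is
    two pulse widths long (one before and one after the pulse start).
    """
    other_rate = (n_cells - 1) * mean_rate_hz
    return 1.0 - math.exp(-2.0 * other_rate * pulse_width_s)


if __name__ == "__main__":
    n_cells = 1000   # assumed number of cells sharing one bus
    rate = 1.0e3     # one spike per millisecond
    width = 50e-9    # 50 ns pulse
    print("occupancy:", bus_occupancy(n_cells, rate, width))
    print("collision probability:", collision_probability(n_cells, rate, width))
```

Under these assumptions a thousand cells occupy about 5% of the bus time and roughly one pulse in ten collides; shorter pulses or lower mean rates reduce both figures proportionally.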

5. Concluding Remarks

The paper summarizes three different VLSI approaches to cellular structures.

First, an analog CMOS implementation of a CNN has been presented with the following features: (a) Embedded photo-sensors for direct capturing of input images, thus exploiting the potential advantage of CNNs for parallel image processing. (b) Low-current operation, which allows the use of small photo-sensors and reduces power consumption. (c) The parameters defining the processing task to be carried out by the CNN are modifiable by means of external signals, which enables different processing tasks to be performed on the same or different input images.

Second, a fixed-template CMOS implementation based on the use of single-polarity current-mode signals has been shown, which is useful to carry out specific processing tasks on images with a higher pixel density and speed than allowed by the programmable approach. A considerable improvement in area is obtained due to the reduction in the number of transistors required for signal replication, and the simplification of circuits to generate the pseudo-linear output function. Based on these circuits a chip with 8 cells for CCD has been integrated and tested with a density of 230 cells per square millimeter (excluding the photo-sensors).


Fig. 9. Layout of the excitatory synapse of Fig. 8. This occupies 60 × 60 µm², excluding the section of membrane capacitance.

Third, a circuit approach to approximate the continuous dynamics of an extended CNN model, in which input and output analog signals are represented by asynchronously produced short pulses, is proposed as a viable alternative for the implementation of large analog cellular structures. Due to the good control of the circuit parameters, which are defined in terms of local ratios, and to the use of an appropriate digital communication scheme that greatly reduces the number of required interconnections, with these circuits a large cellular array of analog cells can be split into multiple chips, preserving the dynamics of the network as a whole.

The third VLSI approach corresponds to CNN circuits devised to implement very large CNNs with an accurate control of the transient and the steady-state neural responses, both being defined by process-independent local ratios. The design of the circuits with well controlled time constants provides the possibility of using cellular-type layers combined with artificial retinas to perform temporal processing tasks like novelty detection and motion processing. Inter-neuron communication is achieved by means of very short pulses multiplexed in fast digital buses, preserving features of the information. Adjustable weights for the shunting terms (FCG) can be implemented either using variable ratio current mirrors (parameters $K_i^{+/-}$ and $K_j^{+/-}$ in eqns. 4 and 5) or modifying, by means of digital circuitry, the inter-neuron spike frequencies. The weights of the linear terms (FCI) can be modulated through the adjustable voltage references $V_{r,i}$ in eqns. 4 and 5.

Fig. 10. Temporal evolution of the states of integrated cells, with a constant input applied to a linear synapse, for three template values.

Fig. 11. Multiple chip CNN implementation: system architecture of a low level vision system based on the scheme in [21].

Fig. 12. Multiple chip CNN implementation: CNN split into several chips using additional local address encoders/decoders to interconnect neighbour cells in different chips.

Acknowledgements

This work has been partially supported by the EC Project ERBCHBGCT920027, and the Spanish CICYT Project TIC96-0634.

References

1a. L. O. Chua and L. Yang, “Cellular neural networks: Theory.”IEEE Transactions on Circuits and Systems35 (10), pp. 1257–1272, Oct. 1988.

1b. L. O. Chua and L. Yang, “Cellular neural networks: Appli-cations.”IEEE Transactions on Circuits and Systems35 (10),pp. 1273–1290, Oct. 1988.

2. J. M. Cruz and L. O. Chua, “A CNN chip for connected com-ponent detection.”IEEE Transactions on Circuits and Systems38 (7), pp. 812–816, July 1991.

3. H. Harrer, J. A. Nossek, and R. Stelzl, “An analog implemen-tation of discrete-time cellular neural networks.”IEEE Trans-actions on Circuits and Systems39 (3), pp. 466–476, May1992.

4. A. Rodrıguez Vazquez, S. Espejo, R. Dom´ınguez Castro,J. L. Huertas, and E. S´anchez Sinencio, “Current-mode tech-niques for the implementation of continuous and discrete timecellular neural networks.”IEEE Transactions on Circuits andSystems Part I40 (3), pp. 132–146, March 1993.

5. J. E. Varrientos, E. S´anchez-Sinencio, and J. Ram´ırez-Angulo,“A current-mode cellular neural networks implementation.”IEEE Transactions on Circuits and Systems Part I40 (3),pp. 147–156, March 1993.

6. M. Anguita, F. J. Pelayo, A. Prieto, and J. Ortega, “Ana-log CMOS implementation of a discrete-time CNN with pro-grammable cloning templates.”IEEE Transactions on Circuitsand Systems Part I40 (3), pp. 215–219, March 1993.

7. S. Espejo, A. Rodr´ıguez-Vazquez, R. Dom´ınguez-Castro,J. L. Huertas, and E. S´anchez-Sinencio, “Smart-pixel cellularneural networks in analog current-mode CMOS technology.”IEEE Journal of Solid-State Circuits29 (8), August 1994.

8. F. Sargeni and V. Bonaiuto, “High performance digitally pro-grammable CNN chip with discrete templates,” inProceedingsof the CNNA-94, Rome, 1994, pp. 67–72.

9. A. Paasio, A. Dawidziuk, K. Halonen, and V. Porra, “Currentmode cellular neural network with digitally adjustable templatecoefficients,” inProceedings Microneuro ’94, IEEE Comp.Soc. Press., Turin, 1994, pp. 268–272.

10. P. Kinget and M. S. J. Steyaert, “A programmable analog cel-lular neural network CMOS chip for high speed image pro-cessing.”IEEE Journal of Solid-State Circuits30 (3), March1995.

11. S. Espejo, “VLSI design and modeling of CNNs,” PhD. Dis-sertation, University of Sevilla, March, 1994.

12. M. Anguita, F. J. Pelayo, F. J. Fernandez, and A. Prieto, “A low-power CMOS implementation of programmable CNNs withembedded photo-sensors.” To appear inIEEE Transactions onCircuits and Systems Part I.

13. M. Anguita, F. J. Pelayo, E. Ros, D. Palomar, and A. Prieto,“VLSI implementations of CNNs for image processing andvision tasks: Single and multiple chip approaches,” invitedtalk at the4th IEEE International Workshop on Cellular NeuralNetworks and their Applications, June, 1996.

14. T. Matsumoto, L. O. Chua, and H. Suzuki, “CNN cloningtemplate: Connected component detector.”IEEE Transactionson Circuits and Systems37, pp. 633–635, May 1990.

15. A. J. Schuler, M. Brabec, D. Shubel, and J. A. Nossek, “Hardware-oriented learning for cellular neural networks,” in Proceedings of CNNA-94, Rome, Italy, December 1994, pp. 183–188.

16. M. Anguita, “Implementación de arquitecturas VLSI para redes neuronales celulares (CNNs),” Ph.D. Dissertation, Departamento de Electrónica y Tecnología de Computadores, Universidad de Granada, June 1996.

17. A. Mortara and E. A. Vittoz, “A communication architecture tailored for analog VLSI artificial neural networks: Intrinsic performance and limitations.” IEEE Transactions on Neural Networks 5 (5), May 1994.

18. A. Mortara, A. E. Vittoz, and Ph. Venier, “A communication scheme for analog VLSI perceptive systems.” IEEE Journal of Solid State Circuits 30 (6), pp. 660–669, June 1995.

19. F. J. Pelayo, E. Ros, P. Martin-Smith, F. J. Fernández, and A. Prieto, “A VLSI approach to the implementation of additive and shunting neural networks,” in Lecture Notes in Computer Science, Springer-Verlag, July 1995, vol. 930, pp. 728–735.

20. F. J. Pelayo, E. Ros, X. Arreguit, and A. Prieto, “A bio-inspired VLSI neural model using spikes.” Sent to Analog Integrated Circuits and Signal Processing, Kluwer Academic Publishers.

21. X. Arreguit and E. A. Vittoz, “Perception systems implementation in analog VLSI for real-time applications,” in Proceedings of the PerAc’94, Lausanne, Switzerland, 1994, pp. 170–180.

Mancia Anguita received a B.Sc. degree and a Ph.D. in Computer Science from the University of Granada, Spain, in June 1991 and July 1996, respectively. Since 1991 she has been a teaching assistant at the Department of Electronics and Computer Architecture at that University. Her research interests lie in parallel computer architecture and analog VLSI circuit implementations.

Francisco J. Pelayo, born in Granada, Spain, in 1960, received the B.Sc. degree in electronic physics in 1982, the M.Sc. degree in electronics in 1983, and the Ph.D. degree in 1989, all from the University of Granada, Spain.

In September 1983, he joined the Departamento de Electrónica y Tecnología de Computadores at the University of Granada, where he is currently an Associate Professor. He has worked in the areas of multiple-valued logic, VLSI design and test, artificial neural networks, and fuzzy systems. His current research interests lie in the fields of VLSI artificial neural networks and bio-inspired processing systems.

Eduardo Ros received the B.Sc. degree in Electronic Physics in 1993 and the degree in Electronic Engineering in 1996, both from the University of Granada, Spain. He joined the Departamento de Electrónica y Tecnología de Computadores in 1993, where he is currently a Research Assistant. His research interests include bio-inspired VLSI neural systems.

David Palomar was born in Almuñécar (Spain) in 1972. He graduated in Computer Science from the University of Granada in 1995. He currently holds a fellowship from the Spanish E.M. and is working toward his Ph.D. degree at the Department of Electronics and Computer Technology of the University of Granada. He is interested in the analog VLSI implementation of neural systems, with special attention to vision-related systems.

Alberto Prieto received a B.Sc. degree in Electronic Physics in 1968 from the Complutense University (Madrid) and a Ph.D. degree from the University of Granada, Spain, in 1976. From 1969 to 1970 he was at the Centro de Investigaciones Técnicas de Guipuzcoa and at the E.T.S.I. Industriales of San Sebastián. From 1971 to 1984 he was Director of the Computer Centre, and from 1985 to 1990 Dean of the Computer Science and Technology studies of the University of Granada. He is currently a Full Professor in the Departamento de Electrónica y Tecnología de Computadores.

His research interests are in the areas of artificial neural networks, multiple-valued circuits, microprocessor-based systems, and VLSI design and testing. Dr. Prieto has been nominated a member of the IFIP WG 10.6 (Neural Computer Systems). He is a member of INNS, AEIA, AFCET, IEEE, and ACM, and Chairman of the Spanish Regional Interest Group of the IEEE Neural Network Council.
