
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS, VOL. 44, NO. 4, APRIL 1997

A Fast Fixed Point Learning Method to Implement Associative Memory on CNN's

Péter Szolgay, István Szatmári, and Károly László

Abstract—Cellular Neural Networks (CNN's) with space-varying interconnections are considered here to implement associative memories. A fast learning method is presented to compute the interconnection weights. The algorithm was carefully tested and compared to other methods. Storage capacity, noise immunity, and spurious state avoidance capability of the proposed system are discussed.

Index Terms—Associative memory, fixed point.

I. INTRODUCTION

Cellular Neural Networks (CNN's), as introduced in [1] and [2], are cellular, analog, multidimensional processing arrays with local interconnections among the processing elements. The CNN architecture can be programmed by the interconnecting weights of the processing cells, called cloning templates. An extension of the CNN paradigm is the CNN Universal Machine architecture [3], where distributed and global memories and logic functions are available for implementing complex analogic (analog + logic operations) algorithms. CNN's are used here to implement associative memories; such a functionality can then be embedded in analogic CNN algorithms.

Using associative memories, the corrected output can be restored from an input corrupted by noise. Information in a CNN array can be stored [4] as a stable equilibrium point (fixed point), as a stable periodic oscillation, or as a chaotic attractor.

In our contribution a new fixed point method is used for implementing associative memory. An empirical estimation is presented which allows us to predict the error correction capability (ECC) as a function of the stored memory vectors and of the error correction level.

This brief is organized as follows. Section II shows the CNN architecture used for memory implementation. Here we present some qualitative analysis on the condition for testing stable equilibrium points and the avoidance of spurious states. In Section III, a fast learning method is presented to compute the space-varying cloning templates. Experimental results are presented in Section IV. Section V discusses some properties of the algorithm including storage capacity and ECC. Section VI shows a comparison of our results with other fixed point methods with respect to storage capacity, noise immunity, speed of computation of connection weights, and spurious state avoidance capability.

II. CNN AND ASSOCIATIVE MEMORIES

For the purpose of this brief we consider an autonomous CNN array of $M \times N$ cells. The dynamics are

$$\dot{x}_{ij} = -x_{ij} + A_{ij}^{T}\cdot \mathbf{y}_{ij} + I_{ij}, \qquad 1 \le i \le M \text{ and } 1 \le j \le N$$

$$y_{ij} = f(x_{ij}), \qquad f(x_{ij}) = \frac{1}{2}\left(|x_{ij} + 1| - |x_{ij} - 1|\right) \tag{1}$$

Manuscript received October 30, 1995; revised April 23, 1996. This paper was recommended by Associate Editor J. Pineda de Gyvez.

The authors are with the Analogic and Neural Computing Systems Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences, H-1111 Budapest, Kende u. 13-17, Hungary.

Publisher Item Identifier S 1057-7122(97)02069-2.

where $x_{ij}$ and $y_{ij}$ are the state and output variables of cell $(i,j)$; $A_{ij}$ are the template coefficients, or intercell couplings, within a local neighborhood $N_r$; $\mathbf{y}_{ij}$ is the subpattern (template-sized window) around the $(i,j)$th cell; $I_{ij}$ is the bias; and $f$ is a sigmoid-type piecewise-linear output function.

Feature vectors to be classified are represented as the initial state of the network, and the equilibrium point reached (if any) is associated with a stored pattern. A necessary and sufficient condition [5] for the storage of a set of $P$ binary patterns is given by

$$A_{ij}^{T}\cdot p_{ij}^{n} + I_{ij}\cdot y_{ij}^{n} > 1 \tag{2}$$

where

$$p_{ij}^{n} = \mathbf{y}_{ij}^{n}\cdot y_{ij}^{n}, \qquad n = 1,\ldots,P;\; i = 1,\ldots,M;\; j = 1,\ldots,N.$$

This condition means that if the sign of the state values and the sign of the first derivatives of the states are equal, then the states will reach the saturation region and the solution is an asymptotically stable equilibrium point of the network.
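To make condition (2) concrete, the following Python sketch checks it for a set of stored patterns and given space-varying templates. The array layout, the zero padding at the boundary, and the function name are our own assumptions for illustration; they are not taken from the brief.

```python
import numpy as np

def is_stable_fixed_point(A, I, patterns, r=1):
    """Check condition (2): A_ij^T . p_ij^n + I_ij * y_ij^n > 1 for every
    cell (i, j) and every stored pattern n.

    A        : (M, N, (2r+1)**2) space-varying feedback templates
    I        : (M, N) biases
    patterns : (P, M, N) bipolar (+1/-1) patterns to be stored
    r        : neighborhood radius
    """
    P, M, N = patterns.shape
    for n in range(P):
        # pad with zeros so boundary cells see a truncated neighborhood
        padded = np.pad(patterns[n], r, mode="constant")
        for i in range(M):
            for j in range(N):
                # template-sized window (subpattern) around cell (i, j)
                window = padded[i:i + 2 * r + 1, j:j + 2 * r + 1].ravel()
                # p_ij^n = subpattern scaled by the cell's own output
                p = window * patterns[n, i, j]
                if A[i, j] @ p + I[i, j] * patterns[n, i, j] <= 1:
                    return False
    return True
```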

Considering the question of spurious-state avoidance, condition (2) shows the important fact that we cannot store only the desired patterns, because combinations of subpatterns taken from the stored patterns will be stable equilibrium points too. The reversed patterns also satisfy condition (2). Assuming that $A_{ij;ij} \ge 0$, it is easy to show that the center template element also adds a nonnegative value to the sum in condition (2) in the case of spurious patterns. In order to keep the number of spurious states to a minimum, the optimal value of the center elements is $A_{ij;ij} = 0$.

III. A FAST LEARNING ALGORITHM

There are different models and methods to construct associative memory; e.g., a discrete-time CNN with strictly nearest neighborhood using feedforward templates only was used in [6], where the template elements were computed by the Hebbian learning rule. In another associative memory implementation, a continuous-time CNN was used [7], and the feature vectors to be classified were associated with the initial state. In that approach the feedback template was computed by the singular value decomposition method. Though several learning methods have been reported for CNN's [8], they work efficiently only in the case of relatively few parameters. Due to the space-varying templates, the number of free parameters here is large. To evaluate a learning algorithm, a simply computable cost function has to be found that also provides good convergence speed. The equilibrium has to be located far enough inside the saturation region in order to have robust stability. Let $E$ be a positive constant around which we would like to set the magnitude of an equilibrium point of a cell. Considering the state equation of the cells in steady state, we get a condition similar to (2):

$$A_{ij}^{T}\cdot p_{ij}^{n} + I_{ij}\cdot y_{ij}^{n} = E_{ij}^{n} > E > 1 \tag{3}$$

where $n = 1,\ldots,P$; $i = 1,\ldots,M$; $j = 1,\ldots,N$.

If the desired outputs are given, we are looking for an $A_{ij}$ vector (template) for which condition (3) is met. We chose the well-known Hebbian learning rule with an additional gain factor $w_{ij}^{n}$ of the $n$th subpattern:

$$A_{ij} = \sum_{n=1}^{P} w_{ij}^{n}\cdot p_{ij}^{n}. \tag{4}$$



Fig. 1. Linear function depending on $E^{n}$ to compute the additional gain factor ($\Delta w$) in the Hebbian learning rule.

$A_{ij}$ is computed in an iterative way, starting from an arbitrary initial value. For the sake of simplicity we can fold the bias (if we use any) into the $A_{ij}$ template vector and extend $p_{ij}^{n}$ with $y_{ij}^{n}$. Let $[E_L, E_R]$ be the interval into which the magnitudes of the equilibrium points of the cells have to be set. The steps of the algorithm are as follows:

1) $A_{ij}(0) = 0$; $i = 1,\ldots,M$; $j = 1,\ldots,N$;
2) $E_{ij}^{n}(k) = A_{ij}^{T}(k)\cdot p_{ij}^{n}$; $n = 1,\ldots,P$;
3) $\Delta w_{ij}^{n}(k) = -\alpha\cdot\left(E_{ij}^{n}(k) - E\right)$ if $E_{ij}^{n}(k)\notin [E_L, E_R]$, $E_L > 1$, $\alpha > 0$; $\Delta w_{ij}^{n}(k) = 0$ otherwise; (5)
4) $A_{ij}(k+1) = A_{ij}(k) + \sum_{n=1}^{P}\Delta w_{ij}^{n}\cdot p_{ij}^{n}$.

We used a linear function to determine $\Delta w_{ij}^{n}$, shown in Fig. 1. The optimal setting for $E$ is discussed in Section V. The algorithm is additive, which means that a new picture can simply be incorporated into an already trained system: the previously taught template is the initial template in the learning process of the new picture. The condition for the convergence of the algorithm is

$$0 < \alpha < \frac{1}{d\cdot P} \tag{6}$$

where $P$ is the number of patterns to be stored and $d$ is the number of free parameters to be computed (the number of template values + bias). This is valid if we do not fix the central template element to zero. The proof of convergence is given in the Appendix.
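For illustration, a minimal Python sketch of the per-cell learning loop (steps 1)–4) with update rule (5) and a step size respecting (6)) might look as follows; it operates on the extended p-vectors described above (bias folded into $A_{ij}$), and the stopping rule, default values, and names are our own assumptions rather than the paper's.

```python
import numpy as np

def learn_template(p_vectors, E=1.5, E_L=1.5, E_R=5.0, max_iter=1000, A0=None):
    """Iterative fixed-point learning for one cell.

    p_vectors : (P, d) array of the extended vectors p_ij^n (bias component
                included), each entry +1/-1.
    Returns the learned extended template A_ij (length d).
    """
    P, d = p_vectors.shape
    alpha = 0.9 / (d * P)            # step size respecting condition (6)
    # step 1: zero template, or a previously taught one (additive learning)
    A = np.zeros(d) if A0 is None else A0.astype(float).copy()
    for _ in range(max_iter):
        E_n = p_vectors @ A          # step 2: E_ij^n(k) = A^T(k) . p_ij^n
        # step 3: gain factor, nonzero only outside [E_L, E_R]
        dw = np.where((E_n < E_L) | (E_n > E_R), -alpha * (E_n - E), 0.0)
        if not dw.any():             # condition (3) holds for every pattern
            break
        # step 4: Hebbian-style additive update
        A = A + dw @ p_vectors
    return A
```

The additive property mentioned above corresponds to passing the previously taught template as A0 when a new picture has to be incorporated into an already trained system.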

IV. EXPERIMENTAL RESULTS

A simple example is considered to demonstrate the applicability of the method. With the presented algorithm, the learning process of 25 9 × 9 Chinese characters (Fig. 2) takes about 2–5 min on a 66-MHz AT-486 PC. For the same task, the computation of templates by singular value decomposition [7] takes about half an hour on the same hardware configuration.

To test our space-varying templates the CNNM simulator [9] was applied. Two types of noise models were used, namely zero-mean Gaussian noise with a given deviation $\sigma \in [0.0, 1.0]$ and digital or channel noise with a given ratio of reversed pixels $t \in [5\%, 20\%]$.

The error correction was tested as a function of the $[E_L, E_R]$ interval on 25 Chinese characters. Results are summarized in Tables I–III. By using a larger $E$ value, the correction capability of the network was better and the transient was quicker (Table I). The same test vectors were learned with a larger neighborhood size and the results were better (compare Table II to Table I). Nevertheless, a larger $E$ exhibited worse correcting capability in the case of very noisy initial patterns (see the $\sigma = 1$ rows in Table II). In the case of $\sigma = 0.7$, the larger $E$ was better, as expected. The problem may be caused by the dominant property of the center element of the template ($A_{ij;ij}$). Due to the learning method, the center element is always larger than the surrounding template values. By using a larger $E$ the transient is quicker and the neighboring cells cannot compensate the effect of $A_{ij;ij}$.

Fig. 2. 25 Chinese characters to test the capabilities of the learning method.

TABLE I: RUNNING EXPERIENCE WITH ALL 25 CHINESE CHARACTERS OF FIG. 3, THE NEIGHBORHOOD SIZE IS 1

The algorithm optionally allows fixing the central template elements at a constant value, $A_{ij;ij} = c$. Table III shows the number of corrected pictures. Results were similar to the previous case.

If the neighborhood size was 3, the network was able to correct all the noisy initial patterns shown in Fig. 3, generated by adding zero-mean Gaussian noise with standard deviation $\sigma = 1$. The central template elements were not fixed; the $E$ interval was $[1.5, 5.0]$.

V. DISCUSSION

The presented method guarantees that each desired pattern is a stable equilibrium point of the system. There is, however, no constraint on the trajectory: the system may converge to a spurious state. In the first epoch of the learning process, the algorithm fixes the center template element to zero. If it does not converge after a given number of steps, then the center template element is allowed to change. During the learning process the value of the central template element is monotone increasing, and consequently the number of spurious states of the system also increases. The process stops if condition (3) is met for each pattern. The solution found by the algorithm is optimal from the spurious-state avoidance point of view, because the central template element takes the smallest value by which condition (3) can be satisfied.

TABLE II: EXPERIENCE WITH THE 25 CHINESE CHARACTERS, THE NEIGHBORHOOD SIZE IS 2

Fig. 3. The original Chinese characters corrupted by zero-mean Gaussian noise with standard deviation $\sigma = 1$; all the characters have been successfully restored using neighborhood size 3.

TABLE III: 25 CHINESE PICTURES, THE NEIGHBORHOOD SIZE IS 1. = 1.5

TABLE IV: PREDICTION OF LOCAL ERROR CORRECTION CAPABILITY (ECC) OF 25 CHINESE CHARACTERS DEPENDING ON TEMPLATE SIZE AND NOISE LEVEL. THE OPTIMUM OF E WAS CHOSEN FROM A LOOK-UP TABLE OBTAINED BY A COMPUTER ESTIMATION

We introduce the idea of the ECC, which is the probability of error correction at a given noise level at a given position of the network. The results show that there is an optimum value of $E$ in terms of error correction, in the sense that the ECC is highest at this value. In our interpretation, storage capacity and ECC cannot be discussed independently. Due to the fact that the algorithm can teach each desired pattern, we can consider the storage capability as the level of error correction, or in other words, the basin of attraction of the stored patterns. For instance, the algorithm can teach all the possible binary patterns, but in this case there is no error correction: storage capacity is at its maximum ($2^{MN}/2^{MN}\cdot 100\% = 100\%$) and the ECC is zero. On the other hand, if there is only one pattern to be learned, the network will always recall it (ECC $= 100\%$), but the storage capacity is the smallest ($1/2^{MN}\cdot 100\%$). Let us introduce the following expression to characterize the correlation among the subpatterns, or the fitness of the subpatterns in terms of learning:

$$R_{ij} = \frac{\left(\sum_{n=1}^{P} p_{ij}^{n}\right)^{T}\cdot\left(\sum_{n=1}^{P} p_{ij}^{n}\right) - P^{2}}{P^{2}\cdot(d-1)} \tag{7}$$

where $P$ is the number of patterns to be learned and $d$ is the number of free parameters (template elements and bias value). $R_{ij} \in [0, 1]$.



The computed average values of $R_{ij}$ for $P$ random binary patterns are shown below.

P        2     3     4     5     6     7     8     9     10    15    20
R_ij     0.50  0.34  0.25  0.20  0.16  0.14  0.13  0.11  0.10  0.07  0.05
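As a sketch, the correlation measure (7) can be computed as below (Python); the function and the random-pattern check are our own illustrative assumptions, and the printed average is only expected to come close to the values quoted above.

```python
import numpy as np

def correlation_measure(p_vectors):
    """R_ij of (7) for the P extended subpattern vectors of one cell."""
    P, d = p_vectors.shape
    s = p_vectors.sum(axis=0)                 # sum of the p_ij^n vectors
    return (s @ s - P**2) / (P**2 * (d - 1))

# rough check of the averages quoted for random binary patterns
rng = np.random.default_rng(0)
P, d, trials = 5, 9, 2000
vals = []
for _ in range(trials):
    p = rng.choice([-1.0, 1.0], size=(P, d))
    # assumption: the center component of each p_ij^n is always +1
    # (the cell's own output times itself), hence the -P^2 term in (7)
    p[:, d // 2] = 1.0
    vals.append(correlation_measure(p))
print(np.mean(vals))          # close to 1/P = 0.20 for P = 5
```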

The maximum ECC is at $R_{ij} = 1$ (all the subpatterns are the same). In the worst case ($R_{ij} = 0$), the ECC is zero: if we set the central template element to zero, then there is no solution of (4) satisfying condition (2) at the given position $(i,j)$. Allowing the central template element to change, only this central element will be different from zero and the cell recalls both binary values ($-1, +1$). The ECC is a function of the correlation of the patterns ($R_{ij}$), of the size of the template ($d$), and of the noise level ($\sigma$), and is denoted by ECC$(E(R,d), \sigma)$. We are looking for an $E$ value where this function takes its maximum, ECC$_{\max}(E_{\mathrm{opt}}(R, d), \sigma)$. We estimated the ECC for discrete values of $E$, in the range $1 < E < 20$, at given values of $R$, $d$, and $\sigma$. Two conditions were used to test the ECC, namely error correction (EC), condition (8), and fault tolerance (FT), condition (9):

$$A^{T}(E)\cdot(p + e(\sigma)) > 1 \tag{8}$$

$$A^{T}(E)\cdot(\tilde{p} + e(\sigma)) < 1 \tag{9}$$

where $A(E)$ is the learned template at the given $E$, $p$ is the pattern to be stored, $\tilde{p}$ was constructed from $p$ assuming a large error (inverted pixel value) in the center position, and $e(\sigma)$ is the noise perturbation. We considered 10 independent series of random binary patterns and taught them with the algorithm at the given $E$. In each experiment we generated 100 zero-mean Gaussian noise vectors with standard deviation $\sigma$. If the conditions were satisfied independently, we considered it a successful case. The ratio between the number of successful cases and the number of experiments was computed. Condition (8) is the modified version of condition (2) with noise perturbation, approximating the ECC at small values of $E$; the function EC$(E)$ is monotone increasing. The FT approximates the ECC at large values of $E$: if condition (9) is satisfied, the error cannot be a stable solution; the function FT$(E)$ is monotone decreasing. We can approximate the value of the ECC by choosing the minimum value between EC and FT at a given $E$; see Fig. 4. Taking this test for different values of the triplet $(R, d, \sigma)$, we obtain a look-up table to set the optimum value of $E$. Additionally, we can predict the average error correction by extending the estimated local error correction to the entire network. In the case of the Chinese characters, at different neighborhood radii and noise levels we got a close correspondence with the experimental results (Table IV).

Fig. 4. Estimating the ECC as a function of $E$. The EC approximates it at small values of $E$, the FT at large values of $E$. The neighborhood size is 1 ($d = 9$), the noise level is $\sigma = 1$ (SD).
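A rough Python sketch of this Monte-Carlo estimation is given below, reusing a learn_template routine such as the one sketched in Section III. The counts (10 pattern series, 100 noise vectors) follow the text; the pattern layout, the $[E_L, E_R]$ setting, and the function names are our own assumptions.

```python
import numpy as np

def estimate_ecc(E, P, d, sigma, learn_template,
                 series=10, noise_draws=100, rng=None):
    """Estimate EC(E) and FT(E), conditions (8) and (9); ECC ~ min(EC, FT)."""
    rng = rng or np.random.default_rng()
    ec_hits = ft_hits = trials = 0
    center = d // 2                       # assumed index of the center pixel
    for _ in range(series):
        p_vectors = rng.choice([-1.0, 1.0], size=(P, d))
        A = learn_template(p_vectors, E=E, E_L=E, E_R=E + 0.5)
        p = p_vectors[0]                  # test against one stored pattern
        p_err = p.copy()
        p_err[center] = -p_err[center]    # large error in the center position
        for _ in range(noise_draws):
            e = rng.normal(0.0, sigma, size=d)
            ec_hits += (A @ (p + e) > 1)        # condition (8)
            ft_hits += (A @ (p_err + e) < 1)    # condition (9)
            trials += 1
    ec, ft = ec_hits / trials, ft_hits / trials
    return min(ec, ft)
```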

VI. CONCLUSION

We have presented a novel approach for CNN-based associative memory design with space-varying templates. The corresponding learning algorithm has the following beneficial properties: fast learning speed, additive learning, and good noise immunity.

Based on theoretical considerations, an empirical estimation is obtained to evaluate the ECC.

The comparison shown in Table V verifies that our associative memory implementation is an acceptable alternative to other published fixed point methods.

TABLE V: COMPARISON OF DIFFERENT FIXED-POINT METHODS



APPENDIX

Proof of convergence of the learning process is presented. Considering the values of $E_{ij}^{n}$ in steps $(k)$ and $(k+1)$, we can write their difference as follows:

$$E_{ij}^{n}(k) = A_{ij}(k)^{T}\cdot p_{ij}^{n}$$

$$A_{ij}(k+1) = A_{ij}(k) + \alpha\cdot E\cdot\sum_{n=1}^{P} p_{ij}^{n} - \alpha\cdot\sum_{n=1}^{P} E_{ij}^{n}(k)\cdot p_{ij}^{n}, \qquad \alpha > 0$$

$$E_{ij}^{n}(k+1) = E_{ij}^{n}(k) + \alpha\cdot E\cdot (p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} p_{ij}^{m} - \alpha\cdot (p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} E_{ij}^{m}(k)\cdot p_{ij}^{m}$$

$$\Delta E_{ij}^{n}(k+1) = E_{ij}^{n}(k+1) - E_{ij}^{n}(k) = \alpha\cdot E\cdot (p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} p_{ij}^{m} - \alpha\cdot (p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} E_{ij}^{m}(k)\cdot p_{ij}^{m}.$$

Let $\beta$ be the possible maximum value of $\left|(p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} p_{ij}^{m}\right|$; this is the case if each pattern is the same, $\beta = d\cdot P$. Let $\gamma$ be the maximum of $|E_{ij}^{n}(k)|$. We can write $\Delta E(k+1) = |\alpha\cdot E\cdot\beta - \alpha\cdot\beta\cdot\gamma| = \alpha\cdot\beta\cdot|E - \gamma| = \alpha\cdot\beta\cdot\Delta E(k)$. This will converge to zero if $\alpha\cdot\beta < 1$. Thus we get $\Delta E(k)\to 0 \Rightarrow \Delta E_{ij}^{n}(k)\to 0 \Rightarrow \Delta w_{ij}^{n}(k)\to 0 \Rightarrow E_{ij}^{n}(k)\to E$. We get condition (6), $0 < \alpha < \frac{1}{d\cdot P}$.

We get a less strict condition for $\alpha$ if $(p_{ij}^{n})^{T}\cdot\sum_{m=1}^{P} p_{ij}^{m} > 0$ for each $n = 1,\ldots,P$; solving the inequality of convergence $|E - E_{ij}^{n}(k+1)| < |E - E_{ij}^{n}(k)|$, $\forall n = 1,\ldots,P$, yields

$$0 < \alpha < \frac{2}{d\cdot P}.$$

ACKNOWLEDGMENT

The authors thank T. Roska for the helpful discussions. The encouraging suggestions of the reviewers are also kindly acknowledged.

REFERENCES

[1] L. O. Chua and L. Yang, "Cellular neural networks: Theory," IEEE Trans. Circuits Syst., vol. 36, pp. 1257–1272, 1988.

[2] L. O. Chua and T. Roska, "The CNN paradigm," IEEE Trans. Circuits Syst. I, vol. 40, pp. 147–156, 1993.

[3] T. Roska and L. O. Chua, "The CNN universal machine: An analogic array computer," IEEE Trans. Circuits Syst. II, vol. 40, pp. 163–173, 1993.

[4] P. Thiran and M. Hasler, "Information processing using stable and unstable oscillations: A tutorial," in Proc. IEEE CNNA'94, Rome, Italy, 1994, pp. 127–136.

[5] I. Fajfar and F. Bratkovic, "Finding all stable equilibria of cellular neural networks," Int. J. Circuit Theory Applicat., vol. 22, pp. 145–155, 1994.

[6] S. Tan, J. Hao, and J. Vandewalle, "Cellular neural networks as a model of associative memories," in Proc. IEEE CNNA'90, Budapest, Hungary, 1990, pp. 26–35.

[7] D. Liu and A. N. Michel, "Cellular neural networks for associative memories," IEEE Trans. Circuits Syst. II, vol. 40, pp. 119–121, Feb. 1993.

[8] J. A. Nossek, "Design and learning with cellular neural networks," in Proc. IEEE CNNA'94, Rome, Italy, 1994, pp. 137–146.

[9] "CNNM, multilayer cellular neural network simulator, user's guide, version 5.3," Analogic and Neural Computing Laboratory of HAS.

A Control-Based Approach to the Solution of Nonlinear Algebraic Equations

Angelo Brambilla and Dario D’Amore

Abstract—A different approach to the solution of a nonlinear set of algebraic equations is presented. It is basically a revision of the Newton iterative algorithm from a digital control point of view. The Newton algorithm is considered like a digital control algorithm that acts on a set of nonlinear algebraic equations. Its target is to find a value $x^{*}$ that satisfies the algebraic equation set. This value can be considered as a particular "input" of the equation set which gives a zero "output," while the iteration index can be considered as the clock of the digital system. From this point of view some correlations between the stability of digital systems and the Newton algorithm can be shown. This approach allows us to understand the reasons behind the convergence failure of some modified Newton algorithms, such as source stepping, damping, and limiting, that the literature often reports as heuristic.

Index Terms—Electrical simulation, numerical algorithms.

I. INTRODUCTION

The Newton algorithm is widely used to solve algebraic nonlinear equation sets, but it does not guarantee global convergence. The original method was modified somewhat to obtain a more efficient and robust algorithm. If we focus our attention on circuit simulation, we can see that source stepping, damping, and limiting techniques are employed [1], [3], [8]. However, the Newton algorithm can fail on nonlinear circuits even if these modifications are adopted.

In this brief we revisit the problem of finding the solution to a nonlinear algebraic equation set with an iterative method that runs on a digital computer. We revisit the theory behind the Newton algorithm and some of its modifications from a digital control point of view and study the system's stability together with its convergence properties, showing in detail the effects of the modifications.

The Newton algorithm is considered like a digital controller applied to a plant whose model is represented by the nonlinear equation set $F[x]$. The target of the digital controller is to find an input $x^{*}$ of the plant that gives a zero output, or $F[x^{*}] = 0$. In this case, the plant output $F[\cdot]$ can be treated like an error signal which is sent back to the controller. The clock signal of the digital controller can be compared to the iteration index of the Newton algorithm, which can be considered as a sort of normalized time with respect to the clock signal period. Our aim is to find the zero output of the plant as quickly as possible, that is, with the lowest number of clock periods or Newton iterations.
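As a concrete illustration of this control view (our own example, not taken from this brief), a damped Newton iteration can be written so that the residual $F(x_k)$ plays the role of the fed-back error signal and the iteration index $k$ plays the role of the controller's clock:

```python
import numpy as np

def damped_newton(F, J, x0, damping=1.0, tol=1e-10, max_iter=50):
    """Solve F(x) = 0; the plant output F(x_k) acts as the fed-back error."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        r = F(x)                        # error signal at clock tick k
        if np.linalg.norm(r) < tol:
            return x, k
        dx = np.linalg.solve(J(x), -r)  # controller action (Newton step)
        x = x + damping * dx            # damping scales the step
    return x, max_iter

# example: a small nonlinear equation set and its Jacobian
F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] - x[1]**2 + 1.0])
J = lambda x: np.array([[2 * x[0], 1.0], [1.0, -2 * x[1]]])
print(damped_newton(F, J, x0=[1.0, 1.0]))
```

With damping = 1 this reduces to the plain Newton algorithm; smaller values slow the "controller" down in exchange for robustness far from the solution.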

Manuscript received April 12, 1994; revised September 15, 1996. This paper was recommended by Associate Editor A. Premoli.

The authors are with the Dipartimento di Elettronica ed Informazione, Politecnico di Milano, I-20133 Milan, Italy.

Publisher Item Identifier S 1057-7122(97)02722-0.
