
Neurocomputing 69 (2006) 634–641


Rapid learning and robust recall of long sequences in modular associator networks

Michael Lawrence, Thomas Trappenberg, Alan Fine

Dalhousie University, Halifax, Canada

Available online 10 January 2006

Abstract

Biologically inspired neural networks which perform temporal sequence learning and generation are frequently based on hetero-associative memories. Recent work by Jensen and Lisman has suggested that a model which connects an auto-associator module to a hetero-associator module can perform this function. We modify this architecture in a simplified model which in contrast uses a pair of connected auto-associative networks with hetero-associatively trained synapses in one of the paths between them. We simulate both models, finding that accurate and robust recall of learned sequences can easily be performed with the modified model introduced here, strongly outperforming the previous architecture.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Sequence memory; Hetero-associations

1. Introduction

A sequence $S$ of length $p$ is a list $x^1, x^2, \ldots, x^p$ of patterns, each pattern representing a memory in the sequence. In general, it is possible that $x^{m_1} = x^{m_2}$ for $m_1 \neq m_2$, in which case knowing the current pattern $x^{m_1} = x^{m_2}$ is not sufficient to determine whether the next pattern is $x^{m_1+1}$ or $x^{m_2+1}$ during recall. For a pattern $x^m$, the length of the preceding subsequence which uniquely determines $x^m$ is called the degree of $x^m$ [18]. The degree of a sequence is the maximum degree of its components, and a sequence is called simple if it has degree 1; otherwise it is known as complex. For example, the sequence A, B, C, D is a simple sequence, and A, B, C, D, B, C, A is a complex sequence of degree 3.
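To make the definition concrete, a small helper (our own illustration, not part of the original paper) can compute the degree of a sequence by searching for the shortest context length that always determines the successor:

```python
def sequence_degree(seq):
    """Return the degree of a sequence: the smallest k such that the
    preceding k items always uniquely determine the next item."""
    p = len(seq)
    for k in range(1, p + 1):
        successor = {}
        consistent = True
        for m in range(p - 1):
            # context window of up to k items ending at position m
            window = tuple(seq[max(0, m - k + 1): m + 1])
            if successor.get(window, seq[m + 1]) != seq[m + 1]:
                consistent = False
                break
            successor[window] = seq[m + 1]
        if consistent:
            return k
    return p

print(sequence_degree("ABCD"))     # 1 -> simple
print(sequence_degree("ABCDBCA"))  # 3 -> complex, degree 3
```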

In contrast to a large body of work on engineering applications of sequence recognition and generation, we are specifically interested in neural architectures that can easily learn and recall temporal sequences based on associative networks and Hebbian learning. Networks that are able to learn and recall sequences rely on some form of hetero-associations between different patterns in a sequence, in addition to standard auto-associative point-attractor networks.


Such networks have long been proposed and studied by Grossberg [3,4]. Hopfield [6] noted the difficulty in recalling sequences of more than four items, and subsequent research has focused on additional mechanisms to stabilize hetero-associative sequential memories [16,8,13,5].

In this paper we discuss a modular neural architecture that is able to learn and robustly recall long temporal sequences. The architecture consists of two coupled recurrent networks with auto- and hetero-associative connections. Similar networks have been studied by Jensen and Lisman [7] to model memory functions in a cortico-hippocampal network, and later to model the hippocampus in more detail [10]. The networks studied by Jensen and Lisman are based on the coupling of an auto-associative network with a hetero-associative network, whereas we consider here a network consisting of two auto-associative networks that are coupled through hetero-associative connections.

A related approach to sequence learning and recognition is discussed in the literature under the name of synfire chains [1,2,9]. In contrast to the approaches that grew out of the auto-association literature, this theory is based on pure hetero-associative connections in (recurrent) diverging/converging chains [1].


However, a bidirectional modular model very similar to the model studied here has recently been studied by Sommer and Wennekers [12]. This model couples two auto-associative modules with hetero-associative inter-modular connections, whereas we found it beneficial to consider only one inter-modular pathway to be hetero-associative. While synfire chains are often discussed in the context of neocortical processing [2], they have also been implicated in hippocampal functions [19].

While we compare the performance of sequence learning and recall of an architecture used by Jensen and Lisman with some modified versions, it is important to note that the models by Jensen and Lisman, as well as related hippocampal models by others [17], consider sequences of items that are time-compressed through a phase precession mechanism or a related mechanism based on multiplexing with gamma oscillations. Such sequences are limited to a relatively small number of items (on the order of 4–7) or are limited in their timescale to events occurring on the order of seconds. Such memories of short-duration sequences are not considered in this study. Instead, we consider here the storage and recall of long sequences with hetero-associations that can bind memories of arbitrary time scales.

The rest of this paper is organized as follows: Section 2 reviews associative memory models of sequence memory based on auto- and hetero-associations. Section 3 describes the modular architecture and the two different models studied in this paper. In Section 4, the models are compared experimentally with simulations to solidify the intuition behind their operation. Conclusions and suggestions for future research directions are given in Section 5.

2. Associative sequence memory

We consider in this paper associative memories of leaky-integrator nodes governed by the dynamics

$\tau \frac{\mathrm{d}u_i(t)}{\mathrm{d}t} = -u_i + \sum_j w_{ij}\, r_j(t)$   (1)

on a time scale of $\tau$, where $u_i(t)$ is the internal state of node $i$ at time $t$. $r_i$ is the corresponding firing rate, related to $u_i$ through a gain function $g$,

$r_i(t) = g(u_i(t))$.   (2)

The weights between node $j$ and node $i$ are trained with Hebbian learning rules. In such rules, the weights are incremented in proportion to the activity of the pre- and postsynaptic nodes,

$\Delta w_{ij} \propto (r_i - \langle r_i \rangle)(r_j - \langle r_j \rangle)$,   (3)

where $\langle \cdot \rangle$ indicates mean node activity. In the following, we simplify the notation by assuming rate representations with zero mean and learning procedures which clamp the network state to the training patterns. We distinguish between (symmetric) auto-associative weights

$w^A_{ij} = \frac{1}{N} \sum_m x^m_i x^m_j$   (4)

and (asymmetric) hetero-associative weights

$w^H_{ij} = \frac{1}{N} \sum_m x^{m+1}_i x^m_j$,   (5)

where $N$ is the number of nodes in the network, and $x^m_i$ is the $i$th component of the $m$th pattern of a sequence of $p$ patterns ($m = 1, \ldots, p$). These rules are consistent with Eq. (3) for zero initial weights and an appropriate choice of learning rates and training repetitions.
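As an illustration, Eqs. (4) and (5) amount to outer-product learning over the pattern set. The following sketch (our own; helper names such as train_auto are hypothetical) builds both weight matrices for random zero-mean binary patterns:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 20                       # nodes per module, sequence length
X = rng.choice([-1, 1], size=(p, N))  # patterns x^1..x^p as rows, zero mean

def train_auto(X):
    """Symmetric auto-associative weights, Eq. (4): w_ij = (1/N) sum_m x^m_i x^m_j."""
    return X.T @ X / X.shape[1]

def train_hetero(X):
    """Asymmetric hetero-associative weights, Eq. (5): w_ij = (1/N) sum_m x^{m+1}_i x^m_j.
    np.roll also associates the last pattern with the first, making the
    sequence cyclic, as done in the experiments of Section 4."""
    return np.roll(X, -1, axis=0).T @ X / X.shape[1]

wA, wH = train_auto(X), train_hetero(X)
```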

Another form of learning sequence relations, which generalizes the basic form of Eq. (5), is Hebbian trace learning. In this approach a temporal average (trace) of node activity,

$\bar r_j(t) = \int_0^\infty G(x)\, r_j(t - x)\, \mathrm{d}x$,   (6)

is used in the learning rule, for example

$\Delta w^H_{ij} \propto r_i \bar r_j$.   (7)

$G$ is a kernel function which specifies how the firing history is to be sampled, and has the property that $\int_0^\infty G(x)\, \mathrm{d}x = 1$.

For example, the delta function, which is 0 for all $x$ except for an infinite peak of area 1 at $x = k$, where $k$ is the presentation time of the previous pattern, leads to the training rule of Eq. (5). This could be implemented in the brain by slow synapses. In contrast, fast synapses would lead to an auto-associative rule similar to Eq. (4) (see [7] for a discussion of the physiology underlying these mechanisms).

Both auto-associative and hetero-associative connections are necessary to achieve robust sequence memory. This conjecture is based on the argument that the auto-associative weights are important for pattern completion, which enables the recall of items with partial cues in noisy environments, whereas the hetero-associations drive the state of the system from one pattern to the next. For example, Hopfield [6] considered a model

$\tau \frac{\mathrm{d}u_i(t)}{\mathrm{d}t} = -u_i + \sum_j w^A_{ij}\, r_j(t) + \lambda \sum_j w^H_{ij}\, r_j(t)$.   (8)

However, in simulations the result is that when $\lambda$ is too small the network makes no transitions between patterns at all, usually attracting to the pattern closest to the initiation state $r(0)$. As $\lambda$ gets larger the network tends to attract to a later pattern in the sequence before it has come to fully represent the current pattern, causing it to overlap a number of consecutive patterns and consequently lose the sequence entirely. Accurate sequence generation using this approach has only been successfully demonstrated for sequences of length 4 [6,11].

To solve this problem it is sufficient to allow some delay between the time a new state is entered and the time a transition from that state to the next is induced.

Fig. 1. The proposed network, a pair of connected cortical modules A and B. $w^{AA}$, $w^{BB}$, $w^{AB}$ and $w^{BA}$ represent the synaptic strengths in the recurrent connections in modules A and B, and the inter-module connections from B to A and from A to B, respectively. An input pathway feeds into module A. Lisman's description [10] is similarly pictured with $w^{BB}$ trained hetero-associatively (Eq. (5)) and all other weights set auto-associatively (Eq. (4)). In our model $w^{AB}$ is trained hetero-associatively (Eq. (5)), and all other weights auto-associatively (Eq. (4)).

This can be achieved using an update rule

$\tau \frac{\mathrm{d}u_i(t)}{\mathrm{d}t} = -u_i + \sum_j w^A_{ij}\, r_j(t) + \lambda \sum_j w^H_{ij}\, \bar r_j(t)$,   (9)

where the quantity $\bar r_j(t)$ is again a convolution of the firing history as noted in Eq. (6), although this time the trace is used during recall and not during learning as in Eq. (7). Kleinfeld [8] uses a delta function $\delta(x - k)$, representing slow hetero-associative synapses with a delay of $k$. Tank and Hopfield [16] use an exponential kernel function similar to an alpha function with a peak at $x = k$ for some $k > 0$, so that the firing-rate history at time $t - k$ contributes most strongly, representing a combination of short-term memory effects and delayed synapses.
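A minimal sketch of recall under Eq. (9), assuming a tanh gain function (the gain used in Section 3) and a simple exponential trace standing in for the kernel $G$; the function name and parameter values are our own choices:

```python
import numpy as np

def recall(wA, wH, r0, lam=0.5, tau=1.0, dt=0.1, gamma=0.8, steps=500):
    """Euler integration of Eq. (9). rbar is an exponential trace of the
    firing history, a stand-in for the convolution kernel G of Eq. (6)."""
    u = np.arctanh(np.clip(r0, -0.99, 0.99))  # internal states consistent with r0
    rbar = np.tanh(u).copy()
    states = []
    for _ in range(steps):
        r = np.tanh(u)
        rbar = (1 - gamma) * r + gamma * rbar  # delayed firing history
        du = -u + wA @ r + lam * (wH @ rbar)   # Eq. (9)
        u += dt / tau * du
        states.append(r)
    return np.array(states)
```

The overlaps $r(t)^{\mathsf T} x^m / N$ of the returned states with the stored patterns can then be inspected to check whether transitions occur in the learned order.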

Another related approach is to replace the constant $\lambda$ with a time-dependent $\lambda(t)$, allowing the strength of the hetero-associative synapses to vary explicitly over time. In this way strong hetero-association can be used to initiate a transition to the next pattern in the sequence, followed by strong auto-association for developing a clear representation of that pattern. An additional functionality over the kernel-function approach is that it allows adjustment of the duration for which each pattern in the sequence is present. This is the approach used by Wang [18], who alternates between phases of hetero-association and auto-association in the connections between a pair of recurrent networks in order to accurately recall a simple sequence of pairs of patterns. In the brain, a time-dependent $\lambda$ may represent the effect of modulatory synapses from another population of neurons which acts as a controller during the recall process. However, the activity of this population must be carefully considered and modelled if the goal is to explain sequence recall in animals; one cannot simply choose any $\lambda(t)$ which works without considering how the dynamics might be achieved in the brain.

Hetero-associative continuous attractor networks have been studied by Stringer et al. to solve path integration [15] and in conjunction with learning motor primitives and motor control [14]. During learning, these models associate the movement of the activity packet in the continuous attractor network with movement-indicator cells, which correspond to the $\lambda$ values in the formulations above. This is an example where the hetero-associative weights are modulated by the firing of other neuronal groups. Ref. [14] gives an example of using this strategy in a modular network and also demonstrates that complex sequences can be handled in this approach.

3. The models

We study a modular architecture of two connected recurrent networks as shown in Fig. 1. The modules are labelled A and B, respectively. The activities of nodes in modules A and B are governed by leaky-integrator dynamics

$\tau \frac{\mathrm{d}h^A_i(t)}{\mathrm{d}t} = -h^A_i(t) + \lambda^{AA} \sum_{j=1}^N w^{AA}_{ij}\, r^A_j(t) + \lambda^{AB} \sum_{j=1}^N w^{AB}_{ij}\, r^B_j(t)$,   (10)

$\tau \frac{\mathrm{d}h^B_i(t)}{\mathrm{d}t} = -h^B_i(t) + \lambda^{BB} \sum_{j=1}^N w^{BB}_{ij}\, r^B_j(t) + \lambda^{BA} \sum_{j=1}^N w^{BA}_{ij}\, r^A_j(t)$,   (11)

$r^A_i(t) = \tanh(h^A_i(t))$,   (12)

$r^B_i(t) = \tanh(h^B_i(t))$.   (13)

Within this architecture we consider two principal models with different placements of auto- and hetero-associative weight matrices.

Model 1: All weights are auto-associative except the recurrent weights in module B, which are hetero-associative. This model corresponds to an auto-associative coupling of an auto-associative memory with a hetero-associative memory.

Model 2: All weights are auto-associative except the weights from module B to module A, which are hetero-associative. Thus, model 2 couples two auto-associative memories with hetero-associations in one direction and auto-associations in the other direction.
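In terms of Eqs. (4) and (5), the two models differ only in which of the four weight matrices of Fig. 1 is trained hetero-associatively, as the following sketch (our own summary, with a hypothetical helper name) makes explicit:

```python
import numpy as np

def build_weights(X, model):
    """Return (wAA, wBB, wAB, wBA) for model 1 or model 2, following Fig. 1:
    wAB carries input from B into A, wBA carries input from A into B."""
    N = X.shape[1]
    auto = X.T @ X / N                         # Eq. (4)
    hetero = np.roll(X, -1, axis=0).T @ X / N  # Eq. (5), cyclic sequence
    if model == 1:
        return auto, hetero, auto, auto  # hetero recurrence inside module B
    return auto, auto, hetero, auto      # model 2: hetero pathway B -> A
```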

Model 1 is reminiscent of the cortico-hippocampal architecture of Jensen and Lisman [7] and the hippocampal model of Lisman [10].

Fig. 2. Learning of inter-modular connections in model 2. A sequence of four patterns is presented to both modules. The sequence of module B is delayed with respect to module A. Learning of connections from A to B and from B to A is interleaved, resulting in auto-associative connections from A to B (long arrows) and hetero-associations from B to A (short arrows).

For example, in [10] module A corresponds to a recurrent network formed by dentate neurons contacting mossy cells that feed back to dentate neurons, while module B would correspond to a CA3 network. Lisman states that the process of sequence recall would work as follows. Upon external stimulation with the first pattern $x^1$ in the sequence, the recurrent hetero-associative connections in the CA3 would tend to make it converge towards a noisy version $x^{2'}$ of the next pattern $x^2$ in the sequence. The connections from the CA3 would tend to excite the pattern $x^{2'}$ in the dentate, which would then use auto-association to reduce the noise and produce $x^2$. The pattern $x^2$ would then be transmitted back to the CA3, which would again tend to move towards the next pattern in the sequence using its hetero-associative connections.

The intuition of model 2 during recall is that each module works somewhat independently, cleaning up its own representation of the current pattern in the sequence, $x^m$. When module B has a "clean enough" representation of $x^m$, it can begin to push module A toward the next pattern in the sequence, $x^{m+1}$, due to the hetero-associations. In this way a natural timing can be achieved from the competition between the two modules:

(1) When both modules represent the same pattern, $x^m$, there is competition between the recurrent synapses of module A, which want to continue attracting toward $x^m$, and the inter-module synapses from B to A, which want to push A toward $x^{m+1}$.

(2) When A and B represent $x^{m+1}$ and $x^m$, respectively, there is competition between the recurrent synapses of module B, which want to continue attracting toward $x^m$, and the inter-module synapses from A to B, which want to push B toward $x^{m+1}$.

The temporal behaviour is a consequence of the growth of these forces and their necessary alternation: when A's strength is at a maximum, B should be in a transition and hence its strength is minimized, and vice versa when B's strength is at a maximum. For example, in case 1 above, the strength of B pushing A toward $x^{m+1}$ is related to how clearly $x^m$ is represented in B, i.e. how close $r^B(t)$, the firing rates of nodes in module B at time $t$, are to $x^m$. As module B draws closer to $x^m$, eventually its strength in moving A toward $x^{m+1}$ becomes large enough to push A into the basin of attraction of pattern $x^{m+1}$. At this point A has little strength in affecting B, as it is far away from, but moving toward, $x^{m+1}$. As A draws closer to this attractor, its strength in pushing B toward $x^{m+1}$ increases until B makes its own transition, at which point B's strength is in turn minimized. To achieve these dynamics the inter-module synapses must be weighted higher than the recurrent synapses, so that the forces from module B are sufficient to affect the course of module A when B is sufficiently close to one of the stored patterns, and vice versa. The time for which each pattern is stable can hence be adjusted with the strength of the inter-module connections. When they are made stronger, B has more effect on A (and vice versa), moving it from a stable pattern more quickly and resulting in a smaller amount of time in which a pattern is represented. Similarly, weaker inter-module strengths will result in slower transitions between patterns, or none at all when set too weak.

Although the placement of the different connection types is easy to implement in simulations, it is important to note that their biological realization and interpretation is somewhat different in models 1 and 2. In model 1 we assume that during learning the same pattern is presented simultaneously to both modules, and that the time course of plasticity is sufficiently fast in most pathways to result in auto-associative learning. Only the plasticity of intra-modular connections in module B needs to be based on ion channels with receptors that have slow activation kinetics (like NMDAR in CA3, see discussions in [7]) to enable hetero-associative learning in the same module. This is different in model 2, where hetero-associative learning between modules does not require receptors with slow activation kinetics. For example, it is possible to present the time series of patterns with a fixed offset to the different modules as shown in Fig. 2, which only requires a fixed delay in the external pathways to the different modules. With the further assumption that inter-modular plasticity is modulated to specific time windows, as illustrated by the arrows in Fig. 2, we arrive naturally at model 2.
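The following sketch (our own illustration, using simple Euler integration and the Fig. 4 parameter values) puts Eqs. (10)–(13) together for model 2; initial conditions follow the experiment in Section 4.2:

```python
import numpy as np

def simulate_model2(X, lamAA=1.0, lamBB=1.0, lamBA=1.0, lamAB=2.0,
                    tau=1.0, dt=0.1, steps=1000, seed=0):
    """Sketch of model 2, Eqs. (10)-(13): wAA, wBB, wBA auto-associative,
    wAB (pathway B -> A) hetero-associative."""
    rng = np.random.default_rng(seed)
    p, N = X.shape
    wAuto = X.T @ X / N                       # Eq. (4)
    wHet = np.roll(X, -1, axis=0).T @ X / N   # Eq. (5), cyclic
    # module A starts at x^1 with ~30% of components flipped; B at a random pattern
    hA = np.arctanh(0.99 * X[0] * np.where(rng.random(N) < 0.3, -1, 1))
    hB = np.arctanh(0.99 * rng.choice([-1.0, 1.0], N))
    overlaps = []
    for _ in range(steps):
        rA, rB = np.tanh(hA), np.tanh(hB)
        hA += dt / tau * (-hA + lamAA * wAuto @ rA + lamAB * wHet @ rB)   # Eq. (10)
        hB += dt / tau * (-hB + lamBB * wAuto @ rB + lamBA * wAuto @ rA)  # Eq. (11)
        overlaps.append(X @ np.tanh(hA) / N)  # overlap of module A with each pattern
    return np.array(overlaps)
```

Plotting overlaps.argmax(axis=1) against time should then reproduce the staircase of Fig. 4(B) when recall succeeds.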

4. Experimental results

All models were tested with a basic sequence recall experiment in which we trained the network with a sequence of $p$ patterns and tried to adjust the relative strengths of the different pathways (the $\lambda$s) to enable correct recall of the sequence. We also verified the stability of sequence recall by using a noisy initial pattern and noisy transmission, and explored the limits of the sequence length when stable solutions were found.


Each pattern $x^m$ is a string of binary numbers representing node activities, i.e. $x^m \in \{-1, 1\}^N$, where $N$ is the number of nodes. The patterns are uniformly distributed so that the mean node activity is zero, allowing us to use the simplified learning rules of Eqs. (4) and (5). In our hetero-associative training we also associate $x^1$ with $x^p$, so that the sequence is cyclic. Sequence recall is illustrated in the following by plotting the inner product between the network states and the training patterns.

4.1. Model 1

With model 1 we were not able to find appropriate parameters to obtain consistent results. An example of sequence recall using model 1 showing some transitions is shown in Fig. 3. After extensive experimentation, this is an example of the best progress towards a correctly recalled sequence that we were able to achieve. Modules of $N = 1000$ nodes were used, and the model was trained on a short sequence of only six patterns. Experiments on longer sequences failed similarly. Both modules were initialized to the first pattern without noise; initializing the modules with different patterns increased the difficulty of finding satisfactory $\lambda$s. Decreasing the number of nodes and patterns adds some stability to the sequences, but this may be an artefact of small networks. We found that typically module B either attracts quickly to some stable fixed point which does not represent any of the stored patterns or, as $\lambda^{BB}$ is increased, experiences very rapid oscillations.

Fig. 3. Result of recall of a learned sequence of six random patterns using model 1 with $N = 1000$ nodes. The parameters used were $\lambda^{AA} = 1$, $\lambda^{BB} = 2.2$, $\lambda^{BA} = 2$, $\lambda^{AB} = 4$. The simulation was performed in the same way as that of Fig. 4. Both modules were initialized with the first pattern without noise. The result shows the pattern oscillations of module B due to its hetero-associative synapses (bottom), and the oscillations of module A (top) due to (selective) interaction from B.

Module A meanwhile usually attracts to a fixed point, or will make transitions if B is oscillating and $\lambda^{AB}$ is large enough. However, A never converges to a pattern in this case, and its transitions do not recall the patterns in order.

The problem with tuning the $\lambda$s is as follows: increasing $\lambda^{AB}$ gives a greater chance that A can be brought into the basin of attraction of a particular pattern in order to clean up its representation of this pattern, but as B continues to oscillate this also brings a greater chance that A will be disrupted from its process of cleaning up this pattern. Thus the network does not quite behave according to the intuitive notion of stages of hetero-association followed by cleaning of the signal, since the activities of B are not stable enough to affect A in a predictable manner.

These results reflect the known difficulties of previous simulations with single-module networks [6]. These networks have separate hetero- and auto-associative connections but constrain the response to input from these connections by having only one class of neurons. The results here show that removing this constraint does not improve the recognition ability of the network. Also, while model 1 reflects the principal architecture of the models used by Jensen and Lisman [7,10], it is important to note that the implementations in their work include synaptic dynamics that can stabilize sequence recall in such networks, as discussed in Section 3.

4.2. Model 2

Fig. 4 shows the result of recall of a sequence of 20 random patterns using model 2 with 1000 nodes in each module. It is easy to find strength parameters which lead to reliable sequence recall. The values for the example shown in Fig. 4 are $\lambda^{BB} = \lambda^{AA} = \lambda^{BA} = 1$, $\lambda^{AB} = 2$. The modules were initialized with a noisy version of pattern $x^1$ in module A and a random pattern in module B. The overlap between the network states and the stored patterns, $r^A(t)^{\mathsf T} x^m / N$ and $r^B(t)^{\mathsf T} x^m / N$, was measured, but is only shown for module A (top of the figure), since the overlap of module B is very similar, being only a slightly shifted version of the plot for module A. It can be seen from Fig. 4(A) that each pattern is perfectly recalled and is stable for some short period. Since it is difficult to distinguish between the 20 different patterns in Fig. 4(A) even if different line styles were used, the index $m$ of the pattern for which $r^A(t)^{\mathsf T} x^m / N$ is largest is plotted for each time in Fig. 4(B), where it can be seen that the patterns are indeed recalled in the correct order. It was also possible to get correct and stable sequence recall with 50 patterns in a network with 2000 nodes in each module (Fig. 4(C)).

Since synaptic transmission is not perfect, and nodes may fire with less input than normal, or may not fire when given enough input, we also conducted experiments in which the transmission of signals is noisy. This was done by negating a certain percentage of the $r_i$ values in Eq. (10) at each time step during recall.

Fig. 4. (A, B) Result of recall of a learned sequence of 20 random patterns using model 2 with $N = 1000$ nodes. The inter-module synapses from A to B were given a weight of $\lambda^{BA} = 1$, the same as the intra-module (recurrent) synapses, and the inter-module synapses from B to A were given a weight $\lambda^{AB} = 2$, twice as large. Sequence recall was initiated by setting the firing rates of nodes in module A to the first pattern with 30% noise, and the firing rates of nodes in module B to random values. (A) The overlap between the firing rates $r^A(t)$ of the nodes in module A and each of the stored patterns at each time. (B) The pattern which has the highest overlap with the firing rates of module A at each time. (C) Plot similar to (B) for a network of 2000 nodes in each module that is trained on a sequence of 50 patterns.

Fig. 5. Overlap of the firing rates in module A with each pattern during recall of a length-ten sequence. $N = 1000$ nodes were used in both cases, with $\lambda^{BB} = \lambda^{AA} = 1$ and (top) $\lambda^{AB} = 2$, $\lambda^{BA} = 1$, and (bottom) $\lambda^{AB} = 2.1$, $\lambda^{BA} = 1.6$. The period for which each pattern is stable is decreased with higher inter-module synaptic strengths.

Fig. 6. Overlap of the net activation $h(t)$ of nodes in each module with the stored patterns in a sequence of length 10 during recall. Here the dynamics of the interaction between the modules A and B can clearly be seen. As the activation overlap rises for a particular pattern in B (bottom), A begins moving toward the next pattern in the sequence (top) due to the B→A hetero-associations. As the activation overlap rises for a particular pattern in A, B follows shortly behind from the A→B auto-associations. Note that because there are only four available line styles, some patterns (not adjacent in the sequence) share the same line style.

We found that reliable recall of 20 random patterns in a network of $N = 1000$ nodes can be performed with up to 30% noise. With 40% noise perfect recall sometimes occurs, but is often disrupted; the network still performs sequence recall to a degree, but does not recall any pattern perfectly and often jumps between different segments of the sequence.
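The noisy-transmission manipulation itself is simple to sketch (our own illustration): at every time step, the sign of a random fraction of the transmitted firing rates is flipped before they enter Eq. (10):

```python
import numpy as np

def negate_fraction(r, fraction, rng):
    """Flip the sign of a random subset of firing rates, modelling
    unreliable synaptic transmission (fractions up to ~0.3 in the text)."""
    flip = rng.random(r.shape) < fraction
    return np.where(flip, -r, r)

# e.g. inside the integration loop of the simulate_model2 sketch above:
#   rB_noisy = negate_fraction(np.tanh(hB), 0.3, rng)
#   hA += dt / tau * (-hA + lamAA * wAuto @ rA + lamAB * wHet @ rB_noisy)
```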

Fig. 5 shows how the speed of recall can be increased by increasing the inter-module synaptic strengths $\lambda^{AB}$ and $\lambda^{BA}$. The top part of the figure shows the recall of a sequence of 10 patterns with the same parameters as in Fig. 4, while in the bottom part $\lambda^{BA}$ is increased from 1 to 1.6, and $\lambda^{AB}$ is increased from 2 to 2.1. Increasing $\lambda^{AB}$ any further causes A to be preemptively disrupted from its auto-associative recall of pattern $x^m$ by the recall of pattern $x^{m+1}$; the patterns are still recalled in order, but each pattern is not fully recalled. Increasing all $\lambda$s together has little effect on the sequence recall; it is the difference in strength between the inter- and intra-module connections which is the major factor in recall speed.

Fig. 6 shows the overlap of the net activations ($h^A(t)$ and $h^B(t)$) of nodes in both modules with the patterns in a sequence of length 10 during recall. Here the interaction between the two modules can clearly be seen.


For example, at time $t = 2$, B is attracting towards the pattern indicated by the dotted line, causing A to move away from this pattern and attract to the next pattern in the sequence, indicated by the dot-dashed line. As this happens B follows closely behind, causing A to once again transition to the next pattern in the sequence.

5. Conclusions and future work

Multi-modular approaches to sequence generation have the property that the timing of transitions between items in a sequence is a product of the dynamics of the interaction between the different modules. This is in contrast to approaches involving a single recurrent network, which require an explicit temporal function that must be chosen ahead of time. Recall speed in a single network can be changed by adjusting the parameters of the chosen temporal functions, which implies an adjustability of the synaptic activation dynamics. The speed of recall in the modular approach studied here can be altered simply by modulating the overall strength of the connections between the modules.

The model presented in this paper, which has hetero-associative connections between the auto-associative modules, is able to learn and generate simple sequences of considerable length easily and reliably, even in the presence of noise. This is in contrast to an architecture that connects a hetero-associative module with an auto-associative module. Our architecture permits storage of simple sequences with arbitrary and varying time scales; it does not require combining a trace of previous activity in a recurrent network with new activity, since the patterns in the different networks can be timed differently and learning occurs between these patterns.

In this paper we only consider simple sequences, but we explored some extensions of the models for complex sequences. For example, in one extension we implemented a trace rule as in Eq. (6) using an exponentially weighted moving average, i.e. our hetero-association in Eq. (5) becomes

$w^{AB}_{ij} = \frac{1}{N} \sum_m x^{m+1}_i \bar x^m_j$,

with $\bar x^m_j$ being the exponentially weighted moving average of all previous activity of node $j$,

$\bar x^m_j = (1 - \eta) x^m_j + \eta\, \bar x^{m-1}_j$.   (14)

During simulation we then compute a trace of previous node activity,

$\bar r^B_j(t) = (1 - \gamma)\, r^B_j(t) + \gamma\, \bar r^B_j(t - \delta t)$,

as the activity of module B in Eq. (10), where $\delta t$ is the time step of the simulation. However, we were not able to find values for $\gamma$ and $\eta$ that recall sequences when different patterns in the sequence were more than 70% similar.
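A compact sketch of this extension (our own illustration; the values of $\eta$ and $\gamma$ would have to be tuned as described above):

```python
import numpy as np

def train_hetero_trace(X, eta=0.3):
    """Hetero-associative weights using the exponentially weighted moving
    average of Eq. (14) in place of the raw previous pattern."""
    p, N = X.shape
    Xbar = np.empty_like(X, dtype=float)
    Xbar[0] = X[0]
    for m in range(1, p):
        Xbar[m] = (1 - eta) * X[m] + eta * Xbar[m - 1]  # Eq. (14)
    return np.roll(X, -1, axis=0).T @ Xbar / N          # associates x^{m+1} with xbar^m

# during recall, module B's transmitted activity is likewise replaced by a trace,
# updated every simulation time step dt:
#   rbarB = (1 - gamma) * np.tanh(hB) + gamma * rbarB
```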

Another approach that we explored was to store with each pattern some information which amounts to the "context" of that pattern. For example, if the firing rates of a small group of $N_P$ nodes represented the actual pattern, and the other $N_C = N - N_P$ nodes represented the context, complex sequence recall should be possible if the different contexts of a repeated pattern are unique. However, in experiments with $N = 1000$ nodes we were only able to recall complex sequences in this approach for considerably large contexts such as $N_C = 800$ nodes. Thus, reliable storage and recall of complex sequences is still a major challenge. Another direction for further research is the learning of sequences whose items do not occur at regular intervals.

The types of sequences this paper is concerned with are of considerable length (longer than 7) and have variable time frames. This is in contrast to the sequence memory discussed by Jensen and Lisman and others in conjunction with hippocampal functions. Sequence learning has also been discussed in conjunction with the basal ganglia in the framework of reward and temporal-difference learning [20]. It is likely that several mechanisms and different roles of temporal sequence processing are utilized in the brain, and it is necessary to distinguish these different forms of sequence processing and their mechanistic realization in more detail.

References

[1] M. Abeles, Corticonics: Neural Circuits of the Cerebral Cortex, Cambridge University Press, Cambridge, UK, 1991.
[2] E. Bienenstock, A model of neocortex, Network: Comput. Neural Syst. 6 (1995) 179–224.
[3] S. Grossberg, Some physiological and biochemical consequences of psychological postulates, Proc. Nat. Acad. Sci. USA 60 (3) (1968) 758–765.
[4] S. Grossberg, Some networks that can learn, remember and reproduce any number of complicated space–time patterns, J. Math. Mech. 19 (1) (1969) 53–91.
[5] H. Gutfreund, M. Mezard, Processing of temporal sequences in neural networks, Phys. Rev. Lett. 61 (1988) 235–238.
[6] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Nat. Acad. Sci. 79 (1982) 2554–2558.
[7] O. Jensen, J.E. Lisman, Theta/gamma networks with slow NMDA channels learn sequences and encode episodic memory: role of NMDA channels, Learning and Memory 3 (1996) 264–278.
[8] D. Kleinfeld, Sequential state generation by model neural networks, Proc. Nat. Acad. Sci. 83 (1986) 9469–9473.
[9] W. Levy, A computational approach to hippocampal function, in: Computational Models of Learning in Simple Neural Systems, Academic Press, New York, 1989, pp. 243–305.
[10] J. Lisman, Relating hippocampal circuitry to function: recall of memory sequences by reciprocal dentate–CA3 interactions, Neuron 22 (2) (1999) 233–242.
[11] H. Nishimori, T. Nakamura, M. Shiino, Retrieval of spatio-temporal sequence in asynchronous neural network, Phys. Rev. A 41 (1990) 3346–3354.
[12] F.T. Sommer, T. Wennekers, Synfire chains with conductance-based neurons: internal timing and coordination with timed input, Neurocomputing 65–66 (2005) 449–454.
[13] H. Sompolinsky, I. Kanter, Temporal association in asymmetric neural networks, Phys. Rev. Lett. 57 (1986) 2861–2864.
[14] S.M. Stringer, E.T. Rolls, T.P. Trappenberg, I.E.T. de Araujo, Self-organizing continuous attractor networks and motor function, Neural Networks 16 (2) (2003) 161–182.
[15] S.M. Stringer, T.P. Trappenberg, E.T. Rolls, I.E.T. de Araujo, Self-organising continuous attractor networks and path integration: one-dimensional models of head direction cells, Network: Comput. Neural Syst. 13 (2002) 217–242.
[16] D.W. Tank, J.J. Hopfield, Neural computation by concentrating information in time, Proc. Nat. Acad. Sci. 84 (1987) 1896–1900.
[17] H. Wagatsuma, Y. Yamaguchi, Cognitive map formation through sequence encoding by theta phase precession, Neural Comput. 16 (12) (2004) 2665–2697.
[18] L. Wang, Heteroassociations of spatio-temporal sequences with the bidirectional associative memory, IEEE Trans. Neural Networks 11 (6) (2000) 1503–1505.
[19] W.B. Levy, A.B. Hocking, X. Wu, Interpreting hippocampal function as recoding and forecasting, Neural Networks 18 (2005) 1242–1264.
[20] F. Worgotter, B. Porr, Temporal sequence learning, prediction, and control: a review of different models and their relation to biological mechanisms, Neural Comput. 17 (2005) 245–319.

Michael Lawrence received a BCSc degree from Dalhousie University in Halifax, Canada in 2004. He is currently an NSERC Canada Graduate Scholar working towards a MCSc degree from Dalhousie University. His research interests include approximation algorithms and combinatorial optimization, data warehousing, on-line analytical processing and distributed computing.

Thomas Trappenberg studied physics at RWTH Aachen University and held research positions at KFA/HLRZ Julich, the RIKEN Brain Science Institute and Oxford University. He is currently associate professor in the Faculty of Computer Science at Dalhousie University. His principal research interests are computational neuroscience and applications of machine learning methods for data classification and signal analysis. He is the author of the textbook 'Fundamentals of Computational Neuroscience' published by Oxford University Press.

Alan Fine studied philosophy and biology at Harvard University, and obtained a veterinary MD and Ph.D. in physiology at the University of Pennsylvania. He is Team Leader in the Division of Neurophysiology at the National Institute for Medical Research in London, and Professor of Physiology and Biophysics at Dalhousie University. His research deals with mechanisms of synaptic transmission and learning.