Novelty-dependent learning and topological mapping

R. Hans Phaf, Paul Den Dulk, Adriaan Tijsseling, Ed Lebert

Abstract

Unsupervised topological ordering, similar to Kohonen’s (1982) Self-organizing feature

map, was achieved in a connectionist module for competitive learning (a CALM Map) by

internally regulating the learning rate and the size of the active neighborhood on the basis

of input novelty. In this module winner-take-all competition and the 'activity bubble' are

due to graded lateral inhibition between units. It tends to separate representations as far

apart as possible, which leads to interpolation abilities and an absence of catastrophic

interference when the interfering set of patterns forms an interpolated set of the initial

data set. More than the Kohonen maps, these maps provide an opportunity for building

psychologically and neurophysiologically motivated multimodular connectionist models.

As an example, the dual pathway connectionist model for fear conditioning by Armony,

Servan-Schreiber, Cohen, and LeDoux (1997) was rebuilt and extended with CALM

maps. If the detection of novelty enhances memory encoding in a canonical circuit, such

as the CALM map, this could explain the finding of large distributed networks for

novelty detection (e.g. Knight & Scabini, 1998) in the brain.

A self-organising connectionist map

1. Introduction

Novelty detection is increasingly recognized as a stimulating factor for memory

encoding, both in neurobiological research (e.g. Halgren & Smith, 1987; Knight, 1996;

Eichenbaum, 1999), and in psychological research (e.g. Phaf, 1994; Tulving & Kroll, 1995).

Implementations of this novelty-encoding process in connectionist models have, however,

been rare. Many competitive learning procedures implicitly distinguish novel from familiar

stimuli by the amount of competition they evoke. Familiar stimuli, generally, suffer from less

competition and have more localized representations (Page, 2000) than novel stimuli. The

classical self-organizing map by Kohonen (1982, 1988, 1995) even incorporates a steady

decline of learning rate with repeated presentation of the input patterns (i.e. when they

become more familiar). Familiarity is reflected in this map by an increased ordering of

representations, which should not be disrupted by recoding due to strong weight changes. A

competitive learning procedure which explicitly distinguishes novel from familiar stimuli, but

does not order representations in the same manner as Kohonen’s map, is the CALM module

(Murre, Phaf, & Wolters, 1992). The two approaches were combined in the CALM Map

which uses the novelty detection mechanism and the explicit lateral inhibition of CALM

modules to achieve topological mapping as in Kohonen’s map.

The CALM Map is introduced here, its behaviour is compared to that of CALM

modules, and an illustrative example of a network incorporating a number of CALM maps is

presented. An existing multimodular competitive network model is rebuilt and applied to the

simulation of experimental data. The assumptions underlying CALM Map, particularly

concerning novelty-dependent learning, may not only be justified from a practical point of

view, but may also be used to simulate and explain experimental results in the field of

memory, attention, perception, and even affective processes. Although we have found

CALM Maps quite useful in simulations of widely different behavioural data, the full

applicability will have to show in a prolonged use of the procedure, perhaps much in the

same manner as Kohonen’s map has been developed and shown its value over twenty years.

Instead of implicitly involving novelty in a model, such as in the Kohonen map, it

seems better to follow the developments in psychological theorising (Phaf, 1994; Tulving &

Kroll, 1995) and explicitly incorporate these in a model. In addition, these assumptions may

provide some opportunity to understand neural properties and may help to bridge the gap

between psychology and neurobiology. If, for instance, novelty detection plays a central role

in learning, it would be expected that large distributed networks in the brain would be

involved (e.g. Tulving, Markowitsch, Craik, Habib, & Houle, 1996; Knight, 1996, 1997;

Knight & Scabini, 1998). In this respect there seems to be a two-way interaction between

simulations of artificial neural networks and the study of real neural systems. On the one

hand, microscopic neural functions may serve as inspiration for connectionist models, and on

the other hand, network models may give meaning to neural mechanisms that are not yet fully

understood.

The CALM procedure, from Categorizing And Learning Module (Murre, Phaf, &

Wolters, 1992), has been proposed as a building block for modular network models. It

develops local representations (see Page, 2000) on specific nodes at a modular level, but semi-

distributed representations at a global network level (which is assumed to consist of many

interconnected modules). CALM implements competitive learning by lateral inhibition

between nodes and it incorporates a psychologically motivated, novelty-dependent,

attentional mechanism which leads to a random search for possible representations and

increased learning of these representations (i.e. elaboration learning).

Disadvantages of this approach are that the learned representations do not match the

topology of the input space, and that increasing amounts of overlap between patterns may

severely impair categorisation performance. If a new pattern differs sufficiently from the

already represented patterns, a new representation is selected at random from the remaining

uncommitted nodes (i.e. nodes that do not have a representation). Though it shows some

ability to separate correlated patterns, its performance breaks down for highly correlated

patterns. Particularly in larger modules and with larger patterns, even a large distance between

patterns may not be sufficient for a stable distinct categorisation.

Such a problem is not found in the well-known self-organising procedure of Kohonen

(1982, 1988, 1995). In the initial version of these self-organising feature maps activations of

the representational nodes are determined by the Euclidean distance between the input vector

and the weight vector to the node. The computational procedure, which was only partly

implemented in neural network terms, selects the node with the highest activation together

with a neighbourhood of nodes (the 'activity bubble') for weight change. Neighbourhood size

and learning rate are reduced during successive learning of the patterns. In this manner, similar

patterns will get represented on neighbouring nodes, whereas dissimilar patterns remain far


apart. The preservation of the order in the input space by the representations of patterns is

generally referred to as topological self-organisation.

The CALM procedure provides a state-dependent mechanism for internally adjusting

the learning rate as a function of novelty, whereas these novelty-dependent changes in the

Kohonen map are set by the programmer. Following Murre (1992), we modified CALM to

stretch representations along the module according to the similarity gradient in the input by

introducing a gradient of lateral inhibition within a module. The formation of feature maps on

the basis of explicitly simulated inhibitory dynamics between nodes has previously been

studied by Miikkulainen (1991), but this model still needed control of the 'activity bubble'

radius by hand during learning and was also quite sensitive to boundary conditions (see also

Murre, 1992).

The new map adheres to the general principles for interactive activation networks

described by McClelland (1993). He did, however, not present a general learning procedure

for his theoretical framework (called GRAIN, standing for Graded Random Adaptive

Interactive (nonlinear) Network). Although both types of CALM modules have additional

mechanisms not specified by McClelland (e.g. novelty-dependent learning), both modules

seem to qualify as learning procedures within such a framework. To our knowledge, Kohonen

maps have not been applied to interactive, multimodular networks of this kind, probably

because of the difficulty of independently regulating the different maps in a multimodular

network and of handling bi-directional connections. Kohonen maps, therefore, appear to be

less suitable for the types of interactive networks envisioned by McClelland (1993). Because

CALM Map is intended as a building block for such networks, we will focus here on the

comparison of CALM Maps and CALM modules and consider the Kohonen Maps as a

useful heuristic for improving CALM.

2. Competitive modules

In Kohonen's self-organizing feature map (Kohonen, 1982, 1988, 1995) nodes are

arranged according to some type of one-, two-, or three-dimensional neighborhood of connectivity (e.g. a

line, grid, or cube). Starting from a random pattern of weights, input patterns are compared to

the weight vector associated with each node using a Euclidean distance measure, resulting in

the selection of a best-matching or "winning" node. Weights on links to nodes that are

neighbors of the winning node are also modified. As a result, similarity between input

patterns will be mapped into proximity of activated nodes, and representations will be forced

in an order depending on the dimension with the largest range of variation in the whole data

set. Small variations tend to be ignored or play only a minor role in the ordering process,

depending on the available representation space.
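The Kohonen procedure described above can be sketched in a few lines. This is a minimal illustration and not the implementation used in any of the cited work: the function names `train_som_1d` and `best_match`, the Gaussian neighbourhood function, and the linear schedules for learning rate and neighbourhood radius are all assumptions chosen for brevity. The sketch mainly shows that both schedules have to be set by hand in advance.

```python
import math
import random


def best_match(weights, x):
    """Index of the node whose weight vector has the smallest squared Euclidean distance to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def train_som_1d(patterns, n_nodes, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Minimal one-dimensional Kohonen map (line topology).

    Learning rate and neighbourhood radius follow a hand-set linear
    schedule, which is an assumption made for this illustration.
    """
    rng = random.Random(seed)
    dim = len(patterns[0])
    # Start from a random pattern of weights.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    if radius0 is None:
        radius0 = n_nodes / 2.0
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # externally scheduled decline
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        order = patterns[:]
        rng.shuffle(order)                   # random order without replacement
        for x in order:
            winner = best_match(weights, x)
            for i in range(n_nodes):
                # Neighbours of the winner are modified too, weighted by distance.
                h = math.exp(-((i - winner) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights
```

Calling `train_som_1d(patterns, n_nodes=11)` on a small set of binary patterns returns one weight vector per node; `best_match` can then be used to read off which node represents each pattern.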

To obtain stabilization in the Kohonen map during learning, usually two parameters are

regulated externally. First, the neighbourhood size of activated nodes is reduced

monotonically with repeated presentation of the set of input patterns, and second, the

learning parameter is decreased gradually. This regulation requires prior knowledge about the

presentation schedules according to which the neighbourhood and the learning parameter have

to be adjusted. This may, however, be hard to reconcile with the unsupervised character of

learning in self-organising maps. It seems, therefore, desirable to implement processes capable

of automatically self-adjusting global network parameters such as neighbourhood size and

learning rate. CALM seems well suited for this purpose, because it already incorporates a

novelty-dependent learning rate, which automatically decreases when competition in the

module levels off (i.e. when patterns become more familiar and have a better match with

stored representations). If, moreover, competition is implemented by a gradient of inhibitory

connections, the size of the winning (i.e. activated) neighbourhood of nodes no longer needs to

be specified externally, but will be dependent on the range of activations in the module. The

neighbourhood will be tuned more finely during learning due to the increasing match between

weight pattern and input pattern. So, both the winner-take-all behaviour and the 'activity

bubble' arise from the incorporation of actual inhibitory connections.

A standard CALM (see Figure 1) is a competitive learning module, in which the

competition process is performed by intramodular interactions between excitatory

Representation nodes (R-nodes) and inhibitory Veto nodes (V-nodes). Nodes with excitatory

and inhibitory effects have been explicitly separated in the module. Every R-node has an

excitatory connection to a single (matched) V-node. The strongly inhibitory weights from V-

nodes to all other (non-matched) R-nodes have an equal value and impose a strong veto effect

on these R-nodes. Incoming signals arrive along modifiable intermodular connections at the R-

nodes. An Arousal node (A-node), receiving connections from both R and V-nodes, weighs

the amount of competition, which serves as a measure of the novelty of the presented

pattern. When an input pattern closely resembles the weight pattern to a particular R-node,

there will be little competition and the total amount of excitation from the R-nodes to the A-


node will be suppressed by the inhibition from the V-nodes. When many R-nodes have

weight patterns that match the input pattern about equally well, there will be much

competition and due to the inhibition between V-nodes, the A-node will receive more

excitation from R-nodes than inhibition from V-nodes. The A-node activates an External node

(E-node), which spreads random activations among the R-nodes, and controls the learning rate

in the module. In the case of much competition, the E-node will be highly active and will

generate relatively large random pulses to prevent potential "deadlocks" in the competition.

This is not necessary, of course, when there is little competition, because a winning pair of R

and V-nodes has already been selected.

Figure 1. A CALM module with node types and connection names. The node types are V-node (Veto node), R-node (Representation node), A-node (Arousal node), and E-node (External node). For convenience, distinctive names were also given to the connections. An excitatory Up weight connects an R-node to its matched V-node. In the standard CALM module all inhibitory Cross weights (from a V-node to non-matched R-nodes) are equal, and the Down weight (to the matched R-node) is somewhat higher. The A-node receives activations from both R- and V-nodes via Low and High connections, respectively. The AE weight connects the A-node to the E-node, which sends random activations through Strange connections to the R-nodes. Only intermodular connections are modifiable.

The general equation specifying the activation state (a real value between 0 and 1) of

node i at epoch (or iteration number) (t+1) in CALM is:

$$a_i(t+1) = (1-k)\,a_i(t) + \frac{e_i}{1+e_i}\,\bigl[1-(1-k)\,a_i(t)\bigr] \qquad (1)$$

if the input $e_i > 0$, and

$$a_i(t+1) = (1-k)\,a_i(t) + \frac{e_i}{1-e_i}\,(1-k)\,a_i(t) \qquad (2)$$

if the input $e_i < 0$, where $e_i$ is the match between inputs and the modifiable intermodular

weights determined by the inner product of both vectors, and k is a decay parameter.

In these equations three components may be distinguished. The first component,

$(1-k)\,a_i(t)$, represents autonomous decay, and for $e_i > 0$ the second part, $e_i/(1+e_i)$, is (half of) a

sigmoid function between zero and one. The third part of the rule, $[1-(1-k)\,a_i(t)]$, ensures that

the increase in activation due to net excitatory input approaches the maximum activation

asymptotically. Similarly, for $e_i < 0$, $e_i/(1-e_i)$ squashes the negative excitation (inhibition)

between minus one and zero. The $(1-k)\,a_i(t)$ component then ensures an asymptotic

approach to the minimum activation value. It should be noted that in CALM, contrary to the

Kohonen map, activations are not determined by a Euclidean distance measure but by a

shunting equation (Equations 1 and 2, see also Grossberg, 1973, 1988), where the input term $e_i$

is determined by the more common weighted summation rule.
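The activation rule of Equations 1 and 2 can be written as a short function. This is a sketch only: the function name `update_activation` is hypothetical, the default k = 0.25 is taken from Table 1, and the case $e_i = 0$, which the text does not treat separately, is assumed here to reduce to pure decay.

```python
def update_activation(a, e, k=0.25):
    """One step of the shunting activation rule (Equations 1 and 2).

    a: current activation in [0, 1]; e: net input; k: decay parameter.
    """
    decayed = (1.0 - k) * a                      # autonomous decay
    if e > 0:
        # Excitation moves the activation towards the maximum of 1.
        return decayed + (e / (1.0 + e)) * (1.0 - decayed)
    if e < 0:
        # Inhibition moves the activation towards the minimum of 0.
        return decayed + (e / (1.0 - e)) * decayed
    return decayed                               # assumed behaviour for e == 0
```

Because the squashed input is multiplied by the remaining distance to the bound, repeated application keeps the activation between 0 and 1 for any sequence of inputs.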

The modification of the intermodular weight from node j to node i is governed by an

extension of a learning rule published by Grossberg (1982):

$$\Delta w_{ij}(t+1) = \mu(t)\,a_i(t)\,\Bigl\{\bigl[K - w_{ij}(t)\bigr]\,a_j - L\,w_{ij}(t)\sum_{f \neq j} w_{if}\,a_f\Bigr\} \qquad (3)$$

where K is the maximum weight value, L is the learning competition factor (see Table 1),

and $\mu(t)$ is a parameter which controls learning speed. The value of this parameter

depends on the activation of the E-node according to:

$$\mu(t) = d + w_{\mu E}\,a_E \qquad (4)$$

where d is a constant with a small value determining base rate learning, $w_{\mu E}$ is a virtual

weight from E to $\mu(t)$ (from the E-node to the learning parameter), and $a_E$ is the activation of

the E-node. The E-node also sends random activation pulses, which are uniformly distributed

over the interval $[0, a_E(t)]$, to all R-nodes in the module. $a_E(t)$ represents the activation of

the E-node at time t. Because all intermodular weights start out at exactly the same value, the

random activations are required to break the symmetry, but are also useful in later phases

when there may be many nearly-matching nodes. A more extensive description of CALM can

be found elsewhere (Murre, 1992; Murre et al., 1992; Phaf, 1994).
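The learning rule (Equation 3), the novelty-dependent learning rate (Equation 4), and the random pulses from the E-node can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the default parameter values are those listed in Table 1, and the bound in the learning rule is read as the maximum weight value K rather than the decay parameter k.

```python
import random


def learning_rate(a_E, d=0.0001, w_muE=0.0005):
    """Equation 4: base-rate learning plus a term driven by E-node activation."""
    return d + w_muE * a_E


def weight_change(w_i, a_in, a_i, mu, K=1.0, L=1.0):
    """Equation 3 for all weights into one R-node i.

    w_i: weights from the sending nodes j to node i
    a_in: activations of the sending nodes j
    a_i: activation of the receiving R-node
    """
    total = sum(w * a for w, a in zip(w_i, a_in))
    deltas = []
    for w_ij, a_j in zip(w_i, a_in):
        others = total - w_ij * a_j              # sum over f != j of w_if * a_f
        deltas.append(mu * a_i * ((K - w_ij) * a_j - L * w_ij * others))
    return deltas


def random_pulse(a_E, rng=random):
    """Random activation sent to an R-node, uniform on [0, a_E]."""
    return rng.uniform(0.0, a_E)
```

With two inputs of which only the first is active, `weight_change([0.5, 0.5], [1.0, 0.0], a_i=1.0, mu=0.1)` increases the weight from the active input and decreases the weight from the inactive one, which is the competitive property the rule is meant to provide.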

CALM incorporates two modes of learning: elaboration learning and activation learning.

Activation learning represents base-line learning (i.e. slow strengthening of existing

associations) on which elaboration learning is superimposed. The elaboration process is

dependent upon the amount of competition among the R-V node pairs. If a pattern is not yet

represented in the module, it will generally elicit much competition, because many nodes are

simultaneously activated by the pattern. This gives rise to a high arousal level at the A-node

and the E-node, yielding an increased learning rate, and relatively large random pulses

facilitating the resolution of competition. A well-established pattern activates its

corresponding node without much competition and only strengthens its representation

through activation learning, which is characterised by a relatively low learning rate. Learning in

CALM, thus, has the effect of reducing the competition with repeated presentation of the

pattern set, whereby elaboration learning is gradually replaced by activation learning.

The implementation of this novelty-dependent modulation of learning was primarily

motivated by a psychological theory of memory and learning (Mandler, 1980; see also Murre

et al., 1992; Phaf, 1994), but increasingly seems to receive support from neurobiological

research (Halgren & Smith, 1987; Knight, 1996; Eichenbaum, 1999). Dual process theory (i.e.

elaboration and activation learning) was first used to explain results from recognition

experiments and was later applied to dissociations between implicit (e.g. threshold

identification, word stem completion) and explicit (e.g. free recall, recognition) memory

performance (Graf and Mandler, 1984; see also Bower, 1996). An example of such a

dissociation is that implicit memory performance is generally preserved in severely

anterograde amnesic patients, whereas explicit memory performance is often completely

absent. A similar dissociation can sometimes be observed in normal subjects when the to-be-

remembered material is presented outside of attention (e.g. in a divided attention task). This

dissociation can be accounted for by assuming that elaboration learning has been impaired in

these patients (and is dependent on attention in normal subjects), but that activation learning

still serves to consolidate existing representations. A simulation of this dissociation has been

performed in a multimodular CALM network (Phaf, 1994) by disabling elaboration learning

(lesioning the connection to the External node). After this lesion, the network lost its ability

to form new representations (in the short time available during single trial presentation), but

still revealed consolidation of existing representations.

3. CALM Map

An early approach (CALSOM; Murre, 1992) to implement a neighborhood of activated

nodes in CALM was to have a linearly decreasing gradient of inhibition with distance from

the inhibiting node. Murre, however, did not obtain maximal separation of representations.

Adjacent input patterns were sometimes represented on the same node and boundary nodes

tended not to be occupied by particular patterns. A further problem was that, though a

topological ordering was achieved, subgroups were sometimes inverted (i.e. 'twists'). CALM

Maps differ from CALSOM due to the incorporation of a convex inhibition gradient (i.e. part

of a Gaussian function), instead of a linear inhibition gradient (or a 'Mexican hat' gradient, see

Miikkulainen, 1991). For this gradient it has been shown for Kohonen maps that,

when the ‘full width at half height’ of the Gaussian equals the number of neurons,

convergence is optimal (Erwin, Obermayer, & Schulten, 1992). Only one-dimensional

topologies (e.g. a line or a ring) are considered here. Though there are ways of circumventing

boundary problems in a line topology (i.e. by reducing the net inhibition to the boundary

nodes), we have chosen to avoid the problem by eliminating boundaries altogether (i.e. in a

ring topology in which the ‘first’ node is a neighbor of the ‘last’ node in a module).

The parameter values (mostly fixed intramodular weight values) were generally the

same as in the standard CALM module (Murre et al., 1992; see also Table 1). Simulations

have shown that these parameters can be varied over large ranges while preserving the global behaviour

of CALM Maps and modules. Though this set was chosen after some preliminary

simulations, it cannot be excluded that better values can be obtained. The parameter values

have to obey some global rules. To enable a transition from elaboration to activation learning,

for instance, the inhibitory weights from the V-nodes to the A-node have to be larger in

absolute value than the excitatory weights from the R-nodes to the A-node. Because there is

generally more than one node active in the activity bubble, the weights to the A-node in the

CALM Map had to be adjusted to the new proportions of excitation and inhibition. The

activations of A- and E-nodes were lower in the CALM Maps than in CALM modules, so

that there would still be enough noise to break symmetry but ordered representations would

not be disturbed. To allow for a smooth distribution of representations, furthermore, the

learning rate was reduced. Finally, two old parameters (specifying the inhibition from Veto

nodes to Representation nodes) were replaced by two new parameters for the inhibition

gradient, which had the following form:

$$h_{ij} = A\,e^{-\frac{(i-j)^2}{2\sigma^2}} - B \qquad (5)$$

where $h_{ij}$ denotes the inhibitory weight from the j-th V-node to the i-th R-node, A > 0,

B > 0, and σ determines the width (standard deviation) of the inhibition gradient. Note that

B > A, so that all $h_{ij}$ values are negative, with maximum value A − B (the inhibition to the

matched R-node in CALM) and minimal values approaching −B (the cross-weight in CALM). The value

of σ was kept dependent on module size, n, according to the following empirical formula:

$$\sigma = \frac{n}{\sqrt{n-1}} \qquad (6)$$

The values of σ are rounded to the nearest half. Smaller values than those prescribed by

this formula tend to induce frequent twists. The parameters of the Gaussian, furthermore, do

not change during presentation of a pattern set, nor does any other parameter.
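The inhibition gradient can be made concrete with a small sketch. Several points are assumptions rather than statements from the text: the width formula is read as σ = n/√(n−1) (which, after rounding to the nearest half, reproduces the σ values quoted for the module sizes in Section 4.1), ties in the rounding are resolved upwards, and in a ring topology the distance between nodes is taken as the shorter way around the ring. The function names are hypothetical; A and B default to the values in Table 1.

```python
import math


def gradient_width(n):
    """Width sigma for module size n, rounded to the nearest half (ties upwards)."""
    raw = n / math.sqrt(n - 1)                   # assumed reading of Equation 6
    return math.floor(2.0 * raw + 0.5) / 2.0


def inhibition_weights(n, A=8.8, B=10.0, ring=True):
    """Matrix h[i][j]: inhibitory weight from V-node j to R-node i (Equation 5)."""
    sigma = gradient_width(n)
    h = []
    for i in range(n):
        row = []
        for j in range(n):
            dist = abs(i - j)
            if ring:
                dist = min(dist, n - dist)       # wrap around the ring (assumption)
            row.append(A * math.exp(-dist ** 2 / (2.0 * sigma ** 2)) - B)
        h.append(row)
    return h
```

For a ring of 11 nodes, the weight to the matched R-node equals A − B, and inhibition becomes stronger (more negative) with increasing distance from the inhibiting node, approaching −B for wide modules.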

Weight    Description                                       Value
up        from R-node to matched V-node                     0.5
A         Gaussian inhibition factor                        8.8
B         Gaussian inhibition constant                      10.0
flat      interconnects V-nodes                             -1.0
high      connects V-node to A-node                         -0.7
low       connects R-node to A-node                         0.3
AE        connects A-node to E-node                         1.0
strange   connects E-node to R-nodes                        0.25 *
inter     initial value of learning intermodular weights    0.5
k         decay of activation                               0.25
L         learning competition factor                       1.0 **
K         maximum learning weight value                     1.0
d         base rate of learning                             0.0001 ***
wµE       virtual weight from E-node to learning rate       0.0005

* In the conditioning simulations this parameter was set to 0.1.
** In the conditioning simulations this parameter was set to 2.0.
*** In the conditioning simulations this parameter was set to 0.005.

TABLE 1. Fixed weight values and parameters in CALM Map (see also Murre et al., 1992).

To illustrate the ordering process in the CALM Map, nine patterns (the same set was

used by Murre, 1992), were presented 100 times for 25 iterations each to a module of size 11

(ring topology) from a number of clamped input nodes. Patterns were presented in a fully

randomised order without replacement. Between presentations all activations, but not the




connection weights, were initialised (to zero). In Figure 2 the pattern set and the

categorisation results are shown. After about 30 presentations the patterns are properly

ordered on a one-dimensional scale. Due to the absence of twists and multi-committed nodes,

this represents an improvement on the results of Murre (1992).

Pattern set:

p1: 111100000000
p2: 011110000000
p3: 001111000000
p4: 000111100000
p5: 000011110000
p6: 000001111000
p7: 000000111100
p8: 000000011110
p9: 000000001111

Figure 2. Trace of categorization of pattern set (p1...p9) in a CALM Map of size 11 (axes: presentations and nodes).

As a consequence of the equal starting weights, the activity bubble initially extends over

the full ring for every pattern. For the first presentations during the ordering process, all

representations generally lie close to a randomly selected central node. The activity bubbles

subsequently narrow and split up for the different patterns. The competitive learning rule

reduces the connection weights from inactive connections to nodes within the activity bubble.

Patterns that are disjoint from the pattern causing the bubble, therefore, tend to be

represented outside the activity bubble. Because the weights to nodes farthest away from the

(shrinking) bubble are reduced the least, these nodes will represent patterns that have the least

overlap with the pattern responsible for the activity bubble. The competitive learning

mechanism, in combination with the activation gradient, leads to the 'stretching' property of

CALM Maps (i.e. dissimilar patterns will be represented as far apart as possible).

The random activations ensure the selection of a 'central' node and help to break

symmetry when many patterns are represented close together. Because differences in

activations are smallest between neighbouring nodes, the random activations will favor the

separation of representations. In the beginning, the random activations will help the spreading


out of representations, but later on they may disturb the ordering. The amplitude of the

random activations, however, depends on the E-node activation, which in turn depends on the

size of the activity bubble. The internal regulation of the activity bubble thus also reduces the

random activations when they are no longer useful for the organisation process. Eventually,

the representations will settle in an organised state with minimal competition and low E-node

activation, so that ordering is not disrupted by the random activations.

In the Kohonen map only the initial weights are chosen at random and there is no

further (state-dependent) random process at work during processing. Consequently, the

representations start out at random positions without apparent ordering and then form small

clusters from which representations are gradually reordered according to their mutual

relationships. It may be noted that the start from a core representation in CALM Maps (and

CALSOM), as compared to many random representations, may lead to a considerable

shortening in separation times relative to the Kohonen map. In a comparison of CALSOM

and Kohonen maps in the classification of wave spectrogram data from the ESA GEOS

satellite mission the former indeed appeared to arrive at a slightly better categorisation in an

appreciably shorter time than the latter (Brückner and Gough, Submitted). The reordering of

an initial random order is thus omitted in CALM Maps (and CALSOM), which may be an

advantage for practical applications of topological self-organisation.

4. Single module simulations

We performed a series of simulations to investigate the stretching process in CALM

Maps and compared its categorization behavior to the standard CALM module with respect

to the size of the module, the overlap, and the Euclidean distance between patterns. In all

simulations in this section the patterns were presented 800 times in random order (without

replacement) to both modules. Results were averaged over five replications. The CALM Map

generally converged upon a single node in fewer iterations than CALM. In all simulations the

maximum number of iterations per presentation was, therefore, kept constant at 25 iterations

for the CALM Map and at 50 iterations for the standard CALM module.


4.1. Module size

To study the influence of module size on categorization, patterns similar to the

simulation in Section 3 (four input activations were set to 1.0 and the rest to zero), were

presented to modules of size (n) 7, 12, 17, 22, and 27 (with σ = 3.0, 3.5, 4.5, 5.0, and 5.5, respectively) from

n+2 input nodes. In each pattern set the number of patterns was equal to the number of nodes

(n) minus two. Categorization results are shown in Figure 3. The number of multi-committed

nodes, that is the number of nodes with more than one representation, slightly increased in

larger modules for CALM Map, but increased strongly for CALM. CALM Map is thus

better suited than CALM for separating highly correlated pattern sets, particularly with larger

modules and pattern sets. Moreover, when module size was increased while leaving the

number of patterns constant, CALM Map achieved up to 100% correct classification, which

was not observed in CALM.
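The measure used in this comparison, the number of multi-committed nodes, is simple to compute once the winning node for each pattern is known. The sketch below is illustrative only; the function name `count_multi_committed` and the representation of the result as a list of winning node indices (one per pattern) are assumptions.

```python
from collections import Counter


def count_multi_committed(winners):
    """Number of nodes that represent more than one pattern.

    winners: list with the index of the winning R-node for each pattern.
    """
    counts = Counter(winners)
    return sum(1 for c in counts.values() if c > 1)
```

A perfectly separated categorization, in which every pattern wins on its own node, gives a count of zero; each node shared by two or more patterns adds one to the count.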

Figure 3. Number of multi-committed nodes (number of nodes that have representations for multiple patterns) as a function of module size for CALM and CALM Map.

Further simulations unexpectedly revealed that, if the number of nodes equalled twice

the number of patterns, representations are spread over the entire range of R-nodes such that

committed and uncommitted nodes alternate. Closer inspection of the weight values,

moreover, revealed that uncommitted nodes actually interpolated between the representations

of the neighbouring nodes. Due to the absence of topological stretching, CALM, of course,

could not show such interpolation. Similar interpolation behaviour was already observed in a

modified, continuous, Kohonen map (the Parameterized Self-Organising Map; Ritter, 1993).

According to Ritter (1993), this kind of behaviour has the advantage of ‘learning from very



few examples’ (p. 573). This may be useful in solving what has been called the 'curse of

dimensionality', ‘which is that when the inputs have as many dimensions as natural stimuli

then it is impossible in any realistic time scale to give examples that densely cover the whole

input space, with the consequence that there will be large regions of input space in which the

net has no experience to guide it;’ (p.32, Phillips, 1997). Interpolation also opens the

opportunity for determining the similarity of new patterns to already represented patterns. It

can, thus, be used for a kind of similarity scaling.

Interestingly, an additional benefit of this stretching behaviour and the resulting

interpolation characteristic may be that it also solves the node under-utilisation problem

which troubles many competitive learning procedures (Ahalt, Krishnamurthy, Chen, &

Melton, 1990). In some of these procedures, due to the random initialisation of weights, a

number of nodes may not be used at all to represent a pattern. Here, weights to nodes

neighbouring the winning nodes will also change in the direction of the input pattern. The

stretching property results in uncommitted nodes that will combine weight changes of both

neighbouring representations. So, each node will eventually come to represent a pattern, even

when, as is the case with interpolated representations, the pattern has not actually been

presented.

4.2. Pattern overlap and distance

To investigate the role of overlap we varied the number of shared activations in the data

set. Pattern sets, each containing 11 patterns, were constructed with overlap 3, 4, 5, 6, and 7

(see the pattern set of Figure 2 where the overlap of activated nodes between adjacent

patterns was 3). The size of the module was 13 and the input activations were 0.5. Figure 4

shows that the amount of overlap (expressed in direction cosines) affected performance in

CALM Map and CALM in a comparable way, although the results for CALM Map were

slightly better. As can be deduced from Figure 3, this advantage is expected to grow,

however, as a function of module size.
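The relation between the number of shared activations and the direction cosine can be illustrated with a short calculation. The sketch below rests on an assumption that the text does not state explicitly: each pattern is taken to consist of overlap + 1 contiguous active inputs, with adjacent patterns shifted by one input (as in the four-active, overlap-3 set of Figure 2). Under that assumption the cosine equals overlap / (overlap + 1), that is, 0.75 for an overlap of 3. The function names are hypothetical.

```python
import math


def direction_cosine(a, b):
    """Cosine of the angle between two activation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adjacent_pair(overlap, activation=0.5):
    """Two adjacent patterns sharing `overlap` active inputs.

    Assumption: each pattern has overlap + 1 contiguous active inputs and
    the second pattern is the first one shifted by a single input.
    """
    width = overlap + 1
    first = [activation] * width + [0.0]
    second = [0.0] + [activation] * width
    return first, second


for overlap in (3, 4, 5, 6, 7):
    a, b = adjacent_pair(overlap)
    print(overlap, round(direction_cosine(a, b), 3))
```

Note that the direction cosine does not depend on the size of the activations, which is why overlap and Euclidean distance can be varied separately.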



Figure 4. Number of multi-committed nodes as a function of overlap (cos φ) between patterns for CALM and CALM Map. [Plot: x-axis, overlap in direction cosines (0.70 to 0.90); y-axis, multi-committed nodes (0.0 to 4.5); series, CALM Map and CALM.]

In a third simulation, a pattern set with constant overlap (3 activations), but a variable

Euclidean distance between patterns (i.e. the size of the activations), was tested. The

activations in the input ranged from 0.10 to 0.50 in steps of 0.05. The number of patterns

was 11 and the size of the module was 13. Figure 5 shows that varying Euclidean distance has

different effects in CALM and CALM Map. Categorisation performance by CALM was

better at smaller distances, but did not improve much as distance increased. Categorisation by

CALM Maps (25 iterations per presentation) improved with increasing number of

presentations and distances, whereas CALM (50 iterations per presentation) appears to

commit itself after about 200 presentations to a once obtained categorisation. It can, therefore,

be useful in CALM Maps to increase the number of presentations to continue separation.

Though it appears that for a distance below 0.4 separation does not improve as a function of

number of presentations, this has been investigated in an additional simulation of 4000

presentations using patterns with 0.10 activation. It was found that not only the number of

multi-committed nodes had decreased further but also that topological organisation had

improved. For small activations it may still be useful to increase the number of presentations,

but the actual number required may be larger than for high activations.
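How the size of the activations translates into Euclidean distance can be made explicit. Two adjacent patterns with four active inputs and an overlap of three differ on exactly two inputs, so their distance is the activation times the square root of two (about 0.14 for activation 0.10 and about 0.71 for activation 0.50). The sketch below assumes contiguous patterns shifted by one input and 14 input nodes, which is the smallest input layer that holds 11 such patterns; both are assumptions, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_pattern(start, n_inputs, activation, width=4):
    """Pattern with `width` contiguous active inputs beginning at `start`."""
    return [activation if start <= i < start + width else 0.0
            for i in range(n_inputs)]


# Activations from 0.10 to 0.50 in steps of 0.05, as in the third simulation.
for step in range(9):
    activation = 0.10 + 0.05 * step
    p_a = sliding_pattern(0, 14, activation)
    p_b = sliding_pattern(1, 14, activation)
    print(f"{activation:.2f} -> {euclidean_distance(p_a, p_b):.3f}")
```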



Figure 5. (a) Separation as a function of distance between patterns and the number of presentations for CALM. (b) Separation as a function of distance between patterns and the number of presentations for CALM Map. [Plots: x-axis, Euclidean distance (0.1 to 0.8); y-axis, multi-committed nodes; one series per number of presentations (100 to 800 in steps of 100).]

4.3. Retroactive interference

One of the constraints imposing strong limitations on the psychological plausibility of

many connectionist learning procedures is the extreme loss of memory for old representations

upon presentation of a new pattern set in a sequential learning task (Grossberg, 1987; Ratcliff,

1990; Murre, 1992). This catastrophic retroactive interference seems to be due mainly to the

overlap in representations of the sequentially presented pattern sets. Presentation of a new

set modifies the existing representations, so that representations for the old patterns become



similar to those of the new set. Back-propagation studies, in particular, have shown that, after

presentation of the second set (the first set was not presented during the second phase),

memory for the first set is lost and has been mixed with the memory for the second set. In a

competitive learning procedure, which attempts to separate representations, interference

problems should be smaller than with back-propagation, where there is strong overlap due to

the distributed nature of the representations.

Figure 6. Retroactive interference in CALM Map with overlapping pattern sets. (a) The initial categorization trace of learning set A. (b) Categorization trace of learning set B and results of subsequent testing of both set A and B. [Plots: x-axis, presentations; y-axis, nodes (0 to 20).]

Learning set A (Figure 6a):

p1: 111100000000000000000

p3: 001111000000000000000

p5: 000011110000000000000

p7: 000000111100000000000

p9: 000000001111000000000

p11: 000000000011110000000

p13: 000000000000111100000

p15: 000000000000001111000

p17: 000000000000000011110

Learning set B (Figure 6b):

p2: 011110000000000000000

p4: 000111100000000000000

p6: 000001111000000000000

p8: 000000011110000000000

p10: 000000000111100000000

p12: 000000000001111000000

p14: 000000000000011110000

p16: 000000000000000111100

p18: 000000000000000001111

In CALM Maps no strong retroactive interference is expected, because either the

representations of the new pattern set are completely distinct from the old ones, or the new



representations can be assimilated in the ordering of the old patterns and the existing order

will be preserved. To investigate the latter case, we presented two pattern sets of which the

second was an interpolated set of the first. This may represent one of the strongest cases of

overlap between the two sequentially presented sets. The module size was 18 and the number

of presentations was 100 for each set. Figure 6a shows that after learning the first set, the

representations were separated such that committed and uncommitted nodes alternated.

Inspection of the weight values after the last presentation of the first set revealed that

uncommitted nodes actually interpolated between the representations of the neighbouring

nodes. Patterns of the second set were immediately committed to nodes with the interpolated

representations. The tests at the right side of Figure 6b clearly indicate the absence of

catastrophic retroactive interference in this simulation. The stretching property thus reduces

retroactive interference when the pattern sets are highly correlated by separating

representations as far apart as possible.
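The two pattern sets of this simulation can be regenerated from the pattern listing given with Figure 6: a window of four active inputs slides over 21 input nodes, the odd-numbered patterns form the first set and the even-numbered patterns form the interpolated second set. The following sketch only reconstructs the stimuli; the function name is hypothetical and nothing about the learning procedure itself is implied.

```python
def sliding_patterns(n_inputs=21, width=4):
    """All binary patterns with `width` contiguous active inputs, as bit strings."""
    patterns = {}
    for start in range(n_inputs - width + 1):
        bits = ["1" if start <= i < start + width else "0"
                for i in range(n_inputs)]
        patterns[f"p{start + 1}"] = "".join(bits)
    return patterns


all_patterns = sliding_patterns()

# Set A: odd-numbered patterns; set B: even-numbered (interpolated) patterns.
set_a = {k: v for k, v in all_patterns.items() if int(k[1:]) % 2 == 1}
set_b = {k: v for k, v in all_patterns.items() if int(k[1:]) % 2 == 0}

for name, bits in set_b.items():
    print(name, bits)
```

Each pattern in the second set shares three active inputs with the two first-set patterns on either side of it, which is what makes it an interpolated set.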

Retroactive interference was found in standard CALM modules, but differed from

retroactive interference in back-propagation. Training the second set in CALM often had the

effect that representations from the second set replaced those from the first. A further test,

however, showed that the representations of the first set had not mixed with the

representations of the second, but changed place and now occupied previously uncommitted

nodes. After learning the first set, representations were presumably still rather broad, so that

patterns of the second set could also be accommodated on these nodes. By learning the

second set, the representations were recoded and more finely tuned, so that the old patterns

did not fit in anymore. The unsupervised character of CALM, thus, ensured that the

representations of the two pattern sets did not remain mixed, but that, despite some

interference, the patterns kept separate representations.

When, in contrast to the previous simulation, two fully disjunct sets of patterns are

presented to a CALM Map, patterns from the second set would be expected to replace

existing representations. Additional simulations with sequential presentation of pattern sets

(p1...p7) and (p11...p18), however, surprisingly revealed that, even if the number of nodes

was much larger (we tried three and four times) than in the previous simulation, patterns from

the second set were clustered on only a few nodes, whereas representations for the first set

remained spread out maximally. Retroactive interference was thus replaced by proactive

interference. Under more ecologically valid conditions stimuli will vary on many attribute

dimensions (e.g. Phaf, Van der Heijden, & Hudson, 1990) in which pattern sets may not be



fully distinct, so that in a multimodular network even these sets may obtain separate

representations.

5. Multimodular Networks

A CALM Map is not intended for use as the full network model. A multimodular

network architecture has larger information processing abilities than a single module network

due to the presence of independent parallel processing pathways. Moreover, such

architectures allow for hierarchical systems that perform categorization of stimuli at different

levels of abstraction. Happel and Murre (1994) have derived general design principles of

multi-modular networks by exposing simulated neural networks, of which the structure was

generated by a genetic algorithm, to selection pressures, also applying to actual neural

systems. With this evolution-inspired optimization procedure Happel and Murre (1994) in

fact obtained architectural features that were very similar to features of the visual system.

According to their principle of structural compatibility, the best categorization is obtained

when the (evolutionarily prepared) modular structure corresponds to the cluster structure of

the task domain. When, for instance, the input consists of hierarchically clustered patterns

containing smaller subclusters, a coarse categorization in a small module can interactively

facilitate a (simultaneous) more fine-grained categorization in a larger module. The second

principle argues that multiple, parallel, pathways may improve categorization compared to a

single pathway, because the pathway with the best organization will be the fastest to

converge on a suitable representation and so will come to dominate the total organization. The

last principle, the principle of recurrence, maintains that the presence of recurrent connections

between modules may also enhance categorization compared to the situation where no such

connections are present. Both bi-directionally connected modules may interactively benefit

from the gradually increasing differentiation in either module. It should be noted, however,

that these principles were derived with CALM modules and that it was not certain whether

they would also be valid for CALM Maps.

Due to the distribution of (local) representations over multiple modules,

representations may get a more continuous form, taking on different category boundaries in

different modules, and so these networks have a potentially greater discriminatory power

than a single CALM Map module. A further practical advantage of modular networks is that

the scale of the simulations can be enlarged by increasing the number of modules without



changing the size of the constituent modules. In networks that have no restrictions on

connectivity between nodes, the practical costs of increasing scale may become prohibitive

much sooner than in modular networks.

Because CALM Map is best used as a building block for larger networks, more so than

the Kohonen maps, several multimodule network simulations of experimental data have been

performed. In one of these, Tijsseling (1998) modeled categorical perception (Harnad, 1987)

by training a fully recurrent multimodular network consisting of seven modules to

discriminate and categorise Gabor filtered (Gabor, 1946) drawings of lines with varying

orientations. The results of these simulations were later supported by human experimental

data (Pevtzow, Tijsseling, & Harnad, Submitted). In another study (Phaf & Van Immerzeel,

1997), dissociation effects between explicit and implicit human memory performance

(Schacter, 1987; see also Section 2) were simulated in a network implementing the

activation/elaboration account discussed earlier. A third set of simulations of fear conditioning

in a dual-pathway network model will be treated in more detail here. Recently, moreover, the

dual-pathway model was extended to also simulate evaluative conditioning and mere exposure

effects (Grob, 2001).

5.1. A connectionist dual pathway model of fear conditioning

A published connectionist model in which the need for competitive modules forming

topological, or in this case tonotopic, maps is apparent, is the network model by Armony,

Servan-Schreiber, Cohen, and LeDoux (1995), which was inspired by the neurobiological

model of LeDoux (1986, 1996). Both the network and the neurobiological model explicitly

assume a type of multimodular architecture that seems to adhere to the first principle of

Happel and Murre (1994). LeDoux primarily investigated emotions and affective processes

through neurobiological research on animal fear conditioning. In these conditioning

experiments an initially neutral conditioned stimulus (CS) (e.g. a tone of a particular

frequency) is paired with a fear-evoking unconditioned stimulus (US) (e.g. an electric shock).

As a consequence, presentation of the CS without the US also evokes a fear response. The

intensity of the fear response decreases and eventually disappears with repeated presentation

of the CS without the US (i.e. extinction).



LeDoux investigated which pathways and modules were involved in conditioning and

extinction in a series of experiments in which he lesioned a specific brain area of the animal

and tested how this affected conditioning. He found that even when the auditory cortex was

completely ablated, rats could still be conditioned to auditory stimuli (LeDoux, Sakaguchi, &

Reis, 1984). Lesions to the thalamus and the midbrain, however, totally prevented

conditioning. Subsequent tracing techniques revealed a neural pathway leading from the

thalamus to the amygdala. This pathway appeared to be sufficient for conditioning but could

not discriminate between very similar stimuli or between stimuli in different contexts.

Experiments with lesions to this direct pathway but with the cortex left intact, however,

showed that the direct pathway was not necessary for conditioning and that probably also a

parallel, indirect pathway has to be distinguished. The indirect pathway via the cortex is held

responsible for finer discrimination and more extensive processing than the direct pathway.

The dual pathway model of LeDoux (1986, 1996) adheres to the first principle of

Happel and Murre (1994), which was arrived at through evolutionary computational

methods. Because both coarse and fine-grained categories can be distinguished in conditioned

stimuli, fear processing of these stimuli is facilitated by this dual pathway architecture. The

architecture may reflect the biological preparation for activating gross affective categories,

such as fear. The indirect pathway would then provide a finer specification of the affective

processing, at the same time ensuring that the gross affective categorizations are preserved

during further refinement. An example of this specification may be found in context effects on

conditioning. Conditioning, moreover, extinguishes after repeated presentation of the CS

without the US, which seems to be mainly caused by the indirect pathway and not by the

direct pathway. After lesioning the indirect pathway even many presentations of the CS

without the US seem to have no effect on the conditioned response (LeDoux, 1996). Also,

from behavioral experiments it is known that after extinction some trace of the conditioned

stimulus is preserved. This is demonstrated in experiments where the conditioned fear

response returns after extinction. For example, a fear response can be reinstated after

presentation of the US, or it can return spontaneously after a period during which neither CS

nor US is presented (Bouton & Swartzentruber, 1991). Apparently, extinction functions by

the active inhibition via the indirect pathway, of a CS-US association still present in the direct

pathway.

The Armony et al. (1995) connectionist model of the dual pathway architecture had one

input module of 16 nodes, through which the CS input was provided. The model further



consisted of four modules: the amygdala (3 nodes), the cortex (8 nodes) and two sub-

structures within the thalamus: MGm/PIN (8 nodes) and MGv (3 nodes). US input was

provided to the network by directly activating the amygdala and MGm/PIN. The modules in

the network were connected in such a way that CS input could be transmitted along a direct

and an indirect pathway to the amygdala. The CS input module was connected to both MGv

and MGm/PIN. Only MGm/PIN was connected to the amygdala, forming the direct

pathway. The indirect pathway was formed by connections from both MGm/PIN and MGv

to the cortex, and from the cortex to the amygdala. All connections between modules were

unidirectional and all-to-all.
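The connectivity just described can be written down as a small directed graph, which makes the direct and indirect routes to the amygdala explicit. This is a minimal sketch of the wiring only: the module sizes and connections are those given in the text, whereas the dictionary layout and the function names are assumptions made for illustration.

```python
# Module sizes (number of nodes) as stated in the text.
MODULE_SIZES = {"input": 16, "MGv": 3, "MGm/PIN": 8, "cortex": 8, "amygdala": 3}

# Unidirectional, all-to-all connections between modules.
CONNECTIONS = {
    "input": ["MGv", "MGm/PIN"],
    "MGv": ["cortex"],
    "MGm/PIN": ["amygdala", "cortex"],
    "cortex": ["amygdala"],
    "amygdala": [],
}


def pathways(graph, source, target, prefix=()):
    """Enumerate every route from `source` to `target` (the graph is acyclic)."""
    path = prefix + (source,)
    if source == target:
        return [path]
    found = []
    for nxt in graph[source]:
        found.extend(pathways(graph, nxt, target, path))
    return found


def n_between_module_weights(graph, sizes):
    """Number of between-module weights implied by all-to-all connectivity."""
    return sum(sizes[src] * sizes[dst]
               for src, targets in graph.items() for dst in targets)


for route in pathways(CONNECTIONS, "input", "amygdala"):
    print(" -> ".join(route))
```

The enumeration yields one route that bypasses the cortex (the direct pathway) and two routes that pass through it (the indirect pathway, fed by both thalamic modules).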

The learning algorithm of the Armony et al. (1995) model was a modification of the

competitive learning algorithm by Rumelhart and Zipser (1985). The most important changes

concerned the inclusion of continuous instead of discrete values for the activations, the

implementation of competition by actual lateral inhibition between nodes in a module (instead

of the direct selection of a winning node), and the adjustment of the learning rules to the

continuous activation values. In these simulations a series of pure tones of contiguous

frequencies (and equal intensities) in an arbitrary scale served as input patterns to MGv and

MGm/PIN. The US was represented as an external binary input to all nodes of MGm/PIN

and amygdala modules, so that an equal amount of activation would be sent to all nodes in

these modules. After the familiarization phase, the specificity of the cell responses was

established by presenting all input patterns (without US) and recording the resulting

activation in each node for each input pattern. Coupling the US to a single pure tone caused

changes in receptive fields in the corresponding modules of the model, which were similar to

those observed experimentally in animals. The total activation of all nodes in the amygdala,

moreover, showed a clear increase for the selected tone, indicating successful conditioning.

5.2. Conditioning simulations

The CALM Map seems suited for rebuilding the Armony et al. model (see Figure 7)

and for replicating their simulations (see also Den Dulk, Rokers, & Phaf, 2000), because many

features that had to be added to the Rumelhart and Zipser (1985) algorithm are already

present in CALM Maps. In the Armony et al. (1995) simulation, however, neighboring

frequencies were not represented systematically on neighboring nodes in the module, because

their competitive learning scheme did not allow for such ordering. A tonotopical ordering (i.e.



a topological ordering of tones), which can be found in the auditory pathways of many

animals, emerges automatically from the learning procedure in CALM Maps. The CALM

Map also provides a more suitable way of administering the unconditioned stimulus than the

Rumelhart and Zipser (1985) type of competitive learning, because of the novelty detection

mechanism in CALM. Novelty is often associated with fear responses, so the novelty

detection in CALM can be seen as one way of evoking fear responses. Direct activation of the

Arousal-node can, therefore, be considered a fear response. We capitalized on this by feeding

the US-input (with activation 1.0) to the A-nodes of the two modules (MGm/PIN and

Amygdala), which also received the US in the Armony et al. (1995) model.

Figure 7: The architecture of the dual pathway model with CALM Maps. Input is given through the bottom input module. All other rectangles represent CALM Maps, with Arousal-nodes and External-nodes indicated externally of the module. Activity bubbles around a winning node are depicted.

Parameters in the model were kept equal to the previous simulation with a few

exceptions (see Table 1). After a set of parameters was found which produces tonotopical

organisation, no effort was spent to adjust them to obtain optimal model behaviour. The

Gaussian σ of the ring CALM Maps, which was established by applying Equation 5, was 3.0

for MGv, 3.0 for Cortex, 2.0 for MGm/PIN, 2.0 for Amygdala. In the first phase of the

simple conditioning experiment, all patterns were familiarised. The input patterns (potential

CSs) represented 15 contiguous frequencies. Each frequency-pattern consisted of two

neighbouring nodes, and had an overlap of one active node with one pattern to either side. The

right- and left-most frequencies only had overlap to one side.
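The frequency patterns can be sketched as follows. Fifteen patterns of two neighbouring active nodes, each sharing one node with the next, require 16 input nodes, which matches the size of the input module. The activation value of 1.0 is an assumption (the text does not state it for the CS patterns), and the function names are hypothetical.

```python
def frequency_patterns(n_frequencies=15, activation=1.0):
    """One pattern per frequency: two neighbouring active input nodes.

    Assumption: the active nodes carry `activation` (1.0 here); the input
    layer has n_frequencies + 1 nodes so that adjacent patterns share one node.
    """
    n_inputs = n_frequencies + 1
    patterns = []
    for f in range(n_frequencies):
        pattern = [0.0] * n_inputs
        pattern[f] = activation
        pattern[f + 1] = activation
        patterns.append(pattern)
    return patterns


def shared_active(a, b):
    """Number of input nodes that are active in both patterns."""
    return sum(1 for x, y in zip(a, b) if x > 0 and y > 0)


patterns = frequency_patterns()
print(len(patterns), len(patterns[0]))
print(shared_active(patterns[3], patterns[4]), shared_active(patterns[3], patterns[5]))
```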



Figure 8. Tonotopical organization in MGv. Because the CALM Map has a ‘ring’ structure the starting node in the map is arbitrary. [Plot: x-axis, frequency (1 to 15); y-axis, node number in MGv (0 to 8).]

In the familiarisation phase all 15 patterns were presented 150 times for 20 iterations (a

cycle of calculating all activations and weights) each. In the conditioning phase Frequency 5

was coupled to the US, thus making this pattern the CS. The US-CS pair was fed to the

network for 10 presentations. The network (after conditioning) was tested five times, and

average values were used as a measure of performance. To examine the effects of conditioning

we compared the receptive fields of individual nodes before and after conditioning. Because

amygdala activity is assumed to result (via the hypothalamus, e.g. LeDoux, 1996) in various

autonomic and endocrine reactions, total amygdala activity can be seen as a measure of

autonomic activity. The summed activation in the amygdala was, therefore, also registered as

a function of the frequencies presented both before and after conditioning.



Figure 9. Pre- and post-conditioning receptive fields in MGm/PIN of the node most responsive to the CS before conditioning. [Plot: x-axis, frequency (1 to 15); y-axis, activation (0 to 0.6); series, pre-conditioning and post-conditioning.]

The CALM Maps produced a good tonotopical ordering in all modules (e.g. see Figure

8). Because all modules had fewer nodes than there were input patterns, a receptive field

generally contained more than one pattern. As a consequence of conditioning the receptive

field of the relevant MGm/PIN node (Figure 9) sharpened and shifted towards the frequency

of the conditioned stimulus (Frequency 5). The receptive fields of the cortex and the

amygdala modules showed similar shifts. The frequency-specific changes occurred only for

nodes in which the CS evoked a non-zero response before conditioning. The three modules

also showed a substantial increase in their response to the CS. There was no observable effect

of conditioning in MGv, because the US does not activate it, either directly or indirectly. The

converging activations of US and CS could only arrive at the cortex through MGm/PIN. The

change in the receptive field of MGm/PIN, therefore, led to a change in the receptive field of

the cortex. The amygdala received US activation in three ways: direct activation from the US

to its Arousal-node, indirect activation from MGm/PIN to the amygdala, and indirect

activation via the cortex to the amygdala. The architecture of this model is such that the effect

of conditioning converges on the amygdala. After conditioning, the CS produced a higher total

amygdala activation than the other input frequencies. In addition, as the distance of the

frequency from the CS increased, the quasi-autonomic response decreased and was smaller

than before conditioning (Figure 10). These results were similar to the simulation results

found by Armony et al. (1995) and to experimental results obtained with fear conditioning of

animals.



Figure 10. Summed activation of all three nodes in the Amygdala as a function of input frequency. [Plot: x-axis, frequency (1 to 15); y-axis, total amygdala activation (0 to 0.2); series, pre-conditioning and post-conditioning.]

The simulations of Armony et al. (1995) were also extended to simulations of latent

inhibition and of extinction. Latent inhibition can be described as the lower susceptibility to

conditioning of a familiar stimulus than of an unfamiliar (i.e. novel) one. The model with

CALM Maps may provide an excellent opportunity for simulating this, because both novelty

and the US lead to fear responses. With relatively novel stimuli, the fear response during

conditioning would be larger than with familiar stimuli. To reflect this difference in experience,

a new familiarisation phase was run in which one half of the frequencies were presented 100

times and the other half 200 times. Stimuli from both groups were alternated. In one condition

we chose a CS which had previously been presented 100 times (Frequency 4) and in the other

condition a CS which had been presented 200 times (Frequency 5).

The total amygdala activation brought about by presenting the conditioned low-familiar

(LF) stimulus was larger (Figure 11a) than when presenting the conditioned high-familiar

(HF) stimulus (Figure 11b). It should be noted that latent inhibition was found despite the

fact that LF stimuli were represented less strongly in the network than HF stimuli. Without

such novelty-dependent elaboration learning, conditioning would probably be smaller for the

LF than for the HF stimulus. A similar novelty detection mechanism is not present in the

Armony et al. (1995) model, which does not provide the opportunity to simulate latent

inhibition of conditioning.



Figure 11a. Total amygdala activation as a function of frequency in the latent inhibition simulation. Frequency 4 was conditioned, after it was familiarized in 100 presentations. [Plot: x-axis, frequency (1 to 15); y-axis, total amygdala activation (0.00 to 0.20); series, pre-conditioning and post-conditioning.]

Figure 11b. Total amygdala activation as a function of frequency in the latent inhibition simulation. Frequency 5 was conditioned, after it was familiarized in 200 presentations. [Plot: x-axis, frequency (1 to 15); y-axis, total amygdala activation (0.00 to 0.20); series, pre-conditioning and post-conditioning.]

A further extension to the Armony et al. (1995) model consisted of an attempt to

simulate extinction of the conditioned response by assuming that it was caused by

interference due to learning of intervening material (e.g. noises in the environment). For this

purpose we presented all other frequencies together with the CS, but without a US, during

extinction. The other frequencies were presented twice as often as the CS to ensure sufficient

interference. The network state that resulted from conditioning was exposed 15 times (for 20

iterations) to a randomised pattern batch consisting of the CS and two instances of all other



frequencies. For comparison, we performed the same extinction procedure on the network

that resulted only from the familiarisation phase (i.e. before conditioning).
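The composition of one extinction batch can be sketched as follows: the CS occurs once and each of the other 14 frequencies occurs twice, giving 29 patterns in random order. The function name and the use of frequency numbers as stand-ins for the input patterns are assumptions made for illustration; how randomisation was implemented in the original simulations is not stated.

```python
import random


def extinction_batch(cs, n_frequencies=15, rng=None):
    """One randomised batch: the CS once, every other frequency twice."""
    rng = rng or random.Random()
    batch = [cs] + [f for f in range(1, n_frequencies + 1) if f != cs
                    for _ in range(2)]
    rng.shuffle(batch)
    return batch


# Frequency 5 served as the CS; a seeded generator keeps the sketch repeatable.
batch = extinction_batch(cs=5, rng=random.Random(0))
print(len(batch), batch.count(5), batch.count(4))
```

In the simulation such a batch was presented 15 times, each pattern for 20 iterations and without the US.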

Figure 12: Total Amygdala activation by the CS, both when it had and had not been conditioned. [Plot: x-axis, presentation (1 to 15); y-axis, total amygdala activation (0.00 to 0.20); series, conditioned stimulus and non-conditioned stimulus.]

The conditioned frequency showed a decrease in fear response over repeated

presentations (Figure 12), whereas the control stimulus lacked such a decrease. In fact, it

increased slightly. After 15 presentations the activation levels were about equal. Though

extinction can be interpreted as caused by interference, this is not suggested by animal

research. In behavioural experiments on rats a relapse of the fear response can occur in a

number of situations (Bouton & Swartzentruber, 1991). In our model such a relapse is not

possible because the conditioning information was lost permanently by interference. An

adequate simulation of extinction in a network model probably also requires an active top-

down control process, which can be eliminated by lesioning the indirect pathway.

5.3. Lesioning the pathways

To investigate the contributions of the individual pathways to conditioning, we lesioned

the direct and indirect pathways of the network both before and after conditioning. The

lesions to the indirect path were applied by disabling the connections from the cortex module

to the amygdala module. In the direct path the connections between MGm/PIN and amygdala

were disabled. Apart from the lesions, all features of this simulation were identical to the first

simulation. Our model could be conditioned without the cortical pathway (see Figure 13a).



The conditioning effect had about the same size as the effect found with both pathways

intact.
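The two lesions amount to removing a single set of between-module connections. The sketch below represents this on a graph of the model's wiring; the connections are those given in the text, but the data structure and function names are assumptions made for illustration and say nothing about how the weights were disabled in the actual simulations.

```python
import copy

# Between-module connections of the dual pathway model, as described in the text.
CONNECTIONS = {
    "input": ["MGv", "MGm/PIN"],
    "MGv": ["cortex"],
    "MGm/PIN": ["amygdala", "cortex"],
    "cortex": ["amygdala"],
    "amygdala": [],
}


def lesion(graph, source, target):
    """Return a copy of the graph with the source-to-target connections disabled."""
    lesioned = copy.deepcopy(graph)
    lesioned[source] = [t for t in lesioned[source] if t != target]
    return lesioned


def reaches(graph, source, target):
    """True if activation from `source` can still arrive at `target`."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


indirect_lesion = lesion(CONNECTIONS, "cortex", "amygdala")
direct_lesion = lesion(CONNECTIONS, "MGm/PIN", "amygdala")
print(reaches(indirect_lesion, "input", "amygdala"),
      reaches(direct_lesion, "input", "amygdala"))
```

With either lesion alone, CS input can still arrive at the amygdala through the remaining pathway; only the size of the conditioning effect along that pathway is at issue in the simulations.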

Figure 13a: Total pre- and post-conditioning amygdala activation as a function of frequency, after lesioning the indirect pathway before conditioning. [Plot: x-axis, frequency (1 to 15); y-axis, total amygdala activation (0.00 to 0.20); series, pre-conditioning and post-conditioning.]

Figure 13b: Total pre- and post-conditioning activation as a function of frequency, after lesioning the direct pathway before conditioning. [Plot: x-axis, frequency (1 to 15); y-axis, total amygdala activation (0.00 to 0.20); series, pre-conditioning and post-conditioning.]

Lesioning the direct pathway before conditioning resulted in a large decrease in

autonomic response, though a slight conditioning effect remained (see Figure 13b). This seems

at odds with the experimental finding that, without the direct path, conditioning is still

possible. Because there is an additional layer in the indirect pathway, activations transported

to the amygdala through this pathway are attenuated. Conditioning effects may have been


smaller as a result of this attenuation. Contrary to experimental findings, the contribution of

the indirect pathway to conditioning in the network model seems much smaller than the

contribution of the direct pathway.

Figure 14: Total pre- and post-conditioning amygdala activation as a function of frequency, after lesioning the indirect pathway or the direct pathway after conditioning. (Plot omitted; x-axis: Frequency, 1 to 15; y-axis: Total amygdala activation, 0.00 to 0.20; series: pre-conditioning; post, direct lesion; post, indirect lesion.)

Lesioning the indirect path after conditioning revealed that almost the entire

conditioning effect remained, whereas it almost completely disappeared after lesioning the

direct path (see Figure 14). This again strengthens the idea that the direct pathway in the

model is more important for conditioning than the indirect pathway. The Armony et al.

(1995) model seems to show similar, unevenly balanced, conditioning effects along the two

pathways (see also Armony, Servan-Schreiber, Cohen, and LeDoux, 1997), but this does not

seem to conform to the underlying biological model.

5.4. Discussion

In sum, the actual competitive learning procedure used in the model does not seem

critical for obtaining these conditioning results. They seem to arise primarily from the

neurobiologically inspired network architecture. An advantage of CALM Maps may be that they

are equipped with a novelty detection mechanism, of which the Arousal node may be a suitable

site for applying the US. In the standard CALM Maps, novelty automatically leads to

activation of the Arousal node, which can be seen as a fear response. The assumption by

Armony et al. (1995) that the US directly activates all nodes in the amygdala and the


MGm/PIN can be avoided in this way. The combined operation of novelty and US, moreover,

leads to a latent inhibition effect, which probably does not occur in the Armony et al. (1995)

model. The replacement of competitive learning in the Armony et al. (1995) model by CALM

Maps showed a number of other advantages. Due to the gradient inhibition, a tonotopic

ordering of input frequencies arose automatically in all modules. Another advantage may be

that the network can easily be extended to include recurrent connections between modules.

From these simulations some suggestions for improvements can be made, which may

lead to a better correspondence of the computational model to the conceptual model. A clear

problem is that the contribution to conditioning of the indirect pathway seems much smaller

than that of the direct pathway. The additional layer between input and amygdala in the indirect

pathway attenuates its activation. This renders it difficult to judge the conditioning effects

along the indirect pathway. If the cortex activation were larger, the indirect pathway

might well show a finer discrimination of stimuli than the direct pathway. In subsequent

simulations (Pallamin, Den Dulk, & Phaf, unpublished results), we multiplied the weights

from MGv to the cortex by two, resulting in equally strong conditioning effects in both

pathways. Another way to obtain larger activations in the cortex may be to incorporate

bidirectional connections between thalamus and cortex, so that positive feedback enhances the

representations in the cortex. The theoretical model (e.g. LeDoux, 1996), at least, assumes

bidirectional (but not symmetric) connections between amygdala and cortex.
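The attenuation argument can be illustrated with a toy calculation. The sketch below is purely linear arithmetic under stated assumptions and does not reproduce the CALM Map dynamics: the per-stage gain of 0.5, the name `path_output`, and the reading of the direct path as two connection stages (input to MGm/PIN to amygdala) and the indirect path as three (input to MGv to cortex to amygdala) are all hypothetical.

```python
# Hypothetical gain: each connection stage passes on this fraction of the activation.
STAGE_GAIN = 0.5


def path_output(input_activation, stage_gains):
    """Multiply an input activation by the gain of each successive connection stage."""
    activation = input_activation
    for gain in stage_gains:
        activation *= gain
    return activation


# Direct path: input -> MGm/PIN -> amygdala (two connection stages).
direct = path_output(1.0, [STAGE_GAIN, STAGE_GAIN])

# Indirect path: input -> MGv -> cortex -> amygdala (three connection stages).
indirect = path_output(1.0, [STAGE_GAIN, STAGE_GAIN, STAGE_GAIN])

# Same indirect path, with the MGv -> cortex weights multiplied by two.
indirect_doubled = path_output(1.0, [STAGE_GAIN, 2 * STAGE_GAIN, STAGE_GAIN])

print(direct, indirect, indirect_doubled)
```

In this toy setting the extra stage halves the activation that reaches the amygdala, and doubling one set of weights compensates for it exactly. In the actual network the balance would depend on the nonlinear activation rule and on learning, so the equality here holds by construction only.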

Although the processes in the two pathways of our model differ in speed (the indirect

pathway is slower and weaker), and capacity (the indirect pathway has more nodes), there is

no further distinction between the explicit functions of either pathway. LeDoux (1986, 1996),

however, has suggested that the direct pathway may be more important for fast reactions in

unexpected dangerous situations, whereas the indirect path may be involved in higher order

behaviour, such as control processes. A better way to model extinction of conditioning would

probably be to incorporate such control processes. Extinction appears to arise from the

learning of regulatory control in the indirect pathway, but not from interference per se.

Appropriate modeling of the function of the indirect pathway would require implementation

of regulatory control. Though there are few ideas on how to implement (sequential) control

processes in a connectionist framework, Sequentially Recurrent Networks (SRNs), which also

show a working memory function (see Phaf, Mul, & Wolters, 1994; Phaf & Wolters, 1997),

may provide an opportunity for learning and executing sequential operations on items

activated in working memory. The disruption of (the influence of) these control processes


after extinction, for instance by lesioning the indirect pathway, could then restore

conditioning.

In a fuller implementation of the dual pathway model it may also be possible to

simulate experimental results with human subjects in the field of emotion (e.g. LeDoux, 1996).

Affective priming, for instance, is the remarkable phenomenon that affective stimuli, such as

faces with emotional expressions (both positive and negative) may have a larger influence on

the evaluation of neutral stimuli (e.g. Chinese ideographs) when the affective prime is not

consciously perceived than when it is (Murphy & Zajonc, 1993). There are even indications

that the priming effect reverses in conscious conditions. This is at odds with the more often

found pattern of non-affective priming where conscious influences are, generally, larger than

non-conscious ones. Such results may emphasise the importance of emotions (and emotion

research) for human behaviour, because they indicate that the human organism has been

evolutionarily prepared to perform direct emotional reactions, on top of which a good deal of

regulatory control has developed.

6. Conclusion

CALM Maps show some practical improvements over both the classic version of

Kohonen's self-organizing feature map and the standard CALM module. In contrast to

Kohonen's map (Kohonen, 1982, 1988), a CALM Map needs no external regulating mechanisms, but is

capable of automatically adjusting its learning rate on the basis of the degree of novelty

detected, and shrinks its 'activity bubble' without adjusting the inhibitory weights.

Activations in a CALM Map are determined by a weighted sum rule instead of by a Euclidean

distance measure. Kohonen (1993, 1995) has, however, also provided alternative

implementations and neurobiological justifications of his self-organizing map. These

adjustments are probably better justified neurobiologically than the practical 'shortcuts'

Kohonen took in his early version of the map.
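The difference between a weighted sum rule and a Euclidean distance measure can be made concrete with a small sketch. It is an illustration under assumptions, not the CALM Map algorithm: the weight values are invented, and picking the maximum (or minimum) score stands in for the outcome of the competition, which in a CALM Map is resolved by lateral inhibition rather than by an explicit selection step.

```python
import math


def weighted_sum(weights, pattern):
    """Net input to a node: sum of weight times input activation."""
    return sum(w * x for w, x in zip(weights, pattern))


def euclidean_distance(weights, pattern):
    """Distance between a node's weight vector and the input pattern."""
    return math.sqrt(sum((w - x) ** 2 for w, x in zip(weights, pattern)))


def winner_by_weighted_sum(weight_rows, pattern):
    """Index of the node with the largest weighted sum."""
    scores = [weighted_sum(row, pattern) for row in weight_rows]
    return max(range(len(scores)), key=scores.__getitem__)


def winner_by_distance(weight_rows, pattern):
    """Index of the node whose weight vector lies closest to the pattern."""
    distances = [euclidean_distance(row, pattern) for row in weight_rows]
    return min(range(len(distances)), key=distances.__getitem__)


# Hypothetical weight vectors for three nodes and one input pattern.
weight_rows = [[0.9, 0.1], [0.2, 0.2], [2.0, 2.0]]
pattern = [0.2, 0.2]

sum_winner = winner_by_weighted_sum(weight_rows, pattern)
distance_winner = winner_by_distance(weight_rows, pattern)
print(sum_winner, distance_winner)
```

With these invented values the two rules pick different nodes: the distance measure favours the node whose weights match the pattern, whereas the weighted sum favours the node with the largest weights. The two rules therefore coincide only when weight vectors have comparable lengths, which is one reason a weighted sum rule is usually combined with some bound on weight growth.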

Presumably due to the difficulty of regulating the different maps and of handling

bidirectional connections, Kohonen maps have not been applied to the kind of interactive,

multimodular, networks (McClelland, 1993) that CALM has been designed for. It is difficult

to see how the Kohonen map could be used in the multimodular network applications

discussed earlier. To obtain a shift in receptive fields due to conditioning with Kohonen maps,


for instance, would require changes in the winner-take-all function and the learning rule, which

would make the Kohonen map more similar to the continuous form of competitive learning of

Armony et al. (1995) and to the present Maps. CALM Map can, therefore, also be seen as a

means of making a Kohonen type map fit for incorporation in multimodular networks. A clear

limitation of CALM Map is that it can only deal with unidimensional topologies, whereas the

Kohonen map can make multidimensional orderings. There are, however, indications that

multimodular architectures of unidimensional maps may be used to replace single

multidimensional Kohonen maps. A parallel arrangement of two CALM Maps could, in

principle, cover a two-dimensional input space. It should of course be acknowledged that the

Kohonen Map has proved itself in many useful applications (e.g. Kohonen, 1995) and that

there is much more insight into the computational abilities of the Kohonen Map than of the

CALM Map. Though Kohonen maps and CALM Maps aim at slightly different

applications, a fair comparison of both types of maps can only be made after much additional

research on CALM Map.

In comparison to a standard CALM module, categorisation is improved in CALM

Maps for correlated patterns due to the stretching process which continuously tries to

separate representations as far apart as possible. A nice feature of this process is the

interpolation of intermediary representations on uncommitted nodes. The stretching and

interpolation guarantee that every node will eventually come to represent some pattern even

when this pattern has not actually been presented. It thus solves the underutilisation problem

that troubles many competitive learning procedures (Ahalt et al., 1990). The introduction of

topological ordering in a competitive learning procedure seems to enhance the ability to

organise and distinguish correlated patterns without supervision, particularly with highly

correlated patterns or large modules. With overlapping pattern sets no catastrophic

interference is found in CALM Maps. With distinct pattern sets, however, proactive

interference is found.

These simulations provide a number of functional arguments for CALM Maps, but

there are also good psychological (e.g. McClelland, 1993) and neurobiological (e.g.

Szentágothai, 1975) considerations for assuming some elementary canonical neuronal circuit

with self-organising abilities. Though the CALM Map is probably also too simple a model

for the canonical circuit, there appear to be similarities to the properties of neural hardware.

In contrast to approaches applying global mathematical procedures, which, for instance, use

information that is not available locally (or even at that particular time), the interactions are


local here and can in principle take place in the neural system. It should be noted that only the

(state-dependent) variable learning rate violates the locality of interactions. Such a mechanism

could be implemented, however, in the neural system by a neuromodulator which is produced

locally and only affects a limited network region (see also Kohonen, 1993). The replacement

of a winner-take-all rule by actual inhibitory connections (requiring also continuous

activations) forms part of the objective to achieve more neurobiological plausibility. A nice

aspect of the graded inhibition is that it takes care of both the winner-take-all function and the

'activity bubble'. Boundary problems have been solved by assuming a ring topology. Such

topologies may be neurally plausible if it is assumed that the elementary modules (possibly

the columns, e.g. Szentágothai, 1975) also have a circular shape (e.g. Bonhoeffer and Grinvald,

1991) and that the growth of connections is such that inhibition to nearby cells is weaker than

to more distant cells within the module.
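The ring topology with graded inhibition can be sketched as a matrix of lateral weights. This is a minimal illustration under assumptions: the linear increase of inhibition with ring distance and the parameters `base` and `slope` are hypothetical choices, and the actual inhibition function of the CALM Map is not reproduced here.

```python
def ring_distance(i, j, n):
    """Shortest distance between nodes i and j on a ring of n nodes."""
    d = abs(i - j)
    return min(d, n - d)


def graded_inhibition(n, base=0.1, slope=0.2):
    """Lateral weights on a ring; inhibition grows linearly with ring distance.

    `base` and `slope` are hypothetical parameters. Self-connections are zero.
    """
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(0.0)
            else:
                row.append(-(base + slope * ring_distance(i, j, n)))
        weights.append(row)
    return weights


weights = graded_inhibition(8)

# Node 0 inhibits its neighbours (nodes 1 and 7) less than the opposite node 4.
print(weights[0][1], weights[0][7], weights[0][4])
```

Because distance is measured around the ring, every node has the same inhibition profile and no node sits at an edge, which is how the ring avoids boundary problems. Weak inhibition between neighbours lets a group of adjacent nodes stay active together (the 'activity bubble'), while strong inhibition of distant nodes drives the winner-take-all outcome.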

If novelty detection forms a central feature of a canonical learning module, large

distributed brain areas should be involved in it. Scalp and intracranial event related brain

potential recordings, lesion data, and neuroimaging results have indeed implicated prefrontal,

temporo-parietal, and limbic regions in novelty detection (e.g. Knight, 1996, 1997; Knight &

Scabini, 1998). The fear conditioning model of Section 5, in fact, incorporated novelty

detection in all, supposedly cortical and subcortical, modules, except in the input module. It

would almost appear that the present modelling approach postulates novelty detection in more brain areas

than those in which neural research has actually found it. The paradox can be resolved by assuming

that these neurophysiological studies do not record the highly distributed novelty detection

process itself, but only the summed activity in a number of brain areas specialised in the

functional consequences of novelty detection. There are indications that the prefrontal cortex

is the first large region to be activated by novelty and is primarily responsible for the

direction of attention to the novel stimuli (Daffner et al., 2000). Subsequently, amygdala,

hippocampal, and parahippocampal regions are also activated, which seem to have a modulatory

role in memory encoding (Halgren & Smith, 1987; Wilson & Rolls, 1993; Tulving & Kroll,

1995; Tulving et al., 1996; Eichenbaum, 1999). The hypothesis of the distributed nature of

novelty detection on the one hand, and the more localised functional consequences of detected

novelty on the other hand, may be an example of how connectionist modelling may help to

impart meaning to neural mechanisms that are not yet fully understood.


Author notes:

Correspondence should be addressed to Dr R. Hans Phaf, Psychonomics Department,

University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands

([email protected]). We thank Cees Van Leeuwen, Jaap Murre, Henk Tijssen,

and Bas Rokers for their help during various stages of this work. We are also grateful to two

anonymous referees for helpful comments.

References

Armony, J.L., Servan-Schreiber, D., Cohen, J.D., & LeDoux, J.E. (1995). An anatomically constrained neural network model of fear conditioning. Behavioral Neuroscience, 109, 1-12.

Armony, J.L., Servan-Schreiber, D., Cohen, J.D., & LeDoux, J.E. (1997). Computational modeling of emotion: explorations through the anatomy and physiology of fear conditioning. Trends in Cognitive Sciences, 1, 28-34.

Ahalt, S.C., Krishnamurthy, A.K., Chen, P., & Melton, D.E. (1990). Competitive learning algorithms for vector quantization. Neural Networks, 3, 277-290.

Bonhoeffer, T. & Grinvald, A. (1991). Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns. Nature, 353, 429-431.

Bouton, M.E., & Swartzentruber, D. (1991). Sources of relapse after extinction in Pavlovian and instrumental learning. Clinical Psychology Review, 11, 123-140.

Bower, G.H. (1996). Reactivating a reactivation theory of implicit memory. Consciousness and Cognition, 5, 27-72.

Brückner, J.R. & Gough, M.P. (Submitted). Comparison of CALSOM and SOM neural networks for space data pattern recognition.

Daffner, K.R., Mesulam, M.M., Scinto, L.F.M., Acar, D., Calvo, V., Faust, R., Chabrerie, A., Kennedy, B., & Holcomb, P. (2000). The central role of the prefrontal cortex in directing attention to novel events. Brain, 123, 927-939.

Den Dulk, P., Rokers, B., & Phaf, R.H. (1998). Connectionist simulations with a dual route model of fear conditioning. In B. Kokinov (Ed.), Perspectives on Cognitive Science, Vol. 4 (pp. 102-112). Sofia: New Bulgarian University Press.

Eichenbaum, H. (1999). The hippocampus: The shock of the new. Current Biology, 9, R482-R484.

Erwin, E., Obermayer, K., & Schulten, K. (1992). Self-organizing maps: Stationary states, metastability and convergence rate. Biological Cybernetics, 67, 47-55.

Gabor, D. (1946). Theory of communications. Journal of the Institute of Electrical Engineering, 93, 429-457.


Graf, P., & Mandler, G. (1984). Activation makes words more accessible, but not necessarily more retrievable. Journal of Verbal Learning and Verbal Behavior, 23, 553-568.

Grob, J. (2001). A neural network model for evaluative conditioning and mere exposure. Unpublished Master’s thesis. Maastricht University, the Netherlands.

Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 217-257.

Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston, MA: Reidel Press.

Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.

Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks, 1, 17-61.

Halgren, E. & Smith, M.E. (1987). Cognitive evoked potentials as modulatory processes in human memory formation and retrieval. Human Neurobiology, 6, 129-139.

Happel, B.L.M. & Murre, J.M.J. (1994). Design and evolution of modular neural network architectures. Neural Networks, 7, 985-1004.

Harnad, S. (Ed.) (1987). Categorical perception: The groundwork of cognition. Cambridge, UK: Cambridge University Press.

Knight, R.T. (1996). Contribution of human hippocampal region to novelty detection. Nature, 383, 256-259.

Knight, R.T. (1997). Distributed cortical network for visual stimulus detection. Journal of Cognitive Neuroscience, 9, 75-91.

Knight, R.T. & Scabini, D.L. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of Clinical Neurophysiology, 15, 3-13.

Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.

Kohonen, T. (1988). Self-organization and associative memory (2nd ed.). Berlin: Springer Verlag.

Kohonen, T. (1993). Physiological interpretation of the self-organizing map algorithm. Neural Networks, 6, 895-905.

Kohonen, T. (1995). Self-organizing maps. Berlin: Springer Verlag.

LeDoux, J.E. (1986). Sensory systems and emotion: A model of affective processing. Integrative Psychiatry, 4, 237-248.

LeDoux, J.E. (1996). The Emotional Brain. New York: Simon & Schuster.

LeDoux, J.E., Sakaguchi, A., & Reis, D.J. (1984). Subcortical efferent projections of the medial geniculate nucleus mediate emotional responses conditioned by acoustic stimuli. Journal of Neuroscience, 4, 683-698.

Mandler, G. (1980). Recognizing the judgment of previous occurrence. Psychological Review, 87, 252-271.

McClelland, J.L. (1993). Toward a theory of information processing in graded, random, and interactive networks. In D.E. Meyer and S. Kornblum (Eds.), Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience (pp. 655-688). Cambridge, MA: MIT Press.

Miikkulainen, R. (1991). Self-organizing process based on lateral inhibition and synaptic resource distribution. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas (Eds.), Artificial Neural Networks (pp. 415-420). Amsterdam: Elsevier Science Publishers.

Murphy, S.T., & Zajonc, R.B. (1993). Affect, cognition, and awareness: Affective priming with optimal and suboptimal stimulus exposures. Journal of Personality and Social Psychology, 64, 723-739.

Murre, J.M.J. (1992). Learning and categorization in modular neural networks. Hemel Hempstead, U.K.: Harvester Wheatsheaf.

Murre, J.M.J., Phaf, R.H., & Wolters, G. (1992). CALM: Categorizing and learning module. Neural Networks, 5, 55-82.

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443-512.

Pevtzow, R., Tijsseling, A., & Harnad, H. (Submitted). Dimensional attention effects in humans and neural nets.

Phaf, R.H. (1994). Learning in natural and connectionist systems: Experiments and a model. Dordrecht: Kluwer Academic Publishers.

Phaf, R.H., Mul, N.H., & Wolters, G. (1994). A connectionist view on dissociations. In C. Umiltà & M. Moscovitch (Eds.), Attention and performance XV (pp. 725-751). Cambridge, MA: MIT Press.

Phaf, R.H., Van Der Heijden, A.H.C., & Hudson, P.T.W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273-341.

Phaf, R.H., & Van Immerzeel, M.S.A. (1997). Simulations with a connectionist model for implicit and explicit memory tasks. Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (pp. 608-613). Mahwah, NJ: Lawrence Erlbaum.

Phaf, R.H., & Wolters, G. (1997). A constructivist and connectionist view on conscious and nonconscious processes. Philosophical Psychology, 10, 287-307.

Phillips, W.A. (1997). Theories of cortical computation. In M.D. Rugg (Ed.), Cognitive Neuroscience (pp. 11-46). Hove, East Sussex: Psychology Press.

Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.

Ritter, H. (1993). Parameterized self-organizing maps. In S. Gielen and B. Kappen (Eds.), ICANN'93: Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, The Netherlands (pp. 568-575). Berlin: Springer Verlag.

Rumelhart, D.E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.

Schacter, D.L. (1987). Implicit memory: history and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501-518.

Szentágothai, J. (1975). The 'module concept' in cerebral cortex architecture. Brain Research, 95, 475-496.


Tijsseling, A.G. (1998). Connectionist models of categorization: a dynamical approach to cognition. Unpublished PhD thesis, University of Southampton, United Kingdom.

Tulving, E. & Kroll, N. (1995). Novelty assessment in the brain and long-term memory encoding. Psychonomic Bulletin and Review, 2, 387-390.

Tulving, E., Markowitsch, H.J., Craik, F.I.M., Habib, R., & Houle, S. (1996). Novelty and familiarity activations in PET studies of memory encoding and retrieval. Cerebral Cortex, 6, 71-79.

Wilson, F.A.W. & Rolls, E.T. (1993). The effects of stimulus novelty and familiarity on neuronal activity in the amygdala of monkeys performing recognition memory tasks. Experimental Brain Research, 93, 367-382.
