Novelty-dependent learning and topological mapping
R. Hans Phaf, Paul Den Dulk, Adriaan Tijsseling, Ed Lebert
Abstract
Unsupervised topological ordering, similar to Kohonen’s (1982) Self-organizing feature
map, was achieved in a connectionist module for competitive learning (a CALM Map) by
internally regulating the learning rate and the size of the active neighborhood on the basis
of input novelty. In this module winner-take-all competition and the 'activity bubble' are
due to graded lateral inhibition between units. It tends to separate representations as far
apart as possible, which leads to interpolation abilities and an absence of catastrophic
interference when the interfering set of patterns forms an interpolated set of the initial
data set. More than the Kohonen maps, these maps provide an opportunity for building
psychologically and neurophysiologically motivated multimodular connectionist models.
As an example, the dual pathway connectionist model for fear conditioning by Armony,
Servan-Schreiber, Cohen, and LeDoux (1997) was rebuilt and extended with CALM
maps. If the detection of novelty enhances memory encoding in a canonical circuit, such
as the CALM map, this could explain the finding of large distributed networks for
novelty detection (e.g. Knight & Scabini, 1998) in the brain.
A self-organising connectionist map 2
1. Introduction
Novelty detection is increasingly recognized as a stimulating factor for memory
encoding, both in neurobiological research (e.g. Halgren & Smith, 1987; Knight, 1996;
Eichenbaum, 1999), and in psychological research (e.g. Phaf, 1994; Tulving & Kroll, 1995).
Implementations of this novelty-encoding process in connectionist models have, however,
been rare. Many competitive learning procedures implicitly distinguish novel from familiar
stimuli by the amount of competition they evoke. Familiar stimuli, generally, suffer from less
competition and have more localized representations (Page, 2000) than novel stimuli. The
classical self-organizing map by Kohonen (1982, 1988, 1995) even incorporates a steady
decline of learning rate with repeated presentation of the input patterns (i.e. when they
become more familiar). Familiarity is reflected in this map by an increased ordering of
representations, which should not be disrupted by recoding due to strong weight changes. A
competitive learning procedure which explicitly distinguishes novel from familiar stimuli, but
does not order representations in the same manner as Kohonen’s map, is the CALM module
(Murre, Phaf, & Wolters, 1992). The two approaches were combined in the CALM Map
which uses the novelty detection mechanism and the explicit lateral inhibition of CALM
modules to achieve topological mapping as in Kohonen’s map.
The CALM Map is introduced here, its behaviour is compared to that of CALM
modules, and an illustrative example of a network incorporating a number of CALM maps is
presented. An existing multimodular competitive network model is rebuilt and applied to the
simulation of experimental data. The assumptions underlying CALM Map, particularly
concerning novelty-dependent learning, may not only be justified from a practical point of
view, but may also be used to simulate and explain experimental results in the field of
memory, attention, perception, and even affective processes. Although we have found
CALM Maps quite useful in simulations of widely different behavioural data, their full
applicability will have to be shown through prolonged use of the procedure, perhaps much in
the same manner as Kohonen’s map has been developed and has proved its value over twenty years.
Instead of involving novelty only implicitly in a model, as in the Kohonen map, it
seems better to follow the developments in psychological theorising (Phaf, 1994; Tulving &
Kroll, 1995) and to incorporate novelty detection explicitly. In addition, these assumptions may
provide some opportunity to understand neural properties and may help to bridge the gap
between psychology and neurobiology. If, for instance, novelty detection plays a central role
in learning, it would be expected that large distributed networks in the brain would be
involved (e.g. Tulving, Markowitsch, Craik, Habib, & Houle, 1996; Knight, 1996, 1997;
Knight & Scabini, 1998). In this respect there seems to be a two-way interaction between
simulations of artificial neural networks and the study of real neural systems. On the one
hand, microscopic neural functions may serve as inspiration for connectionist models and, on
the other hand, network models may impart meaning to neural mechanisms that are not yet
fully understood.
The CALM procedure, from Categorizing And Learning Module (Murre, Phaf, &
Wolters, 1992), has been proposed as a building block for modular network models. It
develops local representations (see Page, 2000) on specific nodes at a modular level, but semi-
distributed representations at a global network level (which is assumed to consist of many
interconnected modules). CALM implements competitive learning by lateral inhibition
between nodes and it incorporates a psychologically motivated, novelty dependent,
attentional mechanism which leads to a random search for possible representations and
increased learning of these representations (i.e. elaboration learning).
Disadvantages of this approach are that the learned representations do not match the
topology of the input space, and that increasing amounts of overlap between patterns may
severely impair categorisation performance. If a new pattern differs sufficiently from the
already represented patterns, a new representation is selected at random from the remaining
uncommitted nodes (i.e. nodes that do not have a representation). Though it shows some
ability to separate correlated patterns, its performance breaks down for highly correlated
patterns. Particularly in larger modules and with larger patterns, even a large distance between
patterns may not be sufficient for a stable distinct categorisation.
Such a problem is not found in the well-known self-organising procedure of Kohonen
(1982, 1988, 1995). In the initial version of these self-organising feature maps activations of
the representational nodes are determined by the Euclidean distance between the input vector
and the weight vector to the node. The computational procedure, which was only partly
implemented in neural network terms, selects the node with the highest activation together
with a neighbourhood of nodes (the 'activity bubble') for weight change. Neighbourhood size
and learning rate are reduced during successive learning of the patterns. In this manner, similar
patterns will get represented on neighbouring nodes, whereas dissimilar patterns remain far
apart. The preservation of the order in the input space by the representations of patterns is
generally referred to as topological self-organisation.
The CALM procedure provides a state-dependent mechanism for internally adjusting
the learning rate as a function of novelty, whereas these novelty-dependent changes in the
Kohonen map are set by the programmer. Following Murre (1992), we modified CALM to
stretch representations along the module according to the similarity gradient in the input by
introducing a gradient of lateral inhibition within a module. The formation of feature maps on
the basis of explicitly simulated inhibitory dynamics between nodes has previously been
studied by Miikkulainen (1991), but this model still needed control of the 'activity bubble'
radius by hand during learning and was also quite sensitive to boundary conditions (see also
Murre, 1992).
The new map adheres to the general principles for interactive activation networks
described by McClelland (1993). He did not, however, present a general learning procedure
for his theoretical framework (called GRAIN, standing for Graded Random Adaptive
Interactive (nonlinear) Network). Although both types of CALM modules have additional
mechanisms not specified by McClelland (e.g. novelty-dependent learning), both modules
seem to qualify as learning procedures within such a framework. Probably due to the
difficulty of independently regulating the different maps in a multimodular network, and of
handling bi-directional connections, Kohonen maps have, to our knowledge, not been applied
to this kind of interactive, multimodular network. Kohonen maps, therefore, appear to be
less suitable for the types of interactive networks envisioned by McClelland (1993). Because
CALM Map is intended as a building block for such networks, we will focus here on the
comparison of CALM Maps and CALM modules and consider the Kohonen Maps as a
useful heuristic for improving CALM.
2. Competitive modules
In Kohonen's self-organizing feature map (Kohonen, 1982, 1988, 1995) nodes are
arranged according to some type of 1, 2, or 3 dimensional neighborhood of connectivity (e.g. a
line, grid, or cube). Starting from a random pattern of weights, input patterns are compared to
the weight vector associated with each node using a Euclidean distance measure, resulting in
the selection of a best-matching or "winning" node. Weights on links to nodes that are
neighbors of the winning node are also modified. As a result, similarity between input
patterns will be mapped into proximity of activated nodes, and representations will be forced
in an order depending on the dimension with the largest range of variation in the whole data
set. Small variations tend to be ignored or play only a minor role in the ordering process,
depending on the available representation space.
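The classical, externally scheduled procedure can be sketched as follows. This is a minimal illustration only; the parameter schedule, random seed, and pattern set are our own choices, not taken from Kohonen's work:

```python
import numpy as np

def kohonen_step(weights, x, lr, sigma):
    """One classical Kohonen update on a 1-D (line) map: select the
    best-matching node by Euclidean distance, then move the winner and
    a Gaussian neighbourhood of nodes toward the input pattern x."""
    dists = np.linalg.norm(weights - x, axis=1)        # Euclidean match
    winner = int(np.argmin(dists))
    idx = np.arange(weights.shape[0])
    h = np.exp(-((idx - winner) ** 2) / (2.0 * sigma ** 2))  # 'activity bubble'
    weights += lr * h[:, None] * (x - weights)
    return winner

rng = np.random.default_rng(1)
# Nine overlapping patterns of four adjacent active inputs (cf. Figure 2).
patterns = np.array([np.roll([1.0] * 4 + [0.0] * 8, s) for s in range(9)])
weights = rng.random((11, 12))                         # random initial weights

for epoch in range(100):
    # Both parameters are reduced by hand over epochs, as in the classical map.
    lr = 0.5 * (1.0 - epoch / 100.0)
    sigma = max(0.5, 4.0 * (1.0 - epoch / 100.0))
    for x in rng.permutation(patterns):
        kohonen_step(weights, x, lr, sigma)

winners = [kohonen_step(weights, x, 0.0, 0.5) for x in patterns]
print(winners)   # similar patterns end up on neighbouring nodes
```

Note that the decay of `lr` and `sigma` is imposed from outside the network, which is precisely the external regulation that the CALM Map replaces with an internal, novelty-driven mechanism.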
To obtain stabilization in the Kohonen map during learning, usually two parameters are
regulated externally. First, the neighbourhood size of activated nodes is reduced
monotonically with repeated presentation of the set of input patterns, and second, the
learning parameter is decreased gradually. This regulation requires prior knowledge about the
presentation schedules according to which the neighbourhood and the learning parameter have
to be adjusted. This may, however, be hard to reconcile with the unsupervised character of
learning in self-organising maps. It seems, therefore, desirable to implement processes capable
of automatically self-adjusting global network parameters such as neighbourhood size and
learning rate. CALM seems well suited for this purpose, because it already incorporates a
novelty dependent learning rate, which automatically decreases when competition in the
module levels off (i.e. when patterns become more familiar and have a better match with
stored representations). If, moreover, competition is implemented by a gradient of inhibitory
connections, the size of the winning (i.e. activated) neighbourhood of nodes no longer needs to
be specified externally, but will be dependent on the range of activations in the module. The
neighbourhood will be tuned more finely during learning due to the increasing match between
weight pattern and input pattern. So, both the winner-take-all behaviour and the 'activity
bubble' arise from the incorporation of actual inhibitory connections.
A standard CALM (see Figure 1) is a competitive learning module, in which the
competition process is performed by intramodular interactions between excitatory
Representation nodes (R-nodes) and inhibitory Veto nodes (V-nodes). Nodes with excitatory
and inhibitory effects have been explicitly separated in the module. Every R-node has an
excitatory connection to a single (matched) V-node. The strongly inhibitory weights from V-
nodes to all other (non-matched) R-nodes have an equal value and impose a strong veto effect
on these R-nodes. Incoming signals arrive along modifiable intermodular connections at the R-
nodes. An Arousal node (A-node), receiving connections from both R and V-nodes, weighs
the amount of competition, which serves as a measure of the novelty of the presented
pattern. When an input pattern closely resembles the weight pattern to a particular R-node,
there will be little competition and the total amount of excitation from the R-nodes to the A-
node will be suppressed by the inhibition from the V-nodes. When many R-nodes have
weight patterns that match the input pattern about equally well, there will be much
competition and due to the inhibition between V-nodes, the A-node will receive more
excitation from R-nodes than inhibition from V-nodes. The A-node activates an External node
(E-node), which spreads random activations among the R-nodes, and controls the learning rate
in the module. In the case of much competition, the E-node will be highly active and will
generate relatively large random pulses to prevent potential "deadlocks" in the competition.
This is not necessary, of course, when there is little competition, because a winning pair of R
and V-nodes has already been selected.
Figure 1. A CALM module with node types and connection names. The node types are V-node (Veto node), R-node (Representation node), A-node (Arousal node), and E-node (External node). For convenience, distinctive names were also given to the connections. An excitatory Up weight connects an R-node to its matched V-node. In the standard CALM module all inhibitory Cross weights (from a V-node to non-matched R-nodes) are equal, and the Down weight (to the matched R-node) is somewhat higher. The A-node receives activations from both R- and V-nodes via Low and High connections, respectively. The AE weight connects the A-node to the E-node, which sends random activations through Strange connections to the R-nodes. Only intermodular connections are modifiable.
The general equation specifying the activation states (real values between 0 and 1) of
node i at epoch (or iteration number) (t+1) in CALM is:
a_i(t+1) = (1 − k) a_i(t) + [e_i / (1 + e_i)] [1 − (1 − k) a_i(t)]      (1)

if the input e_i > 0, and

a_i(t+1) = (1 − k) a_i(t) + [e_i / (1 − e_i)] (1 − k) a_i(t)      (2)

if the input e_i < 0, where e_i is the match between the inputs and the modifiable
intermodular weights, determined by the inner product of both vectors, and k is a decay
parameter.
In these equations three components may be distinguished. The first component, (1 −
k)a_i(t), represents autonomous decay, and for e_i > 0 the second part, e_i/(1 + e_i), is (half of) a
sigmoid function between zero and one. The third part of the rule, [1 − (1 − k)a_i(t)], ensures that
the increase in activation due to net excitatory input approaches the maximum activation
asymptotically. Similarly, for e_i < 0, e_i/(1 − e_i) squashes the negative excitation (inhibition)
between minus one and zero. The (1 − k)a_i(t) component then ensures an asymptotic
approach to the minimum activation value. It should be noted that in CALM, contrary to the
Kohonen map, activations are not determined by a Euclidean distance measure but by a
shunting equation (Equations 1 and 2; see also Grossberg, 1973, 1988), where the input term e_i
is determined by the more common weighted summation rule.
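The shunting update of Equations 1 and 2 can be sketched for a single node as follows (our own illustration; the decay parameter k = 0.25 is taken from Table 1, the input value is arbitrary):

```python
def calm_activation(a, e, k=0.25):
    """Shunting activation update of Equations 1 and 2: autonomous decay
    plus a squashed input term that approaches the activation bounds
    asymptotically (never overshooting 0 or 1)."""
    decayed = (1.0 - k) * a
    if e > 0:
        # excitation: sigmoid-squashed input scaled by the remaining headroom
        return decayed + (e / (1.0 + e)) * (1.0 - decayed)
    # inhibition: squashed negative input scaled by the decayed activation
    return decayed + (e / (1.0 - e)) * decayed

a = 0.0
for _ in range(50):          # constant excitatory input e = 1.0
    a = calm_activation(a, 1.0)
print(round(a, 4))           # converges to the fixed point 0.8
```

With k = 0.25 and constant input e = 1, the fixed point follows from a = 0.75a + 0.5(1 − 0.75a), i.e. a = 0.8, which the iteration approaches without ever overshooting the maximum activation of 1.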
The modification of the intermodular weight from node j to node i is governed by an
extension of a learning rule published by Grossberg (1982):
Δw_ij(t+1) = μ(t) a_i(t) [ (K − w_ij(t)) a_j − L w_ij(t) Σ_{f≠j} w_if a_f ]      (3)

where μ(t) is a parameter which controls learning speed. The value of this parameter
depends on the activation of the E-node according to:

μ(t) = d + w_μE a_E(t)      (4)

where d is a constant with a small value determining base-rate learning, w_μE is a virtual
weight from the E-node to the learning parameter μ(t), and a_E(t) is the activation of the
E-node at time t. The E-node also sends random activation pulses, which are uniformly
distributed over the interval [0, a_E(t)], to all R-nodes in the module.
Because all intermodular weights start out at exactly the same value, the
random activations are required to break the symmetry, but are also useful in later phases
when there may be many nearly-matching nodes. A more extensive description of CALM can
be found elsewhere (Murre, 1992; Murre et al., 1992; Phaf, 1994).
CALM incorporates two modes of learning: elaboration learning and activation learning.
Activation learning represents base-line learning (i.e. slow strengthening of existing
associations) on which elaboration learning is superimposed. The elaboration process is
dependent upon the amount of competition among the R-V node pairs. If a pattern is not yet
represented in the module, it will generally elicit much competition, because many nodes are
simultaneously activated by the pattern. This gives rise to a high arousal level at the A-node
and the E-node, yielding an increased learning rate, and relatively large random pulses
facilitating the resolution of competition. A well-established pattern activates its
corresponding node without much competition and only strengthens its representation
through activation learning, which is characterised by a relatively low learning rate. Learning in
CALM, thus, has the effect of reducing the competition with repeated presentation of the
pattern set, whereby elaboration learning is gradually replaced by activation learning.
The implementation of this novelty-dependent modulation of learning was primarily
motivated by a psychological theory of memory and learning (Mandler, 1980; see also Murre
et al., 1992; Phaf, 1994), but increasingly seems to receive support from neurobiological
research (Halgren & Smith, 1987; Knight, 1996; Eichenbaum, 1999). Dual process theory (i.e.
elaboration and activation learning) was first used to explain results from recognition
experiments and was later applied to dissociations between implicit (e.g. threshold
identification, word stem completion) and explicit (e.g. free recall, recognition) memory
performance (Graf and Mandler, 1984; see also Bower, 1996). An example of such a
dissociation is that implicit memory performance is generally preserved in severely
anterograde amnesic patients, whereas explicit memory performance is often completely
absent. A similar dissociation can sometimes be observed in normal subjects when the to-be-
remembered material is presented outside of attention (e.g. in a divided attention task). This
dissociation can be accounted for by assuming that elaboration learning has been impaired in
these patients (and is dependent on attention in normal subjects), but that activation learning
still serves to consolidate existing representations. A simulation of this dissociation has been
performed in a multimodular CALM network (Phaf, 1994) by disabling elaboration learning
(lesioning the connection to the External node). After this lesion, the network lost its ability
to form new representations (in the short time available during single trial presentation), but
still revealed consolidation of existing representations.
3. CALM Map
An early approach (CALSOM; Murre, 1992) to implement a neighborhood of activated
nodes in CALM was to have a linearly decreasing gradient of inhibition with distance from
the inhibiting node. Murre, however, did not obtain maximal separation of representations.
Adjacent input patterns were sometimes represented on the same node and boundary nodes
tended not to be occupied by particular patterns. A further problem was that, though a
topological ordering was achieved, subgroups were sometimes inverted (i.e. 'twists'). CALM
Maps differ from CALSOM due to the incorporation of a convex inhibition gradient (i.e. part
of a Gaussian function), instead of a linear inhibition gradient (or a 'Mexican hat' gradient, see
Miikkulainen, 1991). For this gradient it has been shown for Kohonen Maps that,
when the ‘full width at half height’ of the Gaussian equals the number of neurons,
convergence is optimal (Erwin, Obermayer, & Schulten, 1992). Only one-dimensional
topologies (e.g. a line or a ring) are considered here. Though there are ways of circumventing
boundary problems in a line topology (i.e. by reducing the net inhibition to the boundary
nodes), we have chosen to avoid the problem by eliminating boundaries altogether (i.e. in a
ring topology in which the ‘first’ node is a neighbor of the ‘last’ node in a module).
The parameter values (mostly fixed intramodular weight values) were generally the
same as in the standard CALM module (Murre et al., 1992; see also Table 1). Simulations
have shown that these parameters can be varied over large ranges to preserve global behaviour
of CALM Maps and modules. Though this set was chosen after some preliminary
simulations, it cannot be excluded that better values can be obtained. The parameter values
have to obey some global rules. To enable a transition from elaboration to activation learning,
for instance, the inhibitory weights from the V-nodes to the A-node have to be larger in
absolute value than the excitatory weights from the R-nodes to the A-node. Because there is
generally more than one node active in the activity bubble, the weights to the A-node in the
CALM Map had to be adjusted to the new proportions of excitation and inhibition. The
activations of A- and E-nodes were lower in the CALM Maps than in CALM modules, so
that there would still be enough noise to break symmetry but ordered representations would
not be disturbed. To allow for a smooth distribution of representations, furthermore, the
learning rate was reduced. Finally, two old parameters (specifying the inhibition from Veto
nodes to Representation nodes) were replaced by two new parameters for the inhibition
gradient, which had the following form:
h_ij = A exp(−(i − j)² / 2σ²) − B      (5)

where h_ij denotes the inhibitory weight from the j-th V-node to the i-th R-node, A > 0, B
> 0, and σ determines the width (standard deviation) of the inhibition gradient. Note that B >
A, so that all h_ij values are negative, with maximum value A − B (the inhibition to the matched R-
node in CALM) while the minimal values approach −B (the Cross weight in CALM). The value
of σ was kept dependent on module size, n, according to the following empirical formula:
σ = n / √(n − 1)      (6)
The values of σ are rounded to the nearest half. Smaller values than those prescribed by
this formula tend to induce frequent twists. The parameters of the Gaussian, furthermore, do
not change during presentation of a pattern set, nor does any other parameter.
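Equations 5 and 6 can be sketched together for a ring topology. This is our own implementation: the ring distance min(|i − j|, n − |i − j|) replaces |i − j| so that the 'first' and 'last' nodes are neighbours, and we round halves upward, which reproduces the σ values quoted in Section 4.1:

```python
import math

def inhibition_gradient(n, A=8.8, B=10.0):
    """V-to-R inhibition weights of Equation 5 on a ring of n nodes, with
    sigma = n / sqrt(n - 1) (Equation 6) rounded to the nearest half."""
    sigma = math.floor(2.0 * n / math.sqrt(n - 1) + 0.5) / 2.0
    h = [[A * math.exp(-min(abs(i - j), n - abs(i - j)) ** 2
                       / (2.0 * sigma ** 2)) - B
          for j in range(n)] for i in range(n)]
    return h, sigma

h, sigma = inhibition_gradient(11)
print(sigma)                  # 3.5 for a module of size 11
print(round(h[0][0], 2))      # -1.2: weakest inhibition, to the matched R-node
```

Because B > A, every entry of h is negative: inhibition is weakest (A − B) between a V-node and its matched R-node and approaches −B for the most distant nodes on the ring, yielding the convex gradient that shapes the activity bubble.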
weight/parameter   description                                        value
up                 from R-node to matched V-node                       0.5
A                  Gaussian inhibition factor                          8.8
B                  Gaussian inhibition constant                       10.0
flat               interconnects V-nodes                              -1.0
high               connects V-node to A-node                          -0.7
low                connects R-node to A-node                           0.3
AE                 connects A-node to E-node                           1.0
strange            connects E-node to R-nodes                          0.25 *
inter              initial value of learning intermodular weights      0.5
k                  decay of activation                                 0.25
L                  learning competition factor                         1.0 **
K                  maximum learning weight value                       1.0
d                  base rate of learning                               0.0001 ***
wµE                virtual weight from E-node to learning rate         0.0005

*   In the conditioning simulations this parameter was set to 0.1.
**  In the conditioning simulations this parameter was set to 2.0.
*** In the conditioning simulations this parameter was set to 0.005.

TABLE 1. Fixed weight values and parameters in CALM Map (see also Murre et al., 1992).
To illustrate the ordering process in the CALM Map, nine patterns (the same set was
used by Murre, 1992), were presented 100 times for 25 iterations each to a module of size 11
(ring topology) from a number of clamped input nodes. Patterns were presented in a fully
randomised order without replacement. Between presentations all activations, but not the
connection weights, were initialised (to zero). In Figure 2 the pattern set and the
categorisation results are shown. After about 30 presentations the patterns are properly
ordered on a one-dimensional scale. Due to the absence of twists and multi-committed nodes,
this represents an improvement on the results of Murre (1992).
p1: 111100000000
p2: 011110000000
p3: 001111000000
p4: 000111100000
p5: 000011110000
p6: 000001111000
p7: 000000111100
p8: 000000011110
p9: 000000001111
Figure 2. Trace of categorization of pattern set (p1...p9) in a CALM Map of size 11.
As a consequence of the equal starting weights, the activity bubble initially extends over
the full ring for every pattern. For the first presentations during the ordering process, all
representations generally lie close to a randomly selected central node. The activity bubbles
subsequently narrow and split up for the different patterns. The competitive learning rule
reduces the weights of inactive input connections to nodes within the activity bubble.
Patterns that are disjoint from the pattern causing the bubble, therefore, tend to be
represented outside the activity bubble. Because the weights to nodes farthest away from the
(shrinking) bubble are reduced the least, these nodes will represent patterns that have the least
overlap with the pattern responsible for the activity bubble. The competitive learning
mechanism, in combination with the activation gradient, leads to the 'stretching' property of
CALM Maps (i.e. dissimilar patterns will be represented as far apart as possible).
The random activations ensure the selection of a 'central' node and help to break
symmetry when many patterns are represented close together. Because differences in
activations are smallest between neighbouring nodes, the random activations will favor the
separation of representations. In the beginning, the random activations will help the spreading
out of representations, but later on they may disturb the ordering. The amplitude of the
random activations, however, depends on the E-node activation, which in turn depends on the
size of the activity bubble. The internal regulation of the activity bubble thus also reduces the
random activations when they are no longer useful for the organisation process. Eventually,
the representations will settle in an organised state with minimal competition and low E-node
activation, so that ordering is not disrupted by the random activations.
In the Kohonen map only the initial weights are chosen at random and there is no
further (state-dependent) random process at work during processing. Consequently, the
representations start out at random positions without apparent ordering and then form small
clusters from which representations are gradually reordered according to their mutual
relationships. It may be noted that the start from a core representation in CALM Maps (and
CALSOM), as compared to many random representations may lead to a considerable
shortening in separation times relative to the Kohonen map. In a comparison of CALSOM
and Kohonen maps in the classification of wave spectrogram data from the ESA GEOS
satellite mission, the former indeed appeared to arrive at a slightly better categorisation in an
appreciably shorter time than the latter (Brückner & Gough, submitted). The reordering of
an initial random order is thus omitted in CALM Maps (and CALSOM), which may be an
advantage for practical applications of topological self-organisation.
4. Single module simulations
We performed a series of simulations to investigate the stretching process in CALM
Maps and compared its categorization behavior to the standard CALM module with respect
to the size of the module, the overlap, and the Euclidean distance between patterns. In all
simulations in this section the patterns were presented 800 times in random order (without
replacement) to both modules. Results were averaged over five replications. The CALM Map
generally converged upon a single node in fewer iterations than CALM. In all simulations the
maximum number of iterations per presentation was, therefore, kept constant at 25 iterations
for the CALM Map and at 50 iterations for the standard CALM module.
4.1. Module size
To study the influence of module size on categorization, patterns similar to the
simulation in Section 3 (four input activations were set to 1.0 and the rest to zero), were
presented to modules of size (n) 7, 12, 17, 22, and 27 (with σ = 3.0, 3.5, 4.5, 5.0, and 5.5, respectively) from
n+2 input nodes. In each pattern set the number of patterns was equal to the number of nodes
(n) minus two. Categorization results are shown in Figure 3. The number of multi-committed
nodes, that is the number of nodes with more than one representation, slightly increased in
larger modules for CALM Map, but increased strongly for CALM. CALM Map is thus
better suited than CALM for separating highly correlated pattern sets, particularly with larger
modules and pattern sets. Moreover, when module size was increased while leaving the
number of patterns constant, CALM Map achieved up to 100% correct classification, which
was not observed in CALM.
Figure 3. Number of multi-committed nodes (number of nodes that have representations for multiple patterns) as a function of module size for CALM and CALM Map.
Further simulations unexpectedly revealed that, if the number of nodes equalled twice
the number of patterns, representations were spread over the entire range of R-nodes such that
committed and uncommitted nodes alternated. Closer inspection of the weight values,
moreover, revealed that uncommitted nodes actually interpolated between the representations
of the neighbouring nodes. Due to the absence of topological stretching, CALM, of course,
could not show such interpolation. Similar interpolation behaviour was already observed in a
modified, continuous, Kohonen map (the Parameterized Self-Organising Map; Ritter, 1993).
According to Ritter (1993), this kind of behaviour has the advantage of ‘learning from very
few examples’ (p. 573). This may be useful in solving what has been called the 'curse of
dimensionality', ‘which is that when the inputs have as many dimensions as natural stimuli
then it is impossible in any realistic time scale to give examples that densely cover the whole
input space, with the consequence that there will be large regions of input space in which the
net has no experience to guide it;’ (p.32, Phillips, 1997). Interpolation also opens the
opportunity for determining the similarity of new patterns to already represented patterns. It
can, thus, be used for a kind of similarity scaling.
Interestingly, an additional benefit of this stretching behaviour and the resulting
interpolation characteristic may be that it also solves the node under-utilisation problem
which troubles many competitive learning procedures (Ahalt, Krishnamurthy, Chen, &
Melton, 1990). In some of these procedures, due to the random initialisation of weights, a
number of nodes may not be used at all to represent a pattern. Here, weights to nodes
neighbouring the winning nodes will also change in the direction of the input pattern. The
stretching property results in uncommitted nodes that will combine weight changes of both
neighbouring representations. So, each node will eventually come to represent a pattern, even
when, as is the case with interpolated representations, the pattern has not actually been
presented.
4.2. Pattern overlap and distance
To investigate the role of overlap we varied the number of shared activations in the data
set. Pattern sets, each containing 11 patterns, were constructed with overlap 3, 4, 5, 6, and 7
(see the pattern set of Figure 2 where the overlap of activated nodes between adjacent
patterns was 3). The size of the module was 13 and the input activations were 0.5. Figure 4
shows that the amount of overlap (expressed in direction cosines) affected performance in
CALM Map and CALM in a comparable way, although the results for CALM Map were
slightly better. As can be deduced from Figure 3a, this advantage is expected to grow,
however, as a function of module size.
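Overlap expressed as a direction cosine can be computed as follows (a small sketch with hypothetical pattern vectors; the exact pattern sets of Figure 2 are not reproduced here):

```python
import numpy as np

def direction_cosine(p, q):
    """cos(phi) between two pattern vectors; 1.0 means identical direction."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

# Two hypothetical patterns of four active nodes (activation 0.5) that
# share three active positions:
p = np.array([0.5, 0.5, 0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.5, 0.5, 0.5, 0.5, 0.0])
print(direction_cosine(p, q))   # 0.75: three of four active nodes shared
```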
Figure 4. Number of multi-committed nodes as a function of overlap (cos φ) between patterns for CALM and CALM Map.
In a third simulation, a pattern set with constant overlap (3 activations), but a variable
Euclidean distance between patterns (i.e. the size of the activations), was tested. The
activations in the input ranged from 0.10 to 0.50 in steps of 0.05. The number of patterns
was 11 and the size of the module was 13. Figure 5 shows that varying Euclidean distance has
different effects in CALM and CALM Map. Categorisation performance by CALM was
better at smaller distances, but did not improve much as distance increased. Categorisation by
CALM Maps (25 iterations per presentation) improved with an increasing number of
presentations and larger distances, whereas CALM (50 iterations per presentation) appeared to
commit itself after about 200 presentations to a once-obtained categorisation. It can, therefore,
be useful in CALM Maps to increase the number of presentations to continue separation.
Although it appears that for distances below 0.4 separation does not improve as a function of
the number of presentations, this was investigated further in an additional simulation of 4000
presentations using patterns with 0.10 activation. It was found that not only had the number of
multi-committed nodes decreased further, but topological organisation had also
improved. For small activations it may still be useful to increase the number of presentations,
but the actual number required may be larger than for high activations.
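The manipulation in this third simulation, i.e. scaling the activation size so that Euclidean distance varies while the direction cosine (overlap) stays constant, can be illustrated with hypothetical pattern vectors:

```python
import numpy as np

def euclidean_distance(p, q):
    return float(np.linalg.norm(p - q))

def direction_cosine(p, q):
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

base_p = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0])
base_q = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 0.0])

for a in (0.10, 0.30, 0.50):        # activation sizes, as varied in the text
    p, q = a * base_p, a * base_q
    # the direction cosine (overlap) is scale-invariant, while the
    # Euclidean distance grows linearly with the activation size
    print(a, direction_cosine(p, q), euclidean_distance(p, q))
```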
Figure 5. (a) Separation (number of multi-committed nodes) as a function of Euclidean distance between patterns and the number of presentations (100–800) for CALM. (b) Separation as a function of Euclidean distance between patterns and the number of presentations for CALM Map.
4.3. Retroactive interference
One of the constraints imposing strong limitations on the psychological plausibility of
many connectionist learning procedures is the extreme loss of memory for old representations
upon presentation of a new pattern set in a sequential learning task (Grossberg, 1987; Ratcliff,
1990; Murre, 1992). This catastrophic retroactive interference seems to be due mainly to the
overlap in representations of the sequentially presented pattern sets. Presentation of a new
set modifies the existing representations, so that representations for the old patterns become
similar to those of the new set. Back-propagation studies, in particular, have shown that, after
presentation of the second set (the first set was not presented during the second phase),
memory for the first set is lost and mixed with the memory for the second set. In a
competitive learning procedure, which attempts to separate representations, interference
problems should be smaller than with back-propagation, where there is strong overlap due to
the distributed nature of the representations.
Figure 6. Retroactive interference in CALM Map with overlapping pattern sets (plotted as node number against number of presentations).
(a) The initial categorization trace of learning set A: p1: 111100000000000000000; p3: 001111000000000000000; p5: 000011110000000000000; p7: 000000111100000000000; p9: 000000001111000000000; p11: 000000000011110000000; p13: 000000000000111100000; p15: 000000000000001111000; p17: 000000000000000011110.
(b) Categorization trace of learning set B and results of subsequent testing of both set A and B: p2: 011110000000000000000; p4: 000111100000000000000; p6: 000001111000000000000; p8: 000000011110000000000; p10: 000000000111100000000; p12: 000000000001111000000; p14: 000000000000011110000; p16: 000000000000000111100; p18: 000000000000000001111.
In CALM Maps no strong retroactive interference is expected, because either the
representations of the new pattern set are completely distinct from the old ones, or the new
representations can be assimilated in the ordering of the old patterns and the existing order
will be preserved. To investigate the latter case, we presented two pattern sets of which the
second was an interpolated set of the first. This may represent one of the strongest cases of
overlap between the two sequentially presented sets. The module size was 18 and the number
of presentations was 100 for each set. Figure 6a shows that after learning the first set, the
representations were separated such that committed and uncommitted nodes alternated.
Inspection of the weight values after the last presentation of the first set revealed that
uncommitted nodes actually interpolated between the representations of the neighbouring
nodes. Patterns of the second set were immediately committed to nodes with the interpolated
representations. The tests at the right side of Figure 6b clearly indicate the absence of
catastrophic retroactive interference in this simulation. The stretching property thus reduces
retroactive interference when the pattern sets are highly correlated by separating
representations as far apart as possible.
Retroactive interference was found in standard CALM modules, but differed from
retroactive interference in back-propagation. Training the second set in CALM often had the
effect that representations from the second set replaced those from the first. A further test,
however, showed that the representations of the first set had not mixed with the
representations of the second, but changed place and now occupied previously uncommitted
nodes. After learning the first set, representations were presumably still rather broad, so that
patterns of the second set could also be accommodated on these nodes. By learning the
second set, the representations were recoded and more finely tuned, so that the old patterns
did not fit in anymore. The unsupervised character of CALM, thus, ensured that the
representations of the two pattern sets did not remain mixed, but that, despite some
interference, the patterns kept separate representations.
When, in contrast to the previous simulation, two fully disjunct sets of patterns are
presented to a CALM Map, patterns from the second set would be expected to replace
existing representations. Additional simulations with sequential presentation of pattern sets
(p1...p7) and (p11...p18), however, surprisingly revealed that, even when the number of
nodes was three or four times larger than in the previous simulation, patterns from
the second set were clustered on only a few nodes, whereas representations for the first set
remained spread out maximally. Retroactive interference was thus replaced by proactive
interference. Under more ecologically valid conditions stimuli will vary on many attribute
dimensions (e.g. Phaf, Van der Heijden, & Hudson, 1990) in which pattern sets may not be
fully distinct, so that in a multimodular network even these sets may obtain separate
representations.
5. Multimodular Networks
A CALM Map is not intended for use as the full network model. A multimodular
network architecture has larger information processing abilities than a single module network
due to the presence of independent parallel processing pathways. Moreover, such
architectures allow for hierarchical systems that perform categorization of stimuli at different
levels of abstraction. Happel and Murre (1994) have derived general design principles of
multi-modular networks by exposing simulated neural networks, whose structure was
generated by a genetic algorithm, to selection pressures that also apply to actual neural
systems. With this evolution-inspired optimization procedure Happel and Murre (1994) in
fact obtained architectural features that were very similar to features of the visual system.
According to their principle of structural compatibility, the best categorization is obtained
when the (evolutionary prepared) modular structure corresponds to the cluster structure of
the task domain. When, for instance, the input consists of hierarchically clustered patterns
containing smaller subclusters, a coarse categorization in a small module can interactively
facilitate a (simultaneous) more fine-grained categorization in a larger module. The second
principle argues that multiple, parallel, pathways may improve categorization compared to a
single pathway, because the pathway with the best organization will be the fastest to
converge on a suitable representation and so will come to dominate the total organization. The
last principle, the principle of recurrence, maintains that the presence of recurrent connections
between modules may also enhance categorization compared to the situation where no such
connections are present. Both bi-directionally connected modules may interactively benefit
from the gradually increasing differentiation in either module. It should be noted, however,
that these principles were derived with CALM modules, so it was not certain whether
they would also be valid for CALM Maps.
Due to the distribution of (local) representations over multiple modules,
representations may get a more continuous form, taking on different category boundaries in
different modules, and so these networks have a potentially greater discriminatory power
than a single CALM Map module. A further practical advantage of modular networks is that
the scale of the simulations can be enlarged by increasing the number of modules without
changing the size of the constituent modules. In networks that have no restrictions on
connectivity between nodes, the practical costs of increasing scale may become prohibitive
much sooner than in modular networks.
Because the CALM Map, more so than the Kohonen map, is best used as a building
block for larger networks, several multimodule network simulations of experimental data have been
performed. In one of these, Tijsseling (1998) modeled categorical perception (Harnad, 1987)
by training a fully recurrent multimodular network consisting of seven modules to
discriminate and categorise Gabor filtered (Gabor, 1946) drawings of lines with varying
orientations. The results of these simulations were later supported by human experimental
data (Pevtzow, Tijsseling, & Harnad, Submitted). In another study (Phaf & Van Immerzeel,
1997), dissociation effects between explicit and implicit human memory performance
(Schacter, 1987; see also Section 2) were simulated in a network implementing the
activation/elaboration account discussed earlier. A third set of simulations of fear conditioning
in a dual-pathway network model will be treated in more detail here. Recently, moreover, the
dual-pathway model was extended to also simulate evaluative conditioning and mere exposure
effects (Grob, 2001).
5.1. A connectionist dual pathway model of fear conditioning
A published connectionist model in which the need for competitive modules forming
topological, or in this case tonotopic, maps is apparent, is the network model by Armony,
Servan-Schreiber, Cohen, and LeDoux (1995), which was inspired by the neurobiological
model of LeDoux (1986, 1996). Both the network and the neurobiological model explicitly
assume a type of multimodular architecture that seems to adhere to the first principle of
Happel and Murre (1994). LeDoux primarily investigated emotions and affective processes
through neurobiological research on animal fear conditioning. In these conditioning
experiments an initially neutral conditioned stimulus (CS) (e.g. a tone of a particular
frequency) is paired with a fear-evoking unconditioned stimulus (US) (e.g. an electric shock).
As a consequence, presentation of the CS without the US also evokes a fear response. The
intensity of the fear response decreases and eventually disappears with repeated presentation
of the CS without the US (i.e. extinction).
LeDoux investigated which pathways and modules were involved in conditioning and
extinction in a series of experiments in which he lesioned a specific brain area of the animal
and tested how this affected conditioning. He found that even when the auditory cortex was
completely ablated, rats could still be conditioned to auditory stimuli (LeDoux, Sakaguchi, &
Reis, 1984). Lesions to the thalamus and the midbrain, however, totally prevented
conditioning. Subsequent tracing techniques revealed a neural pathway leading from the
thalamus to the amygdala. This pathway appeared to be sufficient for conditioning but could
not discriminate between very similar stimuli or between stimuli in different contexts.
Experiments with lesions to this direct pathway but with the cortex left intact, however,
showed that the direct pathway was not necessary for conditioning and that probably also a
parallel, indirect pathway has to be distinguished. The indirect pathway via the cortex is held
responsible for finer discrimination and more extensive processing than the direct pathway.
The dual pathway model of LeDoux (1986, 1996) adheres to the first principle of
Happel and Murre (1994), which was arrived at through evolutionary computational
methods. Because both coarse and fine-grained categories can be distinguished in conditioned
stimuli, fear processing of these stimuli is facilitated by this dual pathway architecture. The
architecture may reflect the biological preparation for activating gross affective categories,
such as fear. The indirect pathway would then provide a finer specification of the affective
processing, at the same time ensuring that the gross affective categorizations are preserved
during further refinement. An example of this specification may be found in context effects on
conditioning. Conditioning, moreover, extinguishes after repeated presentation of the CS
without the US, which seems to be mainly caused by the indirect pathway and not by the
direct pathway. After lesioning the indirect pathway even many presentations of the CS
without the US seem to have no effect on the conditioned response (LeDoux, 1996). Also,
from behavioral experiments it is known that after extinction some trace of the conditioned
stimulus is preserved. This is demonstrated in experiments where the conditioned fear
response returns after extinction. For example, a fear response can be reinstated after
presentation of the US, or it can return spontaneously after a period during which neither CS
nor US is presented (Bouton & Swartzentruber, 1991). Apparently, extinction functions by
the active inhibition via the indirect pathway, of a CS-US association still present in the direct
pathway.
The Armony et al. (1995) connectionist model of the dual pathway architecture had one
input module of 16 nodes, through which the CS input was provided. The model further
consisted of four modules: the amygdala (3 nodes), the cortex (8 nodes) and two sub-
structures within the thalamus: MGm/PIN (8 nodes) and MGv (3 nodes). US input was
provided to the network by directly activating the amygdala and MGm/PIN. The modules in
the network were connected in such a way that CS input could be transmitted along a direct
and an indirect pathway to the amygdala. The CS input module was connected to both MGv
and MGm/PIN. Only MGm/PIN was connected to the amygdala, forming the direct
pathway. The indirect pathway was formed by connections from both MGm/PIN and MGv
to the cortex, and from the cortex to the amygdala. All connections between modules were
unidirectional and all-to-all.
The learning algorithm of the Armony et al. (1995) model was a modification of the
competitive learning algorithm by Rumelhart and Zipser (1985). The most important changes
concerned the inclusion of continuous instead of discrete values for the activations, the
implementation of competition by actual lateral inhibition between nodes in a module (instead
of the direct selection of a winning node), and the adjustment of the learning rules to the
continuous activation values. In these simulations a series of pure tones of contiguous
frequencies (and equal intensities) in an arbitrary scale served as input patterns to MGv and
MGm/PIN. The US was represented as an external binary input to all nodes of MGm/PIN
and amygdala modules, so that an equal amount of activation would be sent to all nodes in
these modules. After the familiarization phase, the specificity of the cell responses was
established by presenting all input patterns (without US) and recording the resulting
activation in each node for each input pattern. Coupling the US to a single pure tone caused
changes in receptive fields in the corresponding modules of the model, which were similar to
those observed experimentally in animals. The total activation of all nodes in the amygdala,
moreover, showed a clear increase for the selected tone, indicating successful conditioning.
5.2. Conditioning simulations
The CALM Map seems suited for rebuilding the Armony et al. model (see Figure 7)
and for replicating their simulations (see also Den Dulk, Rokers, & Phaf, 2000), because many
features that had to be added to the Rumelhart and Zipser (1985) algorithm are already
present in CALM Maps. In the Armony et al. (1995) simulation, however, neighboring
frequencies were not represented systematically on neighboring nodes in the module, because
their competitive learning scheme did not allow for such ordering. A tonotopical ordering (i.e.
a topological ordering of tones), which can be found in the auditory pathways of many
animals, emerges automatically from the learning procedure in CALM Maps. The CALM
Map also provides a more suitable way of administering the unconditioned stimulus than the
Rumelhart and Zipser (1985) type of competitive learning, because of the novelty detection
mechanism in CALM. Novelty is often associated with fear responses, so the novelty
detection in CALM can be seen as one way of evoking fear responses. Direct activation of the
Arousal-node can, therefore, be considered a fear response. We capitalized on this by feeding
the US-input (with activation 1.0) to the A-nodes of the two modules (MGm/PIN and
Amygdala), which also received the US in the Armony et al. (1995) model.
Figure 7: The architecture of the dual pathway model with CALM Maps. Input is given through the bottom input module. All other rectangles represent CALM Maps, with Arousal-nodes and External-nodes indicated outside the module. Activity bubbles around a winning node are depicted.
Parameters in the model were kept equal to the previous simulation with a few
exceptions (see Table 1). After a set of parameters was found which produces tonotopical
organisation, no effort was spent to adjust them to obtain optimal model behaviour. The
Gaussian σ of the ring CALM Maps, which was established by applying Equation 5, was 3.0
for MGv, 3.0 for the Cortex, 2.0 for MGm/PIN, and 2.0 for the Amygdala. In the first phase of the
simple conditioning experiment, all patterns were familiarised. The input patterns (potential
CSs) represented 15 contiguous frequencies. Each frequency-pattern consisted of two
neighbouring nodes, and had an overlap of one active node with one pattern to either side. The
right- and left-most frequencies only had overlap to one side.
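The construction of these CS input patterns can be sketched as follows (assuming the 16-node input module described above; the activation value of 1.0 is illustrative only):

```python
import numpy as np

n_input, n_freq = 16, 15           # 16-node input module, 15 frequencies
patterns = np.zeros((n_freq, n_input))
for f in range(n_freq):
    patterns[f, f:f + 2] = 1.0     # two neighbouring active input nodes

# Adjacent frequencies share exactly one active node; the outermost
# frequencies overlap to one side only.
print(patterns[0])                 # frequency 1 activates input nodes 0 and 1
```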
Figure 8. Tonotopical organization in MGv (node number in MGv as a function of input frequency). Because the CALM Map has a ‘ring’ structure the starting node in the map is arbitrary.
In the familiarisation phase all 15 patterns were presented 150 times for 20 iterations (a
cycle of calculating all activations and weights) each. In the conditioning phase Frequency 5
was coupled to the US, thus making this pattern the CS. The US-CS pair was fed to the
network for 10 presentations. The network (after conditioning) was tested five times, and
average values were used as a measure of performance. To examine the effects of conditioning
we compared the receptive fields of individual nodes before and after conditioning. Because
amygdala activity is assumed to result (via the hypothalamus, e.g. LeDoux, 1996) in various
autonomic and endocrine reactions, total amygdala activity can be seen as a measure of
autonomic activity. The summed activation in the amygdala was, therefore, also registered as
a function of the frequencies presented both before and after conditioning.
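The test procedure, presenting each frequency without the US and averaging the summed amygdala activation over five test runs, can be sketched as follows (the `model.forward` interface is hypothetical, not the actual implementation):

```python
import numpy as np

def quasi_autonomic_response(model, patterns, n_tests=5):
    """Present every frequency (without US) and average the summed
    amygdala activation over several test runs.

    `model.forward(pattern)` is a hypothetical interface assumed to
    return the activations of the amygdala nodes for a given input.
    """
    responses = np.zeros(len(patterns))
    for _ in range(n_tests):
        for i, pattern in enumerate(patterns):
            responses[i] += model.forward(pattern).sum()
    return responses / n_tests
```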
Figure 9. Pre- and post-conditioning receptive fields (activation as a function of frequency) in MGm/PIN of the node most responsive to the CS before conditioning.
The CALM Maps produced a good tonotopical ordering in all modules (e.g. see Figure
8). Because all modules had fewer nodes than there were input patterns, a receptive field
generally contained more than one pattern. As a consequence of conditioning the receptive
field of the relevant MGm/PIN node (Figure 9) sharpened and shifted towards the frequency
of the conditioned stimulus (Frequency 5). The receptive fields of the cortex and the
amygdala modules showed similar shifts. The frequency-specific changes occurred only for
nodes in which the CS evoked a non-zero response before conditioning. The three modules
also showed a substantial increase in their response to the CS. There was no observable effect
of conditioning in MGv, because the US does not activate it, either directly or indirectly. The
converging activations of US and CS could only arrive at the cortex through MGm/PIN. The
change in the receptive field of MGm/PIN, therefore, led to a change in the receptive field of
the cortex. The amygdala received US activation in three ways, direct activation from the US
to its Arousal-node, indirect activation from MGm/PIN to the amygdala, and indirect
activation via the cortex to the amygdala. The architecture of this model is such that the effect
of conditioning converges on the amygdala. After conditioning, the CS produced a higher total
amygdala activation than the other input frequencies. In addition, as the distance of the
frequency from the CS increased, the quasi-autonomic response decreased and was smaller
than before conditioning (Figure 10). These results were similar to the simulation results
found by Armony et al. (1995) and to experimental results obtained with fear conditioning of
animals.
Figure 10. Summed pre- and post-conditioning activation of all three nodes in the Amygdala as a function of input frequency.
The simulations of Armony et al. (1995) were also extended to simulations of latent
inhibition and of extinction. Latent inhibition refers to the lower susceptibility to
conditioning of familiar compared to unfamiliar (i.e. novel) stimuli. The model with
CALM Maps may provide an excellent opportunity for simulating this, because both novelty
and the US lead to fear responses. With relatively novel stimuli, the fear response during
conditioning would be larger than with familiar stimuli. To reflect this difference in experience,
a new familiarisation phase was run in which one half of the frequencies were presented 100
times and the other half 200 times. Stimuli from both groups were alternated. In one condition
we chose a CS which had previously been presented 100 times (Frequency 4) and in the other
condition a CS which had been presented 200 times (Frequency 5).
The total amygdala activation brought about by presenting the conditioned low-familiar
(LF) stimulus was larger (Figure 11a) than when presenting the conditioned high-familiar
(HF) stimulus (Figure 11b). It should be noted that latent inhibition was found despite the
fact that LF stimuli were represented less strongly in the network than HF stimuli. Without
such novelty-dependent elaboration learning, conditioning would probably be smaller for the
LF than for the HF stimulus. A similar novelty detection mechanism is not present in the
Armony et al. (1995) model, which does not provide the opportunity to simulate latent
inhibition of conditioning.
Figure 11a. Total pre- and post-conditioning amygdala activation as a function of frequency in the latent inhibition simulation. Frequency 4 was conditioned, after it was familiarized in 100 presentations.
Figure 11b. Total pre- and post-conditioning amygdala activation as a function of frequency in the latent inhibition simulation. Frequency 5 was conditioned, after it was familiarized in 200 presentations.
A further extension to the Armony et al. (1995) model consisted of an attempt to
simulate extinction of the conditioned response by assuming that it was caused by
interference due to learning of intervening material (e.g. noises in the environment). For this
purpose we presented all other frequencies together with the CS, but without a US, during
extinction. The other frequencies were presented twice as often as the CS to ensure sufficient
interference. The network state that resulted from conditioning was exposed 15 times (for 20
iterations) to a randomised pattern batch consisting of the CS and two instances of all other
frequencies. For comparison, we performed the same extinction procedure on the network
that resulted only from the familiarisation phase (i.e. before conditioning).
Figure 12: Total amygdala activation by the CS as a function of the number of presentations during extinction, both when it had and had not been conditioned.
The conditioned frequency showed a decrease in fear response over repeated
presentations (Figure 12), whereas the control stimulus lacked such a decrease. In fact, it
increased slightly. After 15 presentations the activation levels were about equal. Though
extinction can be interpreted here as caused by interference, animal research does not support
this interpretation. In behavioural experiments on rats a relapse of the fear response can occur in a
number of situations (Bouton & Swartzentruber, 1991). In our model such a relapse is not
possible because the conditioning information was lost permanently by interference. An
adequate simulation of extinction in a network model probably also requires an active top-
down control process, which can be eliminated by lesioning the indirect pathway.
5.3. Lesioning the pathways
To investigate the contributions of the individual pathways to conditioning, we lesioned
the direct and indirect pathways of the network both before and after conditioning. The
lesions to the indirect path were applied by disabling the connections from the cortex module
to the amygdala module. In the direct path the connections between MGm/PIN and amygdala
were disabled. Apart from the lesions, all features of this simulation were identical to the first
simulation. Our model could be conditioned without the cortical pathway (see Figure 13a).
The conditioning effect had about the same size as the effect found with both pathways
intact.
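In such a simulation, lesioning a pathway amounts to disabling the weight matrix between two modules. A minimal sketch (the dict-based storage and weight values are our assumption, not the actual implementation):

```python
import numpy as np

# Hypothetical storage of inter-module connections: a dict of weight
# matrices keyed by (source, target) module names.
weights = {
    ("MGm/PIN", "amygdala"): np.full((8, 3), 0.1),   # direct pathway
    ("cortex",  "amygdala"): np.full((8, 3), 0.1),   # indirect pathway
}

def lesion(weights, source, target):
    """Disable all connections from `source` to `target` by zeroing the
    corresponding weight matrix (leaving its shape intact)."""
    weights[(source, target)] = np.zeros_like(weights[(source, target)])

lesion(weights, "cortex", "amygdala")   # indirect-pathway lesion
```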
Figure 13a: Total pre- and post-conditioning amygdala activation as a function of frequency, after lesioning the indirect pathway before conditioning.
Figure 13b: Total pre- and post-conditioning activation as a function of frequency, after lesioning the direct pathway before conditioning.
Lesioning the direct pathway before conditioning resulted in a large decrease in
autonomic response, though a slight conditioning effect remained (see Figure 13b). This seems
at odds with the experimental finding that, without the direct path, conditioning is still
possible. Because there is an additional layer in the indirect pathway, activations transported
to the amygdala through this pathway are attenuated. Conditioning effects may have been
smaller as a result of this attenuation. Contrary to experimental findings, the contribution of
the indirect pathway to conditioning in the network model seems much smaller than the
contribution of the direct pathway.
Figure 14: Total pre- and post-conditioning amygdala activation as a function of frequency, after lesioning the indirect pathway or the direct pathway after conditioning.
Lesioning the indirect path after conditioning revealed that almost the entire
conditioning effect remained, whereas it almost completely disappeared after lesioning the
direct path (see Figure 14). This again strengthens the idea that the direct pathway in the
model is more important for conditioning than the indirect pathway. The Armony et al.
(1995) model seems to show similar, unevenly balanced, conditioning effects along the two
pathways (see also Armony, Servan-Schreiber, Cohen, and LeDoux, 1997), but this does not
seem to conform to the underlying biological model.
5.4. Discussion
In sum, the actual competitive learning procedure used in the model does not seem
critical for obtaining these conditioning results. They seem to arise primarily from the
neurobiologically inspired network architecture. An advantage of CALM Maps may be that they
are equipped with a novelty detection mechanism, whose Arousal-node provides a suitable
site for applying the US. In standard CALM Maps novelty automatically leads to
activation of the Arousal-node, which can be seen as a fear response. The assumption by
Armony et al. (1995) that the US directly activates all nodes in the amygdala and the
MGm/PIN can be avoided in this way. The combined operation of novelty and US, moreover,
leads to a latent inhibition effect, which probably does not occur in the Armony et al. (1995)
model. The replacement of competitive learning in the Armony et al. (1995) model by CALM
Maps showed a number of other advantages. Due to the graded lateral inhibition, a tonotopic
ordering of input frequencies arose automatically in all modules. Another advantage may be
that the network can easily be extended to include recurrent connections between modules.
From these simulations some suggestions for improvements can be made, which may
lead to a better correspondence of the computational model to the conceptual model. A clear
problem is that the contribution to conditioning of the indirect pathway seems much smaller
than of the direct pathway. The additional layer between input and amygdala in the indirect
pathway attenuates its activation. This renders it difficult to judge the conditioning effects
along the indirect pathway. If the cortex activation were larger, the indirect pathway
might well show a finer discrimination of stimuli than the direct pathway. In subsequent
simulations (Pallamin, Den Dulk, & Phaf, unpublished results), we multiplied the weights
from MGv to the cortex by two, resulting in equally strong conditioning effects in both
pathways. Another way to obtain larger activations in the cortex may be to incorporate
bidirectional connections between thalamus and cortex, so that positive feedback enhances the
representations in the cortex. The theoretical model (e.g. LeDoux, 1996), at least, assumes
bidirectional (but not symmetric) connections between amygdala and cortex.
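The expected effect of such bidirectional thalamo-cortical connections can be sketched with a toy two-unit feedback loop (all gains and names here are illustrative assumptions, not parameters of the actual model): as long as the loop gain stays below one, positive feedback settles to a stable but markedly enhanced cortical activation.

```python
def settle_feedback(inp, w_tc=0.5, w_ct=0.4, decay=0.5, steps=50):
    """Return final cortical activation without and with cortical
    feedback to the thalamus (toy linear units, illustrative gains)."""
    def run(feedback):
        thal = cortex = 0.0
        for _ in range(steps):
            # thalamus receives the input, plus cortical feedback if enabled
            thal = (1 - decay) * thal + inp + (w_ct * cortex if feedback else 0.0)
            # cortex is driven by the thalamus
            cortex = (1 - decay) * cortex + w_tc * thal
        return cortex
    return run(False), run(True)
```

With a loop gain of w_tc * w_ct = 0.2, the feedback condition settles to a cortical activation several times larger than the feedforward one, without runaway growth.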
Although the processes in the two pathways of our model differ in speed (the indirect
pathway is slower and weaker), and capacity (the indirect pathway has more nodes), there is
no further distinction between the explicit functions of either pathway. LeDoux (1986, 1996),
however, has suggested that the direct pathway may be more important for fast reactions in
unexpected dangerous situations, whereas the indirect path may be involved in higher order
behaviour, such as control processes. A better way to model extinction of conditioning would
probably be to incorporate such control processes. Extinction appears to arise from the
learning of regulatory control in the indirect pathway, but not from interference per se.
Appropriate modeling of the function of the indirect pathway would require implementation
of regulatory control. Though there are few ideas on how to implement (sequential) control
processes in a connectionist framework, Sequentially Recurrent Networks (SRNs), which also
show a working memory function (see Phaf, Mul, & Wolters, 1994; Phaf & Wolters, 1997),
may provide an opportunity for learning and executing sequential operations on items
activated in working memory. The disruption of (the influence of) these control processes
after extinction, for instance by lesioning the indirect pathway, could then restore
conditioning.
In a fuller implementation of the dual pathway model it may also be possible to
simulate experimental results with human subjects in the field of emotion (e.g. LeDoux, 1996).
Affective priming, for instance, is the remarkable phenomenon that affective stimuli, such as
faces with emotional expressions (both positive and negative) may have a larger influence on
the evaluation of neutral stimuli (e.g. Chinese ideographs) when the affective prime is not
consciously perceived than when it is (Murphy & Zajonc, 1993). There are even indications
that the priming effect reverses in conscious conditions. This is at odds with the more often
found pattern of non-affective priming where conscious influences are, generally, larger than
non-conscious ones. Such results may emphasise the importance of emotions (and emotion
research) for human behaviour, because they indicate that the human organism has been
evolutionarily prepared to perform direct emotional reactions, on top of which a good deal of
regulatory control has developed.
6. Conclusion
CALM Maps show some practical improvements over both the classic version of
Kohonen's self-organizing feature map and the standard CALM module. In contrast to
Kohonen's map (Kohonen, 1982, 1988), the CALM Map needs no external regulating mechanisms:
it automatically adjusts its learning rate on the basis of the degree of novelty
detected, and shrinks its 'activity bubble' without adjusting the inhibitory weights.
Activations in the CALM Map are determined by a weighted-sum rule instead of by a Euclidean
distance measure. Kohonen (1993, 1995) has, however, also provided alternative
implementations and neurobiological justifications of his self-organizing map. These
adjustments are probably better justified neurobiologically than the practical 'shortcuts'
Kohonen took in his early version of the map.
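A minimal sketch of this internal regulation (hypothetical parameters and update rule, not the published CALM Map equations): novelty is estimated from the mismatch between the input and the best-matching unit, and both the learning rate and the size of the updated neighbourhood scale with that estimate, so learning is fast for novel patterns and self-terminates for familiar ones.

```python
import numpy as np

def novelty(x, W):
    """Mismatch between input x and the best-matching unit's weights,
    normalised by the input norm; a stand-in for arousal activation."""
    d = np.linalg.norm(W - x, axis=1)          # distance of each unit to x
    return d.min() / (np.linalg.norm(x) + 1e-9)

def novelty_dependent_update(x, W, base_rate=0.01, gain=0.5):
    """One learning step: learning rate and neighbourhood radius
    both shrink as the input becomes familiar."""
    n = novelty(x, W)
    rate = base_rate + gain * n                  # novel input -> fast learning
    winner = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    radius = max(1, int(round(n * len(W) / 4)))  # bubble shrinks with familiarity
    for j in range(len(W)):
        # circular (ring) distance to the winner
        dist = min(abs(j - winner), len(W) - abs(j - winner))
        if dist <= radius:
            W[j] += rate * (x - W[j])            # move unit toward the input
    return n
```

Repeated presentation of the same pattern drives its novelty, and hence the effective learning rate, toward zero, which is the intended self-terminating behaviour.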
Presumably due to the difficulty of regulating the different maps and of handling
bidirectional connections, Kohonen maps have not been applied to the kind of interactive,
multimodular networks (McClelland, 1993) that CALM has been designed for. It is difficult
to see how the Kohonen map could be used in the multimodular network applications
discussed earlier. To obtain a shift in receptive fields due to conditioning with Kohonen maps,
for instance, would require changes in the winner-take-all function and the learning rule, which
would make the Kohonen map more similar to the continuous form of competitive learning of
Armony et al. (1995) and to the present Maps. The CALM Map can, therefore, also be seen as a
means of making a Kohonen-type map fit for incorporation in multimodular networks. A clear
limitation of the CALM Map is that it can only deal with unidimensional topologies, whereas the
Kohonen map can make multidimensional orderings. There are, however, indications that
multimodular architectures of unidimensional maps may be used to replace single
multidimensional Kohonen maps. A parallel arrangement of two CALM Maps could, in
principle, cover a two-dimensional input space. It should of course be acknowledged that the
Kohonen Map has proved itself in many useful applications (e.g. Kohonen, 1995) and that
there is much more insight into the computational abilities of the Kohonen Map than of the
CALM Map. Though Kohonen maps and CALM Maps aim at slightly different
applications, a fair comparison of the two types of maps can only be made after much additional
research on the CALM Map.
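The suggestion that parallel unidimensional maps might jointly cover a multidimensional input space can be illustrated with a toy sketch (plain competitive learning stands in for the CALM Maps; all names and parameters are illustrative): two independent 1-D maps each order one projection of a 2-D input, and the pair of winners jointly indexes a cell of the 2-D space.

```python
import numpy as np

def train_1d_map(samples, n_units=8, epochs=50, rate=0.2):
    """Minimal 1-D competitive map over scalar inputs (toy stand-in
    for a single unidimensional map)."""
    w = np.linspace(0.0, 1.0, n_units)         # 1-D codebook
    for _ in range(epochs):
        for s in samples:
            i = np.argmin(np.abs(w - s))       # winner-take-all
            w[i] += rate * (s - w[i])          # move winner toward input
    return np.sort(w)                          # keep topological order

rng = np.random.default_rng(1)
xy = rng.random((200, 2))                      # 2-D input space
map_x = train_1d_map(xy[:, 0])                 # first map orders the x projection
map_y = train_1d_map(xy[:, 1])                 # second map orders the y projection

def joint_code(p):
    """Pair of winners = coordinates on an 8x8 grid covering the plane."""
    return (int(np.argmin(np.abs(map_x - p[0]))),
            int(np.argmin(np.abs(map_y - p[1]))))
```

The combined code (i, j) tiles the unit square with 64 cells, even though each map on its own is strictly unidimensional.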
In comparison to a standard CALM module, categorisation is improved in CALM
Maps for correlated patterns due to the stretching process which continuously tries to
separate representations as far apart as possible. A nice feature of this process is the
interpolation of intermediary representations on uncommitted nodes. The stretching and
interpolation guarantee that every node will eventually come to represent some pattern, even
when this pattern has not actually been presented. It thus solves the underutilisation problem
that troubles many competitive learning procedures (Ahalt et al., 1990). The introduction of
topological ordering in a competitive learning procedure seems to enhance the ability to
organise and distinguish correlated patterns without supervision, particularly with highly
correlated patterns or large modules. With overlapping pattern sets no catastrophic
interference is found in CALM Maps. With distinct pattern sets, however, proactive
interference is found.
These simulations provide a number of functional arguments for CALM Maps, but
there are also good psychological (e.g. McClelland, 1993) and neurobiological (e.g.
Szentágothai, 1975) considerations for assuming some elementary canonical neuronal circuit
with self-organising abilities. Though the CALM Map probably also is too simple a model
for the canonical circuit, there appear to be similarities to the properties of neural hardware.
In contrast to procedures that apply global mathematical operations, which, for instance, use
information that is not available locally (or even at that particular time), the interactions are
local here and can in principle take place in the neural system. It should be noted that only the
(state-dependent) variable learning rate violates the locality of interactions. Such a mechanism
could be implemented, however, in the neural system by a neuromodulator which is produced
locally and only affects a limited network region (see also Kohonen, 1993). The replacement
of a winner-take-all rule by actual inhibitory connections (requiring also continuous
activations) forms part of the objective to achieve more neurobiological plausibility. A nice
aspect of the graded inhibition is that it takes care of both the winner-take-all function and the
'activity bubble'. Boundary problems have been solved by assuming a ring topology. Such
topologies may be neurally plausible if it is assumed that the elementary modules (possibly
the columns, e.g. Szentágothai, 1975) also have a circular shape (e.g. Bonhoeffer and Grinvald,
1991) and that the growth of connections is such that inhibition to nearby cells is weaker than
to more distant cells within the module.
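This circular arrangement can be sketched with a distance-dependent inhibition matrix and a crude settling rule (illustrative weights and dynamics, not the published CALM Map equations): on a ring, inhibition to nearby cells is weakest and to distant cells strongest, so an activity bubble forms around the most strongly driven unit without boundary effects.

```python
import numpy as np

def ring_inhibition(n_units, w_min=0.02, w_max=0.6):
    """Inhibitory weight matrix on a ring: weak to nearby cells,
    strong to distant ones, no self-inhibition."""
    W = np.zeros((n_units, n_units))
    max_d = n_units // 2
    denom = max(max_d - 1, 1)
    for i in range(n_units):
        for j in range(n_units):
            if i == j:
                continue
            d = min(abs(i - j), n_units - abs(i - j))   # circular distance
            W[i, j] = w_min + (w_max - w_min) * (d - 1) / denom
    return W

def settle(net_input, W, steps=30, decay=0.1):
    """Crude activation settling: graded inhibition carves a bubble
    around the most strongly driven unit."""
    a = np.array(net_input, dtype=float)
    for _ in range(steps):
        a = np.clip(a + net_input - decay * a - W.T @ a, 0.0, None)
    return a
```

Because distant cells receive the strongest inhibition, a weaker input on the far side of the ring is suppressed entirely, while the winner's immediate neighbours remain active as part of the bubble.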
If novelty detection forms a central feature of a canonical learning module, large
distributed brain areas should be involved in it. Scalp and intracranial event related brain
potential recordings, lesion data, and neuroimaging results have indeed implicated prefrontal,
tempero-parietal, and limbic regions in novelty detection (e.g. Knight, 1996, 1997; Knight &
Scabini, 1998). The fear conditioning model of Section 5, in fact, incorporated novelty
detection in all, supposedly cortical and subcortical, modules, except in the input module. It
would almost appear that the present modelling approach postulates novelty detection in more
brain areas than those in which neural research has actually found it. The paradox can be resolved by assuming
that these neurophysiological studies do not record the highly distributed novelty detection
process itself, but only the summed activity in a number of brain areas specialised in the
functional consequences of novelty detection. There are indications that the prefrontal cortex
is the first large region to be activated by novelty and is primarily responsible for the
direction of attention to the novel stimuli (Daffner et al., 2000). Subsequently, also amygdala,
hippocampal, and parahippocampal regions are activated which seem to have a modulatory
role in memory encoding (Halgren & Smith, 1987; Wilson & Rolls, 1993; Tulving & Kroll,
1995; Tulving et al., 1996; Eichenbaum, 1999). The hypothesis of the distributed nature of
novelty detection on the one hand, and the more localised functional consequences of detected
novelty on the other hand, may be an example of how connectionist modelling can help to
impart meaning on not yet fully understood neural mechanisms.
Author notes:
Correspondence should be addressed to Dr R. Hans Phaf, Psychonomics Department,
University of Amsterdam, Roetersstraat 15, 1018 WB Amsterdam, The Netherlands
([email protected]). We thank Cees Van Leeuwen, Jaap Murre, Henk Tijssen,
and Bas Rokers for their help during various stages of this work. We are also grateful to two
anonymous referees for helpful comments.
References
Ahalt, S.C., Krishnamurthy, A.K., Chen, P., & Melton, D.E. (1990). Competitive learning algorithms for vector quantization. Neural Networks, 3, 277-290.
Armony, J.L., Servan-Schreiber, D., Cohen, J.D., & LeDoux, J.E. (1995). An anatomically constrained neural network model of fear conditioning. Behavioral Neuroscience, 109, 1-12.
Armony, J.L., Servan-Schreiber, D., Cohen, J.D., & LeDoux, J.E. (1997). Computational modeling of emotion: explorations through the anatomy and physiology of fear conditioning. Trends in Cognitive Sciences, 1, 28-34.
Bonhoeffer, T. & Grinvald, A. (1991). Iso-orientation domains in cat visual cortex are arranged in pinwheel-like patterns. Nature, 353, 429-431.
Bouton, M.E., & Swartzentruber, D. (1991). Sources of relapse after extinction in Pavlovian and instrumental learning. Clinical Psychology Review, 11, 123-140.
Bower, G.H. (1996). Reactivating a reactivation theory of implicit memory. Consciousness and Cognition, 5, 27-72.
Brückner, J.R. & Gough, M.P. (Submitted). Comparison of CALSOM and SOM neural networks for space data pattern recognition.
Daffner, K.R., Mesulam, M.M., Scinto, L.F.M., Acar, D., Calvo, V., Faust, R., Chabrerie, A., Kennedy, B., & Holcomb, P. (2000). The central role of the prefrontal cortex in directing attention to novel events. Brain, 123, 927-939.
Den Dulk, P., Rokers, B., & Phaf, R.H. (1998). Connectionist simulations with a dual route model of fear conditioning. In B. Kokinov (Ed.), Perspectives on Cognitive Science, Vol. 4 (pp. 102-112). Sofia: New Bulgarian University Press.
Eichenbaum, H. (1999). The hippocampus: The shock of the new. Current Biology, 9, R482-R484.
Erwin, E., Obermayer, K., & Schulten, K. (1992). Self-organizing maps: Stationary states, metastability and convergence rate. Biological Cybernetics, 67, 47-55.
Gabor, D. (1946). Theory of communications. Journal of the Institution of Electrical Engineers, 93, 429-457.
Graf, P., & Mandler, G. (1984). Activation makes words more accessible, but not necessarily more retrievable. Journal of Verbal Learning and Verbal Behavior, 23, 553-568.
Grob, J. (2001). A neural network model for evaluative conditioning and mere exposure. Unpublished Master's thesis, Maastricht University, the Netherlands.
Grossberg, S. (1973). Contour enhancement, short term memory, and constancies in reverberating neural networks. Studies in Applied Mathematics, 52, 217-257.
Grossberg, S. (1982). Studies of mind and brain: Neural principles of learning, perception, development, cognition, and motor control. Boston, MA: Reidel Press.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-63.
Grossberg, S. (1988). Nonlinear neural networks: principles, mechanisms, and architectures. Neural Networks, 1, 17-61.
Halgren, E. & Smith, M.E. (1987). Cognitive evoked potentials as modulatory processes in human memory formation and retrieval. Human Neurobiology, 6, 129-139.
Happel, B.L.M. & Murre, J.M.J. (1994). Design and evolution of modular neural network architectures. Neural Networks, 7, 985-1004.
Harnad, S. (Ed.) (1987). Categorical perception: The groundwork of cognition. Cambridge, UK: Cambridge University Press.
Knight, R.T. (1996). Contribution of human hippocampal region to novelty detection. Nature, 383, 256-259.
Knight, R.T. (1997). Distributed cortical network for visual stimulus detection. Journal of Cognitive Neuroscience, 9, 75-91.
Knight, R.T. & Scabini, D.L. (1998). Anatomic bases of event-related potentials and their relationship to novelty detection in humans. Journal of Clinical Neurophysiology, 15, 3-13.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59-69.
Kohonen, T. (1988). Self-organization and associative memory (2nd ed.). Berlin: Springer Verlag.
Kohonen, T. (1993). Physiological interpretation of the self-organizing map algorithm. Neural Networks, 6, 895-905.
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer Verlag.
LeDoux, J.E. (1986). Sensory systems and emotion: A model of affective processing. Integrative Psychiatry, 4, 237-248.
LeDoux, J.E. (1996). The Emotional Brain. New York: Simon & Schuster.
LeDoux, J.E., Sakaguchi, A., & Reis, D.J. (1984). Subcortical efferent projections of the medial geniculate nucleus mediate emotional responses conditioned by acoustic stimuli. Journal of Neuroscience, 4, 683-698.
Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87, 252-271.
McClelland, J.L. (1993). Toward a theory of information processing in graded, random, and interactive networks. In D.E. Meyer and S. Kornblum (Eds.), Attention and Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive Neuroscience (pp. 655-688). Cambridge, MA: MIT Press.
Miikkulainen, R. (1991). Self-organizing process based on lateral inhibition and synaptic resource distribution. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas (Eds.), Artificial Neural Networks (pp. 415-420). Amsterdam: Elsevier Science Publishers.
Murphy, S.T., & Zajonc, R.B. (1993). Affect, cognition, and awareness: Affective priming with optimal and suboptimal stimulus exposures. Journal of Personality and Social Psychology, 64, 723-739.
Murre, J.M.J. (1992). Learning and categorization in modular neural networks. Hemel Hempstead, UK: Harvester Wheatsheaf.
Murre, J.M.J., Phaf, R.H., & Wolters, G. (1992). CALM: Categorizing and learning module. Neural Networks, 5, 55-82.
Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23, 443-512.
Pevtzow, R., Tijsseling, A., & Harnad, S. (Submitted). Dimensional attention effects in humans and neural nets.
Phaf, R.H. (1994). Learning in natural and connectionist systems: Experiments and a model. Dordrecht: Kluwer Academic Publishers.
Phaf, R.H., Mul, N.H., & Wolters, G. (1994). A connectionist view on dissociations. In C. Umiltà & M. Moscovitch (Eds.), Attention and Performance XV (pp. 725-751). Cambridge, MA: MIT Press.
Phaf, R.H., Van Der Heijden, A.H.C., & Hudson, P.T.W. (1990). SLAM: A connectionist model for attention in visual selection tasks. Cognitive Psychology, 22, 273-341.
Phaf, R.H., & Van Immerzeel, M.S.A. (1997). Simulations with a connectionist model for implicit and explicit memory tasks. In Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society (pp. 608-613). Mahwah, NJ: Lawrence Erlbaum.
Phaf, R.H., & Wolters, G. (1997). A constructivist and connectionist view on conscious and nonconscious processes. Philosophical Psychology, 10, 287-307.
Phillips, W.A. (1997). Theories of cortical computation. In M.D. Rugg (Ed.), Cognitive Neuroscience (pp. 11-46). Hove, East Sussex: Psychology Press.
Ratcliff, R. (1990). Connectionist models of recognition memory: Constraints imposed by learning and forgetting functions. Psychological Review, 97, 285-308.
Ritter, H. (1993). Parameterized self-organizing maps. In S. Gielen and B. Kappen (Eds.), ICANN'93: Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, The Netherlands (pp. 568-575). Berlin: Springer Verlag.
Rumelhart, D.E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
Schacter, D.L. (1987). Implicit memory: history and current status. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 501-518.
Szentágothai, J. (1975). The 'module concept' in cerebral cortex architecture. Brain Research, 95, 475-496.
Tijsseling, A.G. (1998). Connectionist models of categorization: a dynamical approach to cognition. Unpublished PhD thesis, University of Southampton, United Kingdom.
Tulving, E. & Kroll, N. (1995). Novelty assessment in the brain and long-term memory encoding. Psychonomic Bulletin and Review, 2, 387-390.
Tulving, E., Markowitsch, H.J., Craik, F.I.M., Habib, R., & Houle, S. (1996). Novelty and familiarity activations in PET studies of memory encoding and retrieval. Cerebral Cortex, 6, 71-79.
Wilson, F.A.W. & Rolls, E.T. (1993). The effects of stimulus novelty and familiarity on neuronal activity in the amygdala of monkeys performing recognition memory tasks. Experimental Brain Research, 93, 367-382.