
Division of Informatics, University of Edinburgh

Institute of Perception, Action and Behaviour

Attention and Social Situatedness for Skill Acquisition

by

Yuval Marom, Gillian Hayes

Informatics Research Report EDI-INF-RR-0069

Division of Informatics
September 2001

http://www.informatics.ed.ac.uk/


Attention and Social Situatedness for Skill Acquisition

Yuval Marom, Gillian Hayes

Informatics Research Report EDI-INF-RR-0069

Division of Informatics
Institute of Perception, Action and Behaviour

September 2001

In Proceedings of the First International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Lund, Sweden

Abstract: We present an attention system that models the dynamics that occur in memory in response to stimuli, which include habituation, novelty detection, and forgetting. We demonstrate how such an attention system can be used as a trigger for learning perception-action mappings. We discuss the value of social situatedness in the form of demonstrator-learner interactions, and show results from both simulations and robot-human experiments of a simple wall-following task.


Copyright © 2002 by The University of Edinburgh. All Rights Reserved.

The authors and the University of Edinburgh retain the right to reproduce and publish this paper for non-commercial purposes.

Permission is granted for this report to be reproduced by others for non-commercial purposes as long as this copyright notice is reprinted in full in any reproduction. Applications to make other use of the material should be addressed in the first instance to Copyright Permissions, Division of Informatics, The University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, Scotland.


Attention and Social Situatedness for Skill Acquisition

Yuval Marom and Gillian M. Hayes

Division of Informatics, The University of Edinburgh

5 Forrest Hill, Edinburgh EH1 2QL, UK

{yuvalm, [email protected]}

Abstract

We present an attention system that models the dynamics that occur in memory in response to stimuli, which include habituation, novelty detection, and forgetting. We demonstrate how such an attention system can be used as a trigger for learning perception-action mappings. We discuss the value of social situatedness in the form of demonstrator-learner interactions, and show results from both simulations and robot-human experiments of a simple wall-following task.

1 Introduction

Attention

In our approach to attention, we are mainly inspired by the Habituation Hypothesis of Selective Attention (Cowan, 1988), which claims that rather than attention being a simple filtering mechanism that selects certain inputs and disposes of others, it is a system that continually inspects all its input channels, compares them to descriptions stored in memory, habituates to familiar stimuli, and has the ability to dishabituate if needed.

The role of habituation is to inhibit what is known as the Orienting Response, which is a combination of neural, physiological, and behavioural changes that an organism undergoes when a novel or significant stimulus is detected (Sokolov, 1963; Kahneman, 1973). What is interesting is how the orienting response is reinstated (dishabituation), and this can occur due to a number of factors (Balkenius, 2000). The work we report here directly models two of these factors: the passage of time (forgetting), and the presentation of a novel stimulus.

Attention in general involves orienting cognitive resources to target locations likely to improve signal detection (Posner et al., 1980). In most of the computational implementations of attention the target locations are spatial in nature, and usually involve the computation of a saliency map, where spatial locations compete in a winner-take-all manner (for example Itti and Koch (2001)).

We are not concerned with spatial attention, i.e. orienting to salient spatial locations where a location's saliency is calculated with respect to its neighbouring spatial locations. Rather, we are looking at what could be thought of as temporal attention: orienting to salient perceptual instances, where saliency is calculated with respect to previous experiences. We do not deal with spatial selection; our system has to deal with all the perceptual sensors of the robot.

In other words, rather than asking where the interesting locations are in a given snap-shot of the environment, we are asking when a snap-shot should be taken in the first place.

Learning in a social context

We believe that by placing a robot in a social context, one can achieve an implicit transfer of information between an experienced demonstrator (human or robot) and the learner robot. The social context provides a form of joint attention, where the two agents can share similar experiences. There are many phenomena that make up the field of Social Learning, and exact definitions and ideas vary between researchers (for good overviews, see Galef (1988) and Whiten (2000)). The one that describes our work best is Stimulus Enhancement (Spence, 1937), where the demonstrator merely acts in ways to increase the chances that the observer perceives given events; no information about goals is transmitted.

There are different forms of interaction possible in this demonstrator-learner scenario. The simplest one is when the learner follows the demonstrator, copying its actions; in this case the demonstrator wants the learner to go through certain sensori-motor experiences relevant to some task. This paper, and most of our work so far, deals with this kind of interaction. However, we believe the social context can also provide external stimulus rewards, which could be used as another factor for dishabituation (Balkenius, 2000), and we are currently investigating this idea (see more on this in Section 4).

Physical and social situatedness

By being situated in a physical environment, a robot learner builds up representations of its perceptual history, habituating to what it sees often; it begins to ignore certain information based on the properties of the particular environment it is in. This governs how it reacts to stimuli, and further, how it bootstraps new sensori-motor skills from existing ones. Being socially situated allows the robot to do the above more efficiently and with more relevance to a particular task, by implicitly utilising the knowledge of another agent in the environment. The particular environment and demonstrator play an important role here: if either of those were to change, the learner's memory and task capabilities would develop differently.

The next section describes a computational implementation of an attention system with habituation. Section 3 presents an example of how attention can be used to trigger the learning necessary for the acquisition of new skills. Section 4 briefly discusses the value of the social context, and is followed by a concluding section.

2 Computational Framework for Attention

As mentioned in the previous section, by `attention' we are referring to the interaction of structures in memory in response to stimuli, rather than one specific mechanism. A computational model should reflect how perceptual information activates stored structures for comparisons; how structures are added, updated, and deleted; and how structures habituate to familiar stimuli, but are able to reinstate the orienting response if required.

Figure 1: The attention model. A stimulus activates structures in memory, where activation and comparisons occur automatically; bottom-up dishabituation/orientation brings information into the focus of attention, which updates memory and feeds higher-level processing.

In our model, inspired by Cowan's model of attention (Cowan, 1988) and depicted in Figure 1, the activation and comparisons of structures in memory occur outside of attention, in a passive, automatic manner. When the orienting response is reinstated (due to novelty or forgetting), the information goes through to the focus of attention, which can cause the creation of new structures in memory, and trigger some higher-level processing, such as learning.

Figure 2: Habituation measure: a node's habituation value decaying over time, with the thresholds h1 = 0.7 and h2 = 0.1 marked.

Several unsupervised learning approaches are suitable for modelling these kinds of dynamics in memory. We believe the Self-Organising Feature Map (SOFM) is an appropriate tool, and it has proven to be very useful in robotic implementations (Nehmzow, 1999). The SOFM attempts to cover the sensory input space with a network of nodes, with edges connecting neighbouring nodes as determined by some distance measure, in our case a Euclidean distance. This has the effect of preserving the topology inherent in the space. We are interested in a variation of the SOFM where structures (nodes in the network) grow from experience as required, rather than being specified a priori.

We have adopted, and adapted to our purposes, an algorithm developed by Marsland et al. (2001), which incorporates notions of habituation, novelty detection, and forgetting. This implementation reflects our biologically-inspired view of attention (as described in Section 1) only in the way that nodes of the SOFM react to stimuli; the notion of edges in the SOFM does not explicitly stem from any biological motivations, but rather is an inherent part of the algorithm, necessary for maintaining the relationships between nodes.

Habituation

The main asset of this algorithm is that it keeps a habituation measure for each node in the SOFM. This measure gives an indication of familiarity, i.e. the frequency of that node's activation, which provides a useful heuristic. Each time a node is active, its habituation value decreases exponentially, as shown in Figure 2, according to:

$$\tau \frac{dy(t)}{dt} = \alpha\,[y_0 - y(t)] - 1 \qquad (1)$$

where y is the current habituation value, y0 is the initial habituation value, τ determines the habituation rate, and α determines the habituation asymptote. This equation is used by Marsland et al. (2001), and is biologically inspired.

Figure 3: The SOFM network. Each node in the network has a habituation value that decreases whenever that node is active, or it is connected to an active node. A node is active when it matches the current input best. A familiar input moves the nodes towards it, whereas a novel one initiates the creation of a new node. Topology is preserved through the edges connecting the nodes.

In order to eliminate the need to tune many parameters, we set up a typical scenario, as depicted in Figure 2: habituation decreases from 1 to 0 (y0 is always 1), and always converges on the same asymptote (we fix the value of α, in our case to 1.05). The only remaining parameter is the `time' parameter τ, which will depend on the time-scale of the particular application. Further, in addition to the active node, its neighbours also habituate, but more slowly, and therefore use a different (higher) value of τ (we have used values in the range 50–600, depending on the application).

For the algorithm presented below, two further parameters are required: a `minimal' habituation threshold (h1 in Figure 2), used to determine whether a stimulus is completely unfamiliar to a node, and a `full' habituation threshold (h2), used to determine whether a stimulus is very familiar.

By fixing these thresholds (h1 = 0.7 and h2 = 0.1), the only free parameter is still the time parameter τ, and we can use it to control how fast a stimulus becomes more and more familiar. We can leave the remaining parameters fixed, and thus easily adapt the algorithm below to different applications.
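As a concrete illustration, the following Python fragment is a minimal sketch of these dynamics: an Euler integration of equation (1) with y0 = 1 and α = 1.05 as in the text, while the value of τ and the loop are purely illustrative.

    H1, H2 = 0.7, 0.1        # `minimal' and `full' habituation thresholds

    def habituate(y, tau, alpha=1.05, y0=1.0):
        """One Euler step of equation (1): tau * dy/dt = alpha*(y0 - y) - 1."""
        return y + (alpha * (y0 - y) - 1.0) / tau

    y = 1.0                  # a freshly created node starts at y0
    for step in range(1, 101):
        y = habituate(y, tau=10.0)
        if y < H2:
            print(f"fully habituated after {step} activations (y = {y:.3f})")
            break

With α = 1.05 the asymptote is 1 − 1/α ≈ 0.048, so the habituation value always crosses the full-habituation threshold h2 = 0.1; the choice of τ only controls how many activations that takes.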

The algorithm

Below is a general description of the algorithm (a code sketch follows the summary at the end of this subsection); see also Figure 3. Note that the nodes and edges of the SOFM form the memory module of Figure 1.

• For each input, decide which of the existing nodes in the network best matches it; that node `fires' and is referred to as the `winning' node.

• Decide whether the input matches the winning node well. This is where novelty detection occurs: a threshold (on the Euclidean distance) is used to judge similarities.

• If the input matches the winning node well, the winning node and its neighbours habituate, and move towards the input by some fraction of the distance between them (the update rate).

• If the input doesn't match the winning node well, it is potentially novel; the habituation value of the node is used to decide whether it is novel or not, as follows:

  – if the node has only recently been added (its habituation value is higher than h1), then it is still being positioned in the input space, so we don't add a new node, but rather update it as described above;

  – otherwise, the node has fired a number of times, and has probably settled in the right place in the input space, so a mismatch means novelty is detected, and a new node is needed; we insert it half-way between the input and the winning node (see Figure 3).

• If a node is completely habituated (its habituation value is less than h2), we `freeze' the node: it does not move from where it is, and cannot be deleted. Once a node has been frozen for a specific length of time, we `un-freeze' it by setting its habituation value to half the starting value, thus introducing `forgetting'.

• Finally, a brief note about the construction of edges. As well as the best-matching node, the algorithm finds the second best-matching node. These two nodes are then connected with an edge (if not already connected), and this edge has an `age' value set to 0. The rest of the edges emanating from the winning node have their age increased, and when they get old enough they are deleted; further, any disconnected nodes are deleted. This has the effect of constructing clusters.

To summarise, the system handles attention as follows. Nodes in the network respond and habituate to their respective stimuli. When fully habituated, nodes ignore further stimulation and hence do not get updated. The orienting response is reinstated either due to novelty detection, when a new node is created, or due to forgetting, when a node is dishabituated; in both situations the stimulus is (re-)attended to.
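For concreteness, the sketch below is one possible Python rendering of this algorithm, using the habituation step from equation (1). All parameter values are placeholders, and the deletion of disconnected nodes is omitted for brevity; this is a sketch of the technique, not the authors' implementation.

    import numpy as np

    class Node:
        def __init__(self, w):
            self.w = np.asarray(w, dtype=float)  # position in sensor space
            self.y = 1.0                         # habituation value (1 = new)
            self.frozen_for = 0                  # steps spent fully habituated

    class AttentionSOFM:
        H1, H2 = 0.7, 0.1                        # habituation thresholds

        def __init__(self, dim, novelty=0.3, lr=0.2, tau=10.0,
                     tau_nbr=100.0, freeze_time=1000, max_age=30, seed=0):
            rng = np.random.default_rng(seed)
            self.nodes = [Node(rng.random(dim)), Node(rng.random(dim))]
            self.edges = {}                      # frozenset({i, j}) -> age
            self.novelty, self.lr = novelty, lr
            self.tau, self.tau_nbr = tau, tau_nbr
            self.freeze_time, self.max_age = freeze_time, max_age

        def _habituate(self, node, tau, alpha=1.05):
            node.y += (alpha * (1.0 - node.y) - 1.0) / tau   # equation (1)

        def _neighbours(self, i):
            return [next(iter(e - {i})) for e in self.edges if i in e]

        def _connect(self, i, j):
            self.edges[frozenset((i, j))] = 0    # new/refreshed edge, age 0
            for e in list(self.edges):           # age the winner's other edges
                if i in e and j not in e:
                    self.edges[e] += 1
                    if self.edges[e] > self.max_age:
                        del self.edges[e]        # old edges are deleted

        def step(self, x):
            """Process one stimulus; return True if it is attended to."""
            x = np.asarray(x, dtype=float)
            d = [np.linalg.norm(x - n.w) for n in self.nodes]
            first, second = (int(k) for k in np.argsort(d)[:2])
            win = self.nodes[first]
            self._connect(first, second)
            if win.y <= self.H2:                 # frozen node: ignore stimulus
                return False
            if d[first] > self.novelty and win.y < self.H1:
                # a settled node mismatches the input: novelty is detected,
                # so insert a new node half-way between input and winner
                self.nodes.append(Node((x + win.w) / 2.0))
            else:                                # familiar: habituate and move
                self._habituate(win, self.tau)
                win.w += self.lr * (x - win.w)
                for j in self._neighbours(first):
                    self._habituate(self.nodes[j], self.tau_nbr)
                    self.nodes[j].w += 0.5 * self.lr * (x - self.nodes[j].w)
            return True

        def tick(self):
            """Forgetting: dishabituate nodes frozen for long enough."""
            for n in self.nodes:
                if n.y <= self.H2:
                    n.frozen_for += 1
                    if n.frozen_for >= self.freeze_time:
                        n.y, n.frozen_for = 0.5, 0

Calling step() on each incoming stimulus and tick() once per time step reproduces the behaviour summarised above: attention is withdrawn as nodes freeze, and reinstated either by novelty (a new node) or by forgetting (an unfrozen node).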

Notice that there are a few more parameters involved in the algorithm (novelty threshold, update rate, full-habituation time, maximum edge-age) in addition to those discussed already. Altogether the algorithm is quite heuristic and involves many parameters, some of which arise from the choice of the SOFM as a learning tool, while others were added to explicitly model the characteristics of an attention system.

Figure 4: The simulated (left) and physical (right) environments.

Some of the parameters are more sensitive (important) and hence interesting, such as the novelty threshold and the length of the full-habituation time, and we regard these as free parameters that have a significant effect on the performance of the algorithm (the latter is tested explicitly in Section 3). We have tried to fix the less sensitive, and hence less interesting, parameters to values that work across different implementations, and have done so by experimenting with small toy problems. We have implemented the algorithm in three different systems: two of them, reported in the next section, are mobile robots, and the third is a simulated humanoid robot (Maistros et al., 2001). For each implementation the free parameters are re-tuned to fit the time-scale and the characteristics of the data for the particular application.

Experimental setup

Throughout this paper we will be looking at experiments involving one task, wall-following, in both a simulated and a physical environment. The simulated experiments are performed using a Khepera mobile robot simulator, with a learner agent following behind a teacher agent, both using infra-red sensors (see left of Figure 4). The input here comes from 6 sensors around the front of the learner. The physical experiments are performed using a Real World Interface B21 robot and a human demonstrator; the robot can detect and follow the human using its on-board video camera (see right of Figure 4). The input here comes from 20 sonar sensors around the top of the robot. The physical arena measures approximately 4.8 m × 6.5 m.

In both experiments, the learner uses its built-in following behaviour to keep behind the demonstrator; the demonstrator executes the wall-following task, which involves moving parallel to a wall on either side for a fixed length of time, after which an `interrupt' makes it turn towards the middle of the arena and adopt a wandering behaviour until a wall is found again. Since the learner can sometimes lose the demonstrator (in both experiments), it only inspects its perceptual input when the demonstrator is in sight, that is, when it is in a social context. Otherwise, the attention system would encounter situations not relevant to the task (see Marom and Hayes (2001) for more details). This is a form of stimulus enhancement as described in Section 1.
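In code, this gating could be as simple as the following sketch, where demonstrator_in_sight() and read_sensors() are hypothetical helpers standing in for the robot's built-in behaviours, and AttentionSOFM is the sketch from Section 2.

    def attend_step(robot, sofm):
        # stimulus enhancement: only inspect perceptual input while the
        # demonstrator is in sight, i.e. while socially situated
        if not robot.demonstrator_in_sight():   # hypothetical helper
            return False
        return sofm.step(robot.read_sensors())  # hypothetical helper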

The algorithm in use

We want to see how well our attention system handles the perceptual data. First we'll show what a full perceptual dataset from a complete run looks like. Since in both types of experiments the dimensionality of the sensor space is quite high (6 and 20), we have used a dimensionality reduction technique called Principal Component Analysis (PCA) for display and analysis purposes. PCA finds the most statistically significant dimensions, called Principal Components, in a multivariate dataset (see Afifi and Clark (1996) for more information on PCA).
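A minimal sketch of such a projection, assuming the dataset is stored with one sensor snapshot per row (the SVD route is our choice of implementation, not necessarily the authors'):

    import numpy as np

    def pca_project(X, k=2):
        """Project X (n samples x d sensors) onto its first k principal
        components, and return the fraction of variance they explain."""
        Xc = X - X.mean(axis=0)                  # centre the data
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
        return Xc @ Vt[:k].T, explained

    # e.g.: coords, frac = pca_project(dataset)
    # frac is roughly 0.95 for the simulated data (see footnote 1)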

As the learner is led through the environment, we save its sensory input at each step into a dataset; the plots in Figure 5 are projections of the final dataset onto the first 2 principal components found by PCA.¹ In both experiments we expect to see 3 main clusters, and they are indeed apparent in the plots: one cluster is the intersection of the 2 apparent lines, which is the area of low (weak) wall-detection, i.e. `no wall', and as we move away from this intersection in either direction, we reach clusters corresponding to `right wall' and `left wall' (or vice versa), at different distances from the wall. Note that in the simulation, at very close distances to the wall there are few data points (faint clusters). This is because the robot is usually slightly away from the wall, and this is captured by more data points (darker clusters).

Next we want to see how the attention system handles the perceptual data, noting that it receives them sequentially on-line, not as a complete dataset. Figure 6 shows the construction of the SOFM network in response to incoming stimuli, at 4 arbitrary stages. Note that here too we have had to use PCA to reduce the dimensionality of the SOFM for display purposes; at each stage there is a different SOFM, for which PCA finds different principal components, and so the axes differ (in the physical experiments). What is important is the structure of the SOFM rather than the axes that PCA decides on.

¹ It's important to check how much of the variance is actually accounted for by the first 2 principal components, and this is given by the sum of their eigenvalues. This gives us an indication of how representative a 2-dimensional plot is of the true complexity in the data. In the simulation 2 dimensions account for approximately 95% of the total variance, compared to approximately 65% in the physical experiment. This is to be expected as there are many more physical sensors, and they have much more noise.

Figure 5: The full perceptual datasets from the simulated (left) and physical (right) experiments, projected for visual purposes onto 2 dimensions determined by PCA.

Figure 6: The construction of the SOFM network in the simulated (left) and physical (right) experiments, displayed at 4 arbitrary stages. These are also projections onto 2 dimensions.

The SOFM starts with 2 random nodes, and creates new nodes and clusters as it receives the data. In both experiments the SOFM seems to capture the structure inherent in the data, suggesting that our attention system handles the perceptual data as desired.

3 Attention-Triggered Learning

The attention system described above is only responsible for analysing and structuring perceptual information, a problem motivated and explained in Marom and Hayes (2001). No learning has yet taken place in terms of the acquisition of new skills. We identify two possible ways to utilise the attention system for learning perception-action mappings:

1. since the SOFM algorithm produces distinct representations of the perceptual space, one can directly associate (link) them with corresponding motor representations (grouped similarly or otherwise). We have implemented this approach and report it elsewhere (Maistros et al., 2001).

2. use the attention system purely as a trigger, and perform the actual learning on the raw sensori-motor information; while the system is paying attention, pass the information to a separate learning system.

In this paper we will investigate the usefulness of the attention system in the second scenario mentioned above. We can draw inspiration for choosing this approach from dual-process theories of memory, which claim that the process of familiarity detection is distinct from actual storage and recall (O'Reilly et al., 1998). We present here results mainly from learning in the simulated experiment, and briefly mention results from the physical experiment, which we are currently analysing.

The learning of perception-action mappings is achieved using a feed-forward neural network with back-propagation (backprop). The input layer consists of units representing the perceptual sensors (6 units), the hidden layer consists of 2 units, and the output layer consists of 1 unit, which is a pre-processed representation of the motor outputs, corresponding to a tendency to turn.²

² Computed using a moving average of motor commands as follows: the motor commands are 0, −1, and 1, corresponding to a forward move, right turn, and left turn, respectively; these values are saved into a short moving window on which the average is calculated; the value used in the output neuron is the absolute value of this average, since we are not interested in the direction of turn, only in the `tendency to turn'.

Figure 7: The patterns passed on to the backprop network form learning events. Left: activation of perceptual sensors. Right: activation of pre-processed motor values, representing tendency to turn. Emergent perceptual events are marked by the vertical lines.

We are going to train this network to distinguish between moving forward (low turn-tendency) when the robot is parallel to a wall, and moving around randomly with turning (high turn-tendency) when not next to a wall. We are not teaching the robot how to turn towards or away from a wall, and rely on low-level coded behaviours for these purposes.

Whenever the learner is attentive, its raw sensor values and (processed) motor values are used to make up a supervised learning pattern for the backprop network. The network then performs the usual back-propagation of the error in the output unit, resulting in weight updates.
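The following sketch shows the shape of this learning step; the 6-2-1 architecture and the turn-tendency target follow the text and footnote 2, while the sigmoid activations and the learning rate are our assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.5, (2, 6))   # input -> hidden weights (6 x 2)
    W2 = rng.normal(0.0, 0.5, (1, 2))   # hidden -> output weights (2 x 1)
    LR = 0.1                            # learning rate (assumed)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def turn_tendency(motor_window):
        """Target: |moving average| of motor commands in
        {0: forward, -1: right turn, 1: left turn} (see footnote 2)."""
        return abs(float(np.mean(motor_window)))

    def train_pattern(sensors, target):
        """One backprop update on a single attended pattern;
        sensors is an array of the 6 perceptual readings."""
        global W1, W2
        h = sigmoid(W1 @ sensors)                 # hidden activations
        o = sigmoid(W2 @ h)                       # output: tendency to turn
        delta_o = (o - target) * o * (1.0 - o)    # output error term
        delta_h = (W2.T @ delta_o) * h * (1.0 - h)
        W2 -= LR * np.outer(delta_o, h)
        W1 -= LR * np.outer(delta_h, sensors)
        return float(o[0])

Attention-triggered learning is then just a guard around this update: a pattern is trained on only when the attention system attends to it, e.g. if sofm.step(sensors): train_pattern(sensors, turn_tendency(motor_window)).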

Perceptual events

We want to see how well the learning system copes with and without attention. Recall that the attention system is attentive whenever a node responds to the input, which occurs when nodes are not fully habituated. We can examine the learning patterns that are passed on to the backprop network as a result of the attention triggering. For the most simplified case, we do not allow habituated nodes to dishabituate (forget), so the learning network is only exposed to stimulus `events' once, where by `event' we mean a sequence of similar stimuli, grouped by one or more nodes (a cluster) of the SOFM.

This situation is depicted in the left plot of Figure 7. It shows the perceptual activations of the learner's 6 sensors as it is led through the environment (sensors 0-1 on the left of the robot, 4-5 on the right, and 2-3 in front; see Figure 8), although only the 4 side sensors are active. What is clearly visible is the emergence of the events, and we have superimposed vertical lines and labels to mark them.

Figure 8: A diagram of the simulated Khepera robot, showing the positions of sensors 0-5 and the motors.

Initially the learner is exposed to the wall on the left (event 0); then it is exposed to a new event (event 1), which corresponds to not sensing the wall anywhere (recall that the demonstrator turns away from the wall at regular intervals and wanders in the middle of the arena); another new event is then encountered (event 2), the wall being sensed on the right; and finally the first event is sensed again, because the SOFM nodes for this event did not fully habituate the first time. No further attention is given, as all the nodes are fully habituated and no novel stimuli are encountered.

On the right of Figure 7 we see the pre-processed values from the motors, used to train the backprop network. We use the perceptual event-markings from the left plot to show how motoric events coincide with perceptual ones. Frequent activations correspond to a higher tendency to turn (hence `wandering'), and this appears to occur when there is no wall stimulation; low frequencies correspond to a low tendency to turn, or a high tendency to move straight forward, which appears to occur when a wall is sensed on either side (hence `wall-following'). The two plots in Figure 7 therefore suggest that the backprop network would learn the correct mappings.

Full exposure vs. high selectivity

Figure 7 clearly shows the benefit of attention-triggered learning: the attention system has reduced the exposure of the learning system to 3 `events', or approximately 2400 learning patterns (out of 50000, the length of a single run). If the learning network can form the correct perception-action mapping, this is of great value.

Unfortunately, this is not always the case: a single presentation of events is not always sufficient for weight convergence. Also, the learner can be `unlucky' if, at the particular times that the attention system triggers learning, the demonstrator is doing something distracting from the task (such as turning away from the wall because of the interrupt, or even turning away because the learner is in the way!).

Figure 9: Convergence of the backprop weights of the fully exposed (left) and highly selective (right) networks at the learning phase, as input-output patterns are passed through from the attention system. Perceptual events are distinguished by different line types.

However, if we use the forgetting mechanism described in Section 2, the learning system can be re-exposed to events. The actual number of times the learning system is exposed to the same event is governed by how long we let a node in the SOFM stay fully habituated before we dishabituate it. If we make this length 0, this is equivalent to no attention-triggering at all, since the attention system is always attentive. In contrast, if we make the full-habituation length very high, the network is exposed to very little information, as we have seen in Figure 7.
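In terms of the sketch from Section 2, these two extremes correspond roughly to the following configurations (the freeze_time values are illustrative; in the sketch a dishabituated node lags its stimulus by one step):

    # `fully exposed': nodes dishabituate immediately, so the
    # attention system is (almost) always attentive
    fully_exposed = AttentionSOFM(dim=6, freeze_time=0)

    # `highly selective': nodes effectively never dishabituate,
    # so each event triggers learning only once
    highly_selective = AttentionSOFM(dim=6, freeze_time=10**9)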

We compare the performance of these extremes in both the learning and recall phases. Figure 9 shows how patterns passed through from the attention system affect the convergence of the network weights. Perceptual (input) and motoric (output) events are shown as in Figure 7 (perceptual events are distinguished by two different line types), together with the weights: 12 input-to-hidden weights (6 × 2), and 2 hidden-to-output weights (2 × 1). The left plot is for a full-habituation length of 0, and the right plot for 40000 (henceforth referred to as the `fully exposed' and `highly selective' networks, respectively).

Although the highly selective network receives very few patterns, these patterns form a range of experiences as representative as that of the fully exposed network, where in the latter there is a lot of repetition. The weights of the highly selective network are therefore able to converge, almost to the point of convergence of the fully exposed network. In fact, less selective networks would reach that convergence point because they would be exposed to more patterns. Note that the fully exposed network converges quite early.

To see the importance of the final convergence point of the weights, we look at the output recalled by the networks after learning is completed. To do this we let the robot wander around randomly on its own in the environment (with an obstacle-avoidance competence) such that it picks up random perceptions that trigger the different output values learned through the weights (no attention is used). Figure 10 shows the distributions of the output recalled by the fully exposed and highly selective networks.

The two peaks in the output recalled by the fully exposed network show that this network learned to distinguish between the two types of perceptual experience: one requiring very low turn-tendency (values close to 0), corresponding to moving parallel to a wall, and the other requiring a higher turn-tendency, corresponding to moving randomly when not near a wall. We also see two peaks in the output recalled by the highly selective network, which suggests that it too has learned to distinguish the experiences; however, in absolute value the output is higher than from the fully exposed system, and the left peak is perhaps not close enough to 0. This is not surprising, and is due to the weights converging at different points, as discussed above.

To conclude, a fully exposed network converges quite early, suggesting that exposure to so many patterns is not needed. Further, such a network obviously does not make any use of the attention system. At the other extreme, we see from a very selective network that single presentations are perhaps not quite enough, although the network learns reasonably well.

Figure 10: Distributions of the output recalled by the fully exposed (left) and highly selective (right) networks after learning is complete; the recall is triggered by input from a wandering behaviour in order to see all the possible values learned by the network.

Figure 11: The number of patterns passed through from the attention system to the backprop network (left), and the energies acquired at the testing phase (right), as a result of different full-habituation lengths of the attention system at the learning phase. Each point is an average of 100 runs (the error bars in the left plot are of negligible length and therefore not shown); the length of a single run is 50000.

The effect of habituation

A good compromise must exist between these two extremes. In what follows, we look at the performance of a range of networks by modifying the length of the full-habituation time. In the plots of Figure 11, the independent variable is this length of time, and the experiment was repeated 100 times for each value shown. First, to show the effect of habituation on the exposure length, we plot the number of patterns presented to the network through attention-triggering, in the left of Figure 11.

As expected, when there is no attention triggering, the learning system is exposed almost all the time (the only times it is not are when the learner loses the demonstrator). Attention triggering provides a substantial reduction in exposure, and the less forgetting we allow (i.e. the longer the time before dishabituation), the more reduction we get.

We have seen what the recalled output looks like (Figure 10), but how does this translate to the ability of the robot to reproduce the task of wall-following? To test this we place a threshold on the output neuron to determine whether to adopt a `move-forward' (low output) or a `wander' behaviour (high output). If we consider the fully exposed system as producing the most `correct' learning, we can decide on such a threshold using the left of Figure 10; a suitable threshold would be approximately 0.015.

The learner is placed in the environment on its own, and its perceptual input is passed to the backprop network (no attention is used), where an output is computed. Further, a built-in `obstacle-avoidance' behaviour is used to turn the robot when it is facing a wall, and prevent it from hitting obstacles.³

Equipped with this combination of built-in behaviours and recall from the backprop network, we calculate an energy measure, which is the accumulation of the robot's side sensors sensing the wall. We use this as a measure of the robot's ability to perform the task (higher energies correspond to better wall-following). The right of Figure 11 shows the (95% confidence intervals of the) different energies acquired. In addition, the energies acquired by a hand-crafted wall-following behaviour and a random-wander behaviour are shown.

³ We have only taught the robot what to do when there is a wall parallel to it, on its side; any other competencies would require a higher representational complexity, i.e. more output units, which we leave for future work.
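This recall-phase loop, including the energy measure, might look like the following sketch; recall() reuses the weights from the training sketch above, while the robot helpers, the behaviour names, and the side-sensor indexing are assumptions for illustration.

    TURN_THRESHOLD = 0.015   # read off the fully exposed network (Figure 10)

    def recall(sensors):
        """Forward pass only: the learned turn-tendency for this input."""
        return float(sigmoid(W2 @ sigmoid(W1 @ sensors))[0])

    def control_step(robot, energy):
        sensors = robot.read_sensors()       # hypothetical helper
        if robot.facing_wall():              # built-in obstacle avoidance
            robot.turn_away()
        elif recall(sensors) < TURN_THRESHOLD:
            robot.move_forward()             # low output: wall-following
        else:
            robot.wander()                   # high output: wander behaviour
        # energy measure: accumulate side-sensor wall detection
        # (sensors 0, 1, 4, 5 are the side sensors; see Figure 8)
        return energy + float(sensors[[0, 1, 4, 5]].sum())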

We see that the energy only starts to drop when nodes stay fully habituated for longer than 10000 steps, which is a fifth of the total run length. Below this time, the networks are re-exposed to events just enough times to ensure that the weights converge to a `desired' point, and hence recall the `desired' output.

To summarise, there is a substantial reduction in the exposure of the learning system due to attention-triggering (left of Figure 11), and this does not cause a significant decrease in performance provided that forgetting is allowed reasonably frequently. In these cases the performance is almost as good as a hand-crafted behaviour. When forgetting is less frequent, the performance drops, but is still better than a random behaviour.

Learning on a real robot

We are currently implementing the backprop learning system discussed in this section on the physical robot mentioned in the previous section (see Figure 4). `Attention' on a real robot is much harder, as one is dealing with real, noisy data. Furthermore, our robot uses 20 perceptual sensors, compared with 6 in the Khepera mobile robot simulation, making the problem even harder.

Experimentation is also much harder, as it is less practical to perform many experiments and control for all the environmental conditions affecting the sensors (lighting, air moisture, etc.). We discuss some early results here, but note that we do not have enough data yet, and require more experimentation.

We have identified the emergence of perceptual and motoric events as in Figures 7 and 9 (it is hard to show, graphically, the activation of 20 sensors without grouping them!), but have not yet been able to achieve satisfactory weight convergence. This could be due to not having enough data; we are currently running more, longer experiments.

However, looking at the output recalled from the network, as in Figure 10, is quite encouraging. We have had to do this slightly differently: the robot's wander behaviour sees much less of the wall than in the simulated experiment, so by feeding the backprop network input from a wander behaviour alone we would not see two peaks as in Figure 10. Therefore, in addition to inspecting the output from a wander behaviour, we also inspect the output resulting from a hand-crafted wall-following behaviour. If the network has learnt correctly, we expect the distributions of the two cases to be different, with lower values being recalled by the wall-following behaviour, as it involves the robot being parallel to a wall most of the time (but not all the time).

Figure 12: Distributions of the output recalled by a fully exposed network on the physical robot, after learning is complete; the recall is triggered by input firstly from a wandering behaviour (mean 0.3770) and secondly from a hand-crafted wall-following behaviour (mean 0.3678).

The output recalled by a fully exposed network, learning approximately 6000 patterns out of a possible 10000 (10000 steps correspond to approximately 40 minutes of experimentation), is shown for the wander and wall-following behaviours in Figure 12.

We see that on average the wall-following behaviour triggers lower outputs than the random behaviour (and these differences are statistically significant). We cannot expect the distributions to be completely separated, because each behaviour shares some experiences of the other, but it is encouraging to see the differences mentioned.

4 Social Situatedness

The value of the social context has not been explicitly stressed in this paper, but is implicitly evident. Through a `following' behaviour, the learner is exposed to the parts of the environment deemed important by the demonstrator. This helps the learner in the perceptual `analysis' of the environment in terms of the task (Marom and Hayes, 2001), and therefore in the construction of an attention network.

An interesting comparison would be between our attention-triggered learning system and a reinforcement learning system, where the robot learns on its own. We would expect the socially-situated robot to learn faster, as it would not take it as long to discover the critical (rewarding) parts of the environment. We leave this for future work.

Finally, we are interested in implementing another form of social interaction, which could serve the role of an additional factor for dishabituation. When a neutral stimulus is followed by a rewarding one, the organism responds to the neutral stimulus; this is a form of classical conditioning (Balkenius, 2000).


In the physical experiment reported above we have also used an additional stimulus: the human demonstrator waves a red glove to signal to the robot at critical parts of the task. The rewarding stimulus in this case is rather artificially implanted to draw attention, so strictly speaking this is not classical conditioning, in the sense that the rewarding stimulus does not naturally occur in the environment without the presence of the demonstrator. Nevertheless, for a robot learning from noisy data on-line, this could provide another useful enhancement; we are currently analysing the results.

5 Conclusion

We have presented an attention system, motivated from psychology and neurophysiology. With this system, a robot situated in a physical environment is able to build up perceptual representations of its experiences, and habituate to stimuli present in the environment. Additionally, this system is capable of reinstating a reaction to the environment if this is needed, either due to novelty or to the passage of time (forgetting). In future work the system will also be supplemented with a mechanism for utilising stimulus rewards.

Social situatedness is a source of implicit information transfer. The input from an external source, in the form of a demonstrator, is of significant value for this system, as it provides it with relevant and well-structured experiences in terms of a particular task. We have discussed experiments from both a simulated environment and a physical one involving a human demonstrator.

Lastly, we have also seen how the attention system is useful as a triggering mechanism for learning new reactive skills, and since the system determines when learning of perception-action mappings is triggered, the level of exposure provided by the attention system is important.

References

Afifi, A. A. and Clark, V. A. (1996). Computer-Aided Multivariate Analysis. Chapman & Hall, London. ISBN 041273060X.

Balkenius, C. (2000). Attention, habituation and conditioning: Toward a computational model. Cognitive Science Quarterly, 1(2).

Cowan, N. (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information processing system. Psychological Bulletin, 104:163–191.

Galef, Jr., B. G. (1988). Imitation in animals: history, definition, and interpretation of data from the psychological laboratory. In Social Learning: Psychological and Biological Perspectives, pages 3–28. Lawrence Erlbaum Associates, Hillsdale, N.J.

Itti, L. and Koch, C. (2001). Computational modeling of visual attention. Nature Reviews Neuroscience, 2(3):194–203.

Kahneman, D. (1973). Attention and Effort. Prentice Hall, Englewood Cliffs, New Jersey.

Maistros, G., Marom, Y., and Hayes, G. M. (2001). Perception-action coupling via imitation and attention. To appear in AAAI Fall Symposium on Anchoring Symbols to Sensor Data in Single and Multiple Robot Systems, North Falmouth, November 2001.

Marom, Y. and Hayes, G. M. (2001). Interacting with a robot to enhance its perceptual attention. In Nehmzow, U. and Melhuish, C., editors, Proceedings of Towards Intelligent Mobile Robots (TIMR) 2001, Department of Computer Science, Manchester University.

Marsland, S., Nehmzow, U., and Shapiro, J. (2001). Novelty detection in large environments. In Nehmzow, U. and Melhuish, C., editors, Proceedings of Towards Intelligent Mobile Robots (TIMR) 2001, Department of Computer Science, Manchester University.

Nehmzow, U. (1999). Mobile Robotics: A Practical Introduction. Springer Verlag. ISBN 1-85233-173-9.

O'Reilly, R., Norman, K., and McClelland, J. (1998). A hippocampal model of recognition memory. In Advances in Neural Information Processing Systems, volume 10, pages 73–79. MIT Press.

Posner, M., Snyder, C., and Davidson, B. (1980). Attention and the detection of signals. Journal of Experimental Psychology: General, 109:160–174.

Sokolov, E. (1963). A neuronal model of the stimulus and the orienting reflex. In Sokolov, E., editor, Perception and the Conditioned Reflex, pages 282–294. Pergamon Press.

Spence, K. W. (1937). Experimental studies of learning and higher mental processes in infra-human primates. Psychological Bulletin, 34:806–850.

Whiten, A. (2000). Primate culture and social learning. Cognitive Science special issue on Primate Cognition, 24. In press.