
Robotics and Autonomous Systems 51 (2005) 175–190

From behaviour-based robots to motivation-based robots

Riccardo Manzotti∗, Vincenzo Tagliasco

Laboratory for Integrated Advanced Robotics (LIRA-Lab), Department of Communication, Computer and System Sciences, University of Genoa, Viale Causa 13, Genoa I-16145, Italy

Received 28 December 2002; received in revised form 12 October 2004; accepted 20 October 2004
Available online 24 December 2004

Abstract

The appearance on the market of entertainment robots for children and families has ipso facto created the new category of motivation-based robots. A taxonomy of the architectures of different robot categories is proposed. The architecture of motivation-based robots is phylogenetic and ontogenetic. A tentative architecture for a specific experimental setup is described. The results of the experiment show that a new motivation arises from the interaction between the robot and the environment. Motivation-based robots equipped with an ontogenetic architecture might provide the foundation for a new generation of robots capable of ontogenetic development.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Associative learning; Development; Entertainment robots; Behaviour-based robots; Motivation-based robots

1. Introduction

Manufacturers design entertainment robots capable of interacting with humans. This interaction occurs at several levels: from the selection of a set of in-built behaviours to the capability of being independent and acting on its own. Entertainment robots learn actions through touch sensors, switches and voice recognition modules. The most sophisticated robots are said to develop unique personalities through the interaction with

∗ Corresponding author. Tel.: +39 010 353 2817/3478257675; fax: +39 010 353 2948.
E-mail addresses: [email protected] (R. Manzotti), [email protected] (V. Tagliasco).

a specific environment. Robots that go through a series of development phases (real or simulated: from toddler, to child, to adult) appeal to consumers. Moreover, robots must show emotions like happiness, sadness, anger and surprise, in different degrees. Entertainment robots must be curious and must be able to explore their surroundings on their own: these robots develop in relation to their personal history. We define this class of robots as motivation-based robots because they aim at re-creating the motivational structure of biological beings. The time has now come to move from behaviour-based robots [1] to motivation-based robots [2–4].

Recently, in neuroscience and robotics, the problem of what motivation is has been investigated in the more general framework of what a subject is [5,6]. We

0921-8890/$ – see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.robot.2004.10.004


propose an engineering approach to create motivations in robots that does not require such a broad framework. Behaviour-based robots make use of fixed "motivations" hardwired in their structure at design time [7–9]. Even systems that are capable of learning new behaviours must pursue a target of some kind programmed at design time; for instance, if a robot has to learn to reach a given target with its arm, it will learn to move according to a predefined "motivation". On the contrary, motivation-based robots must be able to perform actions driven by motivations which they did not possess at design time, but which they have developed by interacting with the environment.

For instance, an "intelligent" electronic device like a last-generation digital photo camera performs a long list of "intelligent" tasks: it selects the best program depending on the light, and it applies a complex procedure for each program in order to select the right exposure, the right focus and a long list of related parameters. It modifies its behaviour on the basis of the environmental conditions in order to optimise the end result. Yet, notwithstanding what has been stored in its internal memory during a long journey, its behaviour does not change. No external event can modify its internal procedures as they were originally designed. Alternatively, if a 3-year-old child came on the same long journey, s/he would change. The events that happened to/around him/her would change not just his/her memory but also his/her future development, his/her internal criteria and the way in which future events will modify him/her.

On an intermediate level between the 3-year-old child and the camera, there are classic artificial neural network implementations, such as speech recogniser programs. They are clever devices; they recognize normal speech pronounced by an average male or female voice. They store individuals' voices and modify their internal parameters in order to learn how to improve their performance. In this respect they are better than the camera: what happens to them modifies their behaviour. If we take two different instances of speech recogniser programs used by two different individuals, they are different: each is specialized on its owner's voice. On the other hand, if we take two cameras used by two different photographers, they are exactly the same. Even classic artificial neural networks are lacking something: their goals remain the same. Independently of their experiences they do not change their goals. On the other hand, the 3-year-old child develops new goals at any time. Experiences modify both his/her behaviour(s) and his/her criteria.

From the previous example, it is clear that there is a difference between motivation-based beings and behaviour-based beings. In the next paragraph this difference will be described in detail and a candidate architecture for artificial beings will be proposed.

A mallard duckling before its imprinting process has no idea of the visual appearance of its mother; however, since the bird sees its mother under favourable conditions, it develops a strong motivation to see the mother duck again. Before the imprinting there was no interest whatsoever for that kind of visual object, but immediately afterwards, the mallard duckling tries to keep the image of its mother inside its visual field. The motivation is 'to have the mother's image inside the visual field'. All its following actions are performed in order to make this event occur as frequently as possible. If that particular mother-bird had not shown itself to the mallard duckling, the newborn bird would not have developed any interest in it. If a different image had been shown instead of the real mother, let us say the face of Konrad Lorenz, the newborn bird would have tried to maximize the event 'to have the face of Konrad Lorenz inside the visual field'. More complex behavioural patterns are based on the same concept of repetition of an occurred event (motivation). Peter had a nice evening with Susan, so he invites her again in order to repeat the pleasant experience. Mary had a pleasant time in Venice and so she plans a new holiday there.

2. Architectures for building robots: a taxonomy

Not all the motivations of biological systems are fixed at birth: they only possess a very limited, survival-driven, built-in set of motivations. As they grow and develop, biological systems continuously generate new motivations on the basis of two separate factors: their genetic background and their past experience. Both are necessary in order to select a particular motivation. A mallard duckling does not have the motivation to follow its genetic mother. Yet, via its genetic background, the bird possesses the capability of choosing a bird and selecting it as a motivation. That particular bird (hopefully its mother) will become the motivation that will control the learning of the bird.


The behaviour of behaviour-based artificial structures depends on experience and on motivations (goals) defined elsewhere at design time [1,2]. In complex biological systems, behaviour still depends on experience and motivations; yet, motivations are not fixed. Motivations are the result of the interaction between experience and a limited number of hardwired instincts (the ones provided by genes). In many complex biological systems, it is possible to distinguish between phylogenetic aspects and ontogenetic ones, nature versus nurture [10–12]. In general, phylogeny refers to those processes that produce new structures (genes, bodily features, behaviours, instincts) on a time scale larger than that of single individuals. On the contrary, ontogeny is limited to the life span of single individuals [10]. Furthermore, ontogeny can be driven by the phylogenetic repository (genes or instincts) or by the unpredictable contingencies of the environment. Here we endorse the view that it is necessary to distinguish between goals which are determined before the actual development of an agent or subject, and those goals which are specified after the birth of the agent. We will call the former instincts and the latter motivations. The objective of this paper is to illustrate a simple set of procedures which produce motivations during development, as in the case of the imprinting procedure of birds.

Is it possible to implement instincts and motivations in an artificial system? We propose a taxonomy of architectures: a fixed control architecture, a learning architecture and an ontogenetic architecture (Fig. 1). In the first case (Fig. 1a), the system has no capability of modifying how it does what it does. There is a simple Decision Maker module, which takes the input signal and produces the output on the basis of some a priori hard-wired module. Examples of this structure are simple control devices or machine automata. In the second case (Fig. 1b), the system is capable of modifying its behaviour to fulfil some a priori target. The system is capable of modifying how it behaves. The Decision Maker module is flanked by a Rule Maker module. The Rule Maker module can modify the a priori rules contained in the Decision Maker module on the basis of a priori hard-wired criteria. Examples of this structure are reinforcement learning or supervised learning artificial neural networks. In the third case (Fig. 1c), the system is capable of modifying not only how it does what it does, but also of defining what it does. The Motivation Maker module sets the goals that have to be pursued by the Rule Maker module.

Fig. 1. Three possible architectures. In the first case (top) both what and how the system does is defined a priori; in the second case (middle) the system modifies how it behaves but not what it is doing; in the last case (bottom) the system modifies both what and how it does.

2.1. Fixed control architecture

In this case, the causal structure of the system is fixed (see Fig. 1a). There is no ontogenesis whatsoever. Notwithstanding the behavioural complexity of the system, everything happens because it has been previously coded within the system structure. A mechanical device and a complex software agent are not different in this respect: both are pre-programmed in what they must achieve and how they must achieve it. Nothing in their structure is caused by their experiences. Suitable examples of this category are Tolman's artificial sow bug [13], Braitenberg's thinking vehicles [14], Brooks' artificial insects [15,16] and recent entertainment robots like Sony's AIBO and Honda's humanoid ASIMO.
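The three levels of the taxonomy can be sketched as three nested classes. The following Python sketch is illustrative only, not the authors' implementation: the module names (Decision Maker, Rule Maker, Motivation Maker) follow Fig. 1, but every internal detail (the threshold rule, the learning step, the goal set) is a hypothetical stand-in.

```python
# Illustrative sketch of the taxonomy of Fig. 1. All internals are
# hypothetical; only the module names follow the paper.

class FixedControl:
    """Case (a): both what and how are defined a priori (Decision Maker only)."""
    def act(self, stimulus):
        # hard-wired rule: approach sufficiently intense stimuli
        return "approach" if stimulus > 0.5 else "ignore"

class Learning(FixedControl):
    """Case (b): a Rule Maker adjusts how, but the goal stays fixed."""
    def __init__(self):
        self.threshold = 0.5                 # a priori rule, now modifiable
    def act(self, stimulus):
        return "approach" if stimulus > self.threshold else "ignore"
    def learn(self, stimulus, reward):
        # Rule Maker: shift the rule towards rewarded stimuli
        if reward:
            self.threshold += 0.1 * (stimulus - self.threshold)

class Ontogenetic(Learning):
    """Case (c): a Motivation Maker also redefines what is pursued."""
    def __init__(self):
        super().__init__()
        self.goals = {"bright"}              # initial, instinct-like goal
    def make_motivation(self, experienced_category):
        # Motivation Maker: new goals acquired from experience
        self.goals.add(experienced_category)
```

The point of the inheritance chain is that each level keeps everything the previous one can do and adds one degree of plasticity: first the rules, then the goals themselves.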

2.2. Learning architecture

A different level of structural dependency on the environment is provided by the architectures that can learn how to perform a task (see Fig. 1b). Behaviour-based robots can be classified in this category. Systems based on artificial neural networks are well-known examples of this kind of architecture. These systems determine how to get a given result once they have been provided with a specific motivation. The motivation can be given either as a series of examples of correct behaviour (supervised learning) or as a simple evaluation of the global performance of the system (reinforcement learning) [17,18]. In both cases some kind of learning is applied. These systems lack the capability of creating new motivations. By controlling its motors a behaviour-based robot can learn how to navigate avoiding static and dynamic obstacles. However, the motivation behind this task is defined by the a priori design of the system. There are several examples of this kind of learning agent: Babybot at LIRA-Lab [19,20], Cog at MIT [7,21].

2.3. Ontogenetic architecture

A system that learns both how to perform a given task and what task must be performed corresponds to an ontogenetic architecture (see Fig. 1c). This is the case for most, if not all, mammals; it is true for primates and for human beings. They are systems capable of developing new motivations that do not belong to their genetic background. In the field of artificial systems there has been a series of attempts to address this problem [22–25] as well as attempts to locate similar structures in the cortical architecture of humans [26]. For their development, these systems depend more on the environment than the previous two categories. A system belonging to the first category does not depend on the environment for what it does or for how it does what it does. A system belonging to the second category does depend on the environment for how it does what it does, but not for what it does, which is phylogenetically determined. A system belonging to the third and last category depends on the environment both for what and for how it does what it does.

3. A motivation-based architecture

The proposed architecture is ontogenetic according to the previously defined taxonomy. The underlying idea is to have a physical structure (that implements the proposed architecture), which is activated by incoming events and develops motivations on this basis. The proposed architecture makes use of elementary associative processes, simple Hebbian learning and case-based reasoning.

The architecture receives an incoming stimulus and produces a signal (Relevant Signal) which depends on the value the system gives to the incoming stimulus. For instance, if the incoming stimulus corresponds to the mother's face, the system will produce a strong Relevant Signal. If the incoming stimulus corresponds to a dull grey object, the Relevant Signal will be weaker.

The architecture is made of three main modules: the Category Module, which is basically a pattern classifier; the Phylogenetic Module, which contains the a priori criteria; and the Ontogenetic Module, which applies Hebbian learning and develops new criteria by using the patterns stored in the Category Module. The incoming stimuli are stored in the Category Module on the basis of the Relevant Signal coming from the Phylogenetic Module and the Ontogenetic Module. At the beginning, the Relevant Signal depends on those properties of the incoming signals that are selected by the Phylogenetic Module. Subsequently, the Relevant Signal is flanked by the new signals coming from the Ontogenetic Module.

The architecture is aimed at mimicking the development of motivations in human beings. For instance, a human develops an interest for cars even if nothing in his/her phylogenetic code is explicitly directed towards cars. On the contrary, an insect cannot develop new motivations but must follow its genetic blueprint: it has no ontogenetic development. One of the aims of this architecture is to explicitly divide the ontogenetic part from the phylogenetic part.

3.1. Category Module

The category module has the role of grouping in clusters, classes and categories of stimuli coming from the external events. A discrete flow of incoming signals is the input of the category module. No hypothesis is required for their timing; no hypothesis is required


Fig. 2. A scheme for a motivation-based architecture.

for their nature. These signals could be of any kind (chunks of auditory signals, visual images, filtered visual images). Each signal is represented by a vector s of real numbers (s ∈ R^n). The CM creates a series of clusters Ci grouping classes of stimuli, where each cluster Ci is a set of stored stimuli.

The process of cluster definition is based on internally built-in criteria for clustering and on the presence of a Relevant Signal (see Fig. 2).

Whenever an incoming signal is received, a Categories Vector c, which is the output of the CM, is computed. The Categories Vector contains as many elements as the clusters inside the CM at the time in which the incoming signal is analysed; the elements of c provide an indication of which cluster best represents the current stimulus. The ith element ci is equal to the normalized difference between the maximum possible distance, usually 1, and the actual distance dC (explained below in this paragraph) between the incoming signal s and the cluster Ci. In this way, the element ci with the greatest value corresponds to the cluster Ci that best matches the incoming signal:

c = (1 − dC(s, C1), 1 − dC(s, C2), . . ., 1 − dC(s, Cn))^t.

There is no unique way to determine the distance function dC (dC : (R^n × D) → R, where D is the cluster domain) between a vector and a cluster. The process of updating ci requires the definition of two thresholds: one to define the minimum distance from a cluster (mcd) and another to define the maximum distance from a cluster (Mcd).
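Since dC is not uniquely determined, one plausible choice, offered purely as an assumption for illustration and not as the paper's definition, is the minimum normalized Euclidean distance between the stimulus and the stimuli stored in a cluster. With all components in [0, 1], the result also stays in [0, 1], consistent with a maximum possible distance of 1.

```python
import math

def d_c(s, cluster):
    """One possible cluster distance (an illustrative assumption):
    the minimum normalized Euclidean distance between stimulus s
    and the stimuli stored in the cluster. For components in [0, 1]
    the result lies in [0, 1]."""
    n = len(s)
    return min(
        math.sqrt(sum((si - pi) ** 2 for si, pi in zip(s, p)) / n)
        for p in cluster
    )
```

Any other metric with the same range would serve; the architecture only requires that dC be comparable against the two thresholds mcd and Mcd.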

The CM tunes its activity on the basis of the Relevant Signal. As shown in Fig. 3, the Relevant Signal R(t) is obtained from two different signals, the Relevant Ontogenetic Signal Ron(t) and the Relevant Phylogenetic Signal Rph(t), according to

R(t) = max(Ron(t), Rph(t)).

If and only if the Relevant Signal is active, every time a signal is received, the CM performs the following actions:

(i) If the stimulus is too similar to the already stored stimuli, do nothing (dC(s, Ci) < mcd).

(ii) If the stimulus is sufficiently similar to one of the previously created clusters (mcd ≤ dC(s, Ci) ≤ Mcd), the stimulus is added to that cluster.

(iii) If the stimulus is not sufficiently similar to any of the stimuli already stored, a new cluster is created (dC(s, Ci) > Mcd).

By storing a stimulus only if the Relevant Signal is active, the system does not assign new resources to every incoming signal (the first rule is useful to avoid storing equivalent stimuli).
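Rules (i)–(iii) can be sketched as follows. This is a minimal Python illustration, assuming a cluster is a list of stored stimuli, that a distance function is supplied, and that the two threshold values are arbitrary hypothetical choices.

```python
MCD = 0.1      # mcd: minimum distance from a cluster (hypothetical value)
MAX_CD = 0.4   # Mcd: maximum distance from a cluster (hypothetical value)

def update_cm(clusters, s, relevant, distance):
    """Update the Category Module for one stimulus s.
    Stimuli are stored only while the Relevant Signal is active."""
    if not relevant:
        return                               # gate: no storage without relevance
    if not clusters:
        clusters.append([s])                 # no clusters yet: create the first
        return
    best = min(clusters, key=lambda c: distance(s, c))
    d = distance(s, best)
    if d < MCD:
        pass                                 # (i) too similar: do nothing
    elif d <= MAX_CD:
        best.append(s)                       # (ii) similar enough: add to cluster
    else:
        clusters.append([s])                 # (iii) too different: new cluster
```

For example, with scalar stimuli and distance(s, c) = min over stored p of |s − p|, presenting 0.5, 0.52, 0.7 and 0.0 in sequence yields two clusters: {0.5, 0.7} (0.52 is discarded by rule (i)) and {0.0}.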


Fig. 3. Timing of operations.

3.2. Phylogenetic Module

The Relevant Phylogenetic Signal Rph(t) is produced by the Phylogenetic Module (PM, Fig. 2). This module is the only one that has some built-in criteria concerning the relevant properties of the incoming signal (for instance, the structure of the Category Module does not present any similar feature). Functionally, it has the same role as the genetic instincts in biological systems. It is similar to saliency systems or attention mechanisms [27]: it selects which stimuli are worth the attention of the system. A Phylogenetic Module works in two different ways: (i) it autonomously produces a signal on the basis of some internal criteria; (ii) it produces a signal on the basis of some external events. In the second case the PM needs some kind of elementary capability in order to recognize particular occurrences of events in the external environment (the presence of the mother, the presence of soft or brightly coloured objects).

For instance, a baby looks with more curiosity at brightly coloured objects than at dull colourless objects, independently of any past experience. This behaviour requires the existence of a hardwired function looking for a relevant property of images (saturated colours). This module provides criteria that can be used

to select correct actions (for instance those actions thatmaximize the presence of the interesting stimuli).

The performance of the Phylogenetic Module is implemented by the function fphylogenetic: R^n → [0, 1] applied to the input s′(t), a signal from which it is possible to know whether something relevant is happening. The signal s′(t) comes from the external environment. For instance, it could be a verbal approval for a specific event; or it could be a reward/punishment following a behaviour. The resulting output is:

Rph(t) = fphylogenetic(s′(t)).

A system could contain one phylogenetic function for each kind of event the designers want the system to react to. For instance, there could be a function to detect the presence of round-shaped objects (a prototype for faces), a function to detect the presence of objects with highly saturated colours and a function to detect the presence of moving objects. At every instant there could be more than one function signalling that something interesting is going on: more than one fphylogenetic function can be evaluated. The output of the Phylogenetic Module is the maximum among the outputs of the different fphylogenetic functions, whose input is always


s′(t):

Rph(t) = fphylogenetic(s′(t)) = max_{i=1,...,m} (f_i_phylogenetic(s′(t))),

where m is the number of kinds of events which the system is capable of reacting to from the beginning. So m is the number of elementary instincts (each corresponding to a separate phylogenetic function) that the system possesses. It is important to point out that (i) the Phylogenetic Module is incapable of adaptability and that (ii) the Phylogenetic Functions might be very simple, because their role is to orient the attention of the CM towards certain classes of objects, albeit making mistakes.

In a multisensory system, each sensory modality can be used as an alternative source of information for another sensory modality. In real biological systems, there are plenty of sources of information (like pain, skin receptors, tactile information) that can be the input s′(t) of the PM. The same sensory modality can be the input both for the PM and for the CM. If this happens, it is possible to assume that

s′(t) = s(t).

If the system were composed of just the PM and the CM, the system would be a reinforcement learning system.
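A minimal sketch of the Phylogenetic Module follows. The two instincts are hypothetical examples (saturation and brightness detectors operating on HSV-like triples in [0, 1]); only the structure, namely a fixed set of functions whose outputs are combined by a maximum, follows the text.

```python
def f_saturation(s_prime):
    """Hypothetical instinct: attracted by saturated colours.
    s_prime is assumed to be a (hue, saturation, value) triple in [0, 1]."""
    return s_prime[1]

def f_brightness(s_prime):
    """Hypothetical instinct: attracted by bright stimuli."""
    return s_prime[2]

def r_ph(s_prime, instincts=(f_saturation, f_brightness)):
    """Rph(t): the maximum over the m phylogenetic functions.
    The tuple of instincts is fixed at design time and never adapts,
    matching point (i) in the text."""
    return max(f(s_prime) for f in instincts)
```

Each instinct is deliberately crude: its only job is to direct the attention of the Category Module, mistakes included.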

3.3. Ontogenetic Module

Whereas the Phylogenetic Module has built-in criteria about the nature and the relevant properties of the incoming signal, the Ontogenetic Module selects new criteria on the basis of experience. Functionally it has the same role as the acquired ontogenetic criteria in biological systems.

The Ontogenetic Module acts as a gate for the incoming output of the CM c(t). The gating procedure is implemented by means of an internal vector g = (g1, . . ., gn)^t, which has the same number of elements as the clusters in the CM; g is contained inside the Ontogenetic Module. The output of the OM is computed as the maximum among the elements gi times the elements ci of the CM:

Ron(t) = max_{i=1,...,n} (gi · ci). (1)

The gi have the role of gates (hence the use of the letter g) in order to let or to prevent the effect of the output of the CM to propagate further. If the gi are positive, the corresponding ci contribute to the Relevant Signal. Since the ci represent the stored categories acquired during the experiences of the system, Ron is the result of the ontogenetic development.

The result of the architecture is to produce a new reinforcement signal Ron(t), which depends only on the actual experiences of the system (i.e. on the received input signals). Here Ron(t) is called the Relevant Ontogenetic Signal because it derives from the actual experiences of the system. It is the result of the development of an individual system and its history; hence it pertains to its ontogeny.

The vector g is the result of a Hebbian learning implementation with respect to the simultaneous occurrence of the signals (h(t), c(t)); learning happens when h(t) and ci(t) fire simultaneously. The value of gi approaches the value 1 if the signal h(t) and the component ci(t) are correlated in time. A possible function is the following:

gi = (2/π) arctan( ∫_{t0}^{t} (h(τ) · ci(τ)) q dτ ),

where q ∈ [0, 1] can be used to tune the speed of learning. The element ci(t) corresponds to the ith element of the output of the CM, and h(t) is the signal that controls the performance of the Ontogenetic Module.

Four different choices are possible for h(t): (i) h(t) is set equal to a positive constant; (ii) h(t) is an a priori time variant function; (iii) h(t) is set equal to the output of the PM (h(t) = Rph(t)); (iv) h(t) is connected to some independent sources of signals that are linked to the environment.

In the first case, since h(t) is a constant, each gi is proportional to how much the corresponding category has been represented in the input stimulus s. The more frequent and the more intensely a category matches the input, the greater its effect on the Relevant Ontogenetic Signal will be.

In the second case, h(t) varies in time according to an a priori time variant function. Each gi will correspond to those categories that are representative of the input during those periods in which h(t) is larger. For instance, h(t) might be high in an initial period and then it might vanish: the Ontogenetic Module will accept only those categories that are representative of the input during the initial period.


In the third case (h(t) = Rph(t)), in an early stage,eachgi will be representative of those categories thatoccur at the same time as the activations of the Phylo-genetic Functions. Eventually, there can be a drift fromthe categories selected by Phylogenetic Functions tothe new categories selected by the Ontogenetic Func-tions.

In the fourth case,h(t) is assigned to a separatesource of signals; different sensor modalities can beassociated. For instance, the incoming signal�s mightbe visual, whileh(t) might be the result of the tactilesensory modality. As a result, thegi would be higherwhen the two different sensory modalities are simulta-neously present.

The OM produces new reinforcement signals that are indirectly related to the phylogenetic structure of the system. The interaction between the OM and the CM generates a new set of functions, which are the ontogenetic equivalent of the phylogenetic functions:

\[ f^{i}_{\mathrm{ontogenetic}}(\vec{s}) = g_i(t)\,(1 - d_C(\vec{s}, C_i)). \]

At each instant, the ontogenetic functions \( f^{i}_{\mathrm{ontogenetic}}(\vec{s}) \) compute the relevant ontogenetic signal. Their form depends on the information stored in the \( g_i \) and in the \( C_i \), which is the result of the past history of the system. We can rewrite Eq. (1) as follows:

\[ R_{\mathrm{on}}(t) = \max_{i=1,\dots,n} (g_i \cdot c_i) = \max_{i=1,\dots,n} (f^{i}(\vec{s}(t))) = f_{\mathrm{ontogenetic}}(\vec{s}(t)). \]

3.4. How the architecture works

The main goal of the architecture is to create a structure that can be changed completely by its own experiences. In the architecture there is a clear-cut division between the phylogenetic part (the a priori section) and the ontogenetic part produced by the interaction with the environment.

As can be seen in Fig. 3, the timing of operations is the following. First, the incoming stimulus (1) is compared to each cluster of stored vectors (2) and, as a result, the output vector is computed on the basis of the current structure of the network (3). Then the Ontogenetic Signal is computed by the Ontogenetic Module (4). Finally, the Ontogenetic Signal is combined with the Phylogenetic Signal to produce the Relevant Signal, which is sent to the Category Module and to the output (5). Only at this stage does the Category Module modify its clusters on the basis of both the incoming stimuli and the Relevant Signal. If the Ontogenetic Module were not active, the architecture would stop its development and become a pure feed-forward network.

4. Experimental results: the emergence of motivations

To test the architecture, an experiment was carried out in which a robot embodying the proposed motivation-based architecture develops a new motivation on the basis of its own experiences. In the experiment, an incoming class of visual stimuli (not coded inside the architecture) produces a modification in the system's behaviour, differently from what happens in behaviour-based robots. In behaviour-based robots the transition between different behaviours elicited by a motivation is defined by the designer and does not depend on a newly produced self-motivation. By interacting with the environment, the system adds a new motivation that changes not only how (behaviour) but also what (the motivation at the basis of behaviour) the system is doing. The system has, in this preliminary experiment, a single behaviour: directing or not directing its gaze towards objects. This behaviour is not what is learned by the architecture; it is used by the architecture to show the effects of its new motivation.

A series of different shapes associated with colours were presented to the robot. The system is equipped with a phylogenetic motivation that is aimed at highly coloured objects; a colourless stimulus, independently of its shape, does not elicit any response. Since the system has an ontogenetic module, it develops further motivations directed towards classes of stimuli different from those relevant for its phylogenetic module. After a period of interaction with the visual environment (constituted by a series of elementary coloured shapes), the robot is motivated by colourless shapes also. The system shows the capability to develop a motivation (by directing its gaze towards the stimulus) that was not envisaged at design time and that is the result of the ontogenetic development.
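Read operationally, the rewritten Eq. (1) is a gain-weighted maximum over the stored categories. A minimal sketch of this computation follows; the toy vector-to-cluster distance and all names are illustrative assumptions, not the authors' implementation:

```python
def ontogenetic_signal(s, clusters, gains, d_C):
    """R_on(t) = max_i g_i * (1 - d_C(s, C_i)): the highest gain-weighted
    closeness of the current stimulus s to any stored category C_i."""
    if not clusters:
        return 0.0
    return max(g * (1.0 - d_C(s, C)) for g, C in zip(gains, clusters))

# Toy vector-to-cluster distance: clusters stored as mean vectors,
# normalized L1 distance clipped to [0, 1] (an assumption, not the
# paper's correlation-based distance).
def d_C(v, c):
    return min(1.0, sum(abs(a - b) for a, b in zip(v, c)) / len(v))

clusters = [[0.0, 0.0], [1.0, 1.0]]   # two stored categories C_1, C_2
gains = [0.5, 1.0]                    # their gains g_1, g_2
# A stimulus near C_2 yields a high ontogenetic signal.
print(round(ontogenetic_signal([1.0, 0.9], clusters, gains, d_C), 2))  # 0.95
```

The gains \( g_i(t) \) carry the history of the system: a category repeatedly paired with the Relevant Signal can raise its gain, which is how a new stimulus class becomes motivating.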


R. Manzotti, V. Tagliasco / Robotics and Autonomous Systems 51 (2005) 175–190 183

Fig. 4. The Cartesian (upper row) and log-polar (lower row) images for a cross (a), a wave (b), and a star (c).

4.1. Robotic setup

A robotic head with four degrees of freedom has been adopted as the robotic setup. We used the EuroHead developed for navigation (Pan: range = 45°, velocity = 73°/s, acceleration = 1600°/s², resolution = 0.007°; Tilt: range = 60°, velocity = 73°/s, acceleration = 2100°/s², resolution = 0.007°) [28]. However, we used only two degrees of freedom of the head since, for the purpose of this experiment, only a pointing device was needed. Robots characterized by more sophisticated morphologies could have been used to perform more complex tasks. However, at this preliminary stage of research, an exceedingly complex combination of morphological, behavioural and computational factors would have been extremely difficult to interpret.

4.1.1. Sensory Module

The robotic head was equipped with a videocamera capable of acquiring log-polar images [29,30]. Log-polar images (Fig. 4) are defined by

\[ x = \rho \cos(\theta), \qquad y = \rho \sin(\theta), \]
\[ \theta = k \cdot \eta, \qquad \rho = r_0 \cdot a^{\xi}, \]

together with

\[ \rho = \sqrt{x^2 + y^2}, \qquad \theta = \arctan\left(\frac{y}{x}\right), \]
\[ \eta = \frac{\theta}{k}, \qquad \xi = \log_a\left(\frac{\rho}{r_0}\right). \]
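The mapping can be sketched directly from these equations; the constants k, a and r0 below are illustrative assumptions, not the parameters of the camera used in the paper:

```python
import math

K = 2 * math.pi / 64   # radians per angular pixel (64 angular samples, assumed)
A = 1.1                # radial growth factor a (assumed)
R0 = 1.0               # radius r0 of the innermost ring (assumed)

def to_logpolar(x, y):
    """(x, y) -> (eta, xi): eta = theta / k, xi = log_a(rho / r0)."""
    rho = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return theta / K, math.log(rho / R0, A)

def to_cartesian(eta, xi):
    """Inverse mapping: theta = k * eta, rho = r0 * a**xi."""
    theta = K * eta
    rho = R0 * A ** xi
    return rho * math.cos(theta), rho * math.sin(theta)

# Rotation and scaling in the image plane become pure translations along
# the eta and xi axes respectively.
e1, x1 = to_logpolar(10.0, 0.0)
e2, x2 = to_logpolar(0.0, 10.0)   # same radius, rotated by 90 degrees
print(x1 == x2, e2 - e1)          # True 16.0
```

This translation property is what yields the rotation and scaling invariance discussed in the text.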

These images offer two main advantages among others: (i) invariance with respect to rotation and scaling; (ii) a reduced number of pixels with a wide field of view. Furthermore, in this case the use of log-polar images allows an implicit selection of a target (due to the space-variant distribution of receptors). In a foveated visual apparatus, the central part of the image corresponds to the majority of pixels; thus, when an object is fixated, its image is much more important than the background. As a result, there is no need to perform an explicit selection of a target; the direction of the gaze implicitly selects its own target.

The robotic head has two degrees of freedom: the camera is capable of independent tilt and pan motion (Fig. 5). Since the head was able to move only within a limited span in pan and tilt (40° each), it was possible to determine which point on the board was being looked at. By measuring the angular position of each saccade, it is possible to measure which region of the visual stimulus is observed more frequently by the head.


Fig. 5. Sensory and motor setup.

Fig. 6. The probability density function on the basis of the control parameter λ.

4.1.2. Motor Module

The robotic head is programmed to make random saccades; a Motor Module generates saccades on the basis of an input signal λ that controls the probability density of the amplitude r. The motor input λ is the only signal needed by the Motor Module in order to control its actions. The probability function of the angle has a uniform distribution from 0 to 2π. The probability density of the amplitude is equal to

\[ p(r, \lambda) = \frac{e^{-\lambda \cdot r^2}}{\int_{-r_{\max}}^{+r_{\max}} e^{-\lambda \cdot \rho^2}\, \mathrm{d}\rho}, \]

where r is the random variable for the amplitude (Fig. 6). If λ is low (near 0), the probability density is almost constant; therefore there is an equal probability for each amplitude. If λ is higher, a small amplitude is more probable.
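A minimal sketch of such a saccade generator: since e^(−λr²) is bounded by 1, the truncated density can be sampled by simple rejection against a uniform envelope. R_MAX and the sample counts are illustrative assumptions, not the EuroHead's values:

```python
import math
import random

R_MAX = 20.0  # half-span of the saccade amplitude range (degrees, assumed)

def sample_saccade(lam, rng=random):
    """Draw one saccade: a uniform direction in [0, 2*pi) and an amplitude
    r with density proportional to exp(-lam * r**2) on [-R_MAX, R_MAX],
    sampled by rejection against a uniform envelope of height 1."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    while True:
        r = rng.uniform(-R_MAX, R_MAX)
        if rng.random() <= math.exp(-lam * r * r):  # accept with p = e^(-lam r^2)
            return r, theta

random.seed(0)
# lam near 0: almost uniform amplitudes (wide exploratory saccades).
wide = [abs(sample_saccade(0.001)[0]) for _ in range(2000)]
# larger lam: small amplitudes dominate (fixation on an interesting object).
narrow = [abs(sample_saccade(0.5)[0]) for _ in range(2000)]
print(sum(wide) / len(wide) > sum(narrow) / len(narrow))  # True
```

Driving λ with the Relevant Signal then reproduces the explore/fixate switch described below.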

The rationale of this probability schema resides in the fact that the motor unit should mimic an exploratory strategy. When a visual system explores a field of view, it makes large random saccades. When it fixates an interesting object, it makes small random saccades.

Fig. 7. The proposed architecture (named 'Artificial Motivations') is independent of the sensor and motor parts. It contains some basic information about the relevant signal to bootstrap the system.

4.2. Architecture implementation

It is important to note that no modification has been made to the architecture on the basis of the particular properties of the robotic setup. The architecture could be used in a completely different robotic setup, with completely different input and output signals, without having to be changed (Fig. 7).

4.2.1. Category Module

The Category Module creates clusters of incoming stimuli on the basis of the Relevant Signal. Each of these clusters corresponds to a category. Besides the Relevant Signal, the CM uses an internal criterion to control cluster creation: the distance function \( d_C(\vec{v}, C) \) between a vector and a cluster. This distance is derived from a continuous distance function between vectors \( d(\vec{v}, \vec{w}) \), with \( d : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R} \). Suitable candidates for this function are the Minkowski function, the Tanimoto distance, or the correlation function [31]. In the experiment the function is implemented as follows:

\[ d(\vec{v}, \vec{w}) = C(\vec{v}, \vec{w}) = \frac{1}{2}\left(1 - \frac{\sum_i (v_i - \mu_v)(w_i - \mu_w)}{\sqrt{\sum_i (v_i - \mu_v)^2 \cdot \sum_i (w_i - \mu_w)^2}}\right). \]
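This correlation-based distance transcribes directly into code; a sketch with vectors as plain Python lists:

```python
import math

def correlation_distance(v, w):
    """Correlation-based distance between two equal-length vectors:
    0 for perfectly correlated vectors, 1 for perfectly anti-correlated
    ones, 0.5 for uncorrelated ones."""
    n = len(v)
    mu_v = sum(v) / n
    mu_w = sum(w) / n
    num = sum((vi - mu_v) * (wi - mu_w) for vi, wi in zip(v, w))
    den = math.sqrt(sum((vi - mu_v) ** 2 for vi in v)
                    * sum((wi - mu_w) ** 2 for wi in w))
    return 0.5 * (1.0 - num / den)

print(correlation_distance([1, 2, 3], [2, 4, 6]))   # 0.0 (same shape, different scale)
print(correlation_distance([1, 2, 3], [3, 2, 1]))   # 1.0 (anti-correlated)
```

The first call illustrates the robustness to changes in average value and scale noted below: a vector and a rescaled copy of it are at distance zero.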


The advantages of this function are that it is more robust to changes in average value and more resistant to noise.

On the basis of \( d(\vec{v}, \vec{w}) \) it is possible to define the distance function between a vector and a set of vectors.

Two solutions are easily implemented. First, the distance between a vector and a cluster is computed as the minimum distance between a given vector \( \vec{v} \) and all the vectors belonging to a given set C:

\[ d_C(\vec{v}, C) = \min_{\vec{w} \in C} d(\vec{v}, \vec{w}). \]

Yet the above approach is computationally expensive, since it entails that, for a given set, all vectors must be stored somewhere. A different approach is based on the assumption that it is possible to compute the average distance, which is equal to the distance from the centre of gravity. If M is the number of elements of set C, and \( \vec{c} \) is its mean vector:

\[ d_C(\vec{v}, C) = \frac{\sum_{\vec{w} \in C} d(\vec{v}, \vec{w})}{M} = d(\vec{v}, \vec{c}). \]

This approach has the advantage that it is sufficient to keep in memory only the mean vector of each set. This means that each set can be stored as a single vector. The results are based on this solution. It is important to note that no specific information about the nature of the vectors is part of the Category Module.
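The centre-of-gravity solution lends itself to an incremental sketch in which each cluster is stored only as a running mean and a count. The Euclidean distance and the fixed acceptance threshold below are illustrative assumptions; the Relevant-Signal-driven creation criterion of the actual Category Module is not reproduced here:

```python
import math

def euclidean(v, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def assign_and_update(clusters, v, dist=euclidean, threshold=1.0):
    """One incremental clustering step. Each cluster is a [centroid, count]
    pair: v joins the nearest cluster if close enough, otherwise it seeds
    a new category. Only centroids are kept in memory."""
    best, best_d = None, float("inf")
    for cl in clusters:
        d = dist(v, cl[0])
        if d < best_d:
            best, best_d = cl, d
    if best is None or best_d > threshold:
        clusters.append([list(v), 1])                  # seed a new category
    else:
        c, m = best
        best[1] = m + 1
        best[0] = [(ci * m + vi) / (m + 1)             # running mean update
                   for ci, vi in zip(c, v)]
    return clusters

clusters = []
for v in ([0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]):
    assign_and_update(clusters, v)
print(len(clusters))  # 2: one category near the origin, one near (5, 5)
```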

4.2.2. Phylogenetic and Ontogenetic Modules

The Phylogenetic Module contains the built-in criteria to bootstrap the system. In this case the built-in criterion consists in selecting brightly coloured objects. This module implements the following phylogenetic function:

\[ R_{\mathrm{ph}} = \frac{\sum \mathrm{Saturation}(\eta, \xi)}{N}, \]

where \( R_{\mathrm{ph}} \) is the Relevant Signal, Saturation(η, ξ) the colour saturation at the pixel (η, ξ) in log-polar coordinates, and N the total number of pixels in the image. Therefore \( R_{\mathrm{ph}} \) is proportional to the average level of colour saturation. This phylogenetic function represents the only built-in part of the architecture. It corresponds to the phylogenetic contribution to the development of the system. The Relevant Signal \( R_{\mathrm{ph}} \) is used to control the motor behaviour: even if the architecture were composed only of the phylogenetic module, it would drive the system towards highly colour-saturated targets. In the neighbourhood of a coloured object oscillations of this function are possible; however, there will always be a maximum in correspondence with an image centred on the coloured target. When the target is in the fovea of the log-polar image, it corresponds to the maximum number of pixels.

The Ontogenetic Module corresponds to the definition we gave in Section 3.3; no modifications were needed.

4.3. A comparison with Pavlov's classic conditioning

As a final argument, we would like to draw a comparison with Pavlov's classic conditioning experiment (Figs. 8 and 9). The reasons for this comparison are two-fold: (i) there are strong similarities; (ii) there is evidence that many cognitive learning processes could be reduced to Pavlov's associationism [5,32]. In Pavlov's case, the focus was on the capability of modifying the relation between a given stimulus and a given response. Although Pavlov's dog was able to select a different stimulus (the ring of the bell), the focus was more on the fact that the dog was capable of linking the stimulus to a behaviour (the salivary response) rather than on the capability of selecting a given stimulus from the continuum of the environment.

Fig. 8. The three stages of conditioning in the classical Pavlov experiment.
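The phylogenetic function \( R_{\mathrm{ph}} \) of Section 4.2.2 can be sketched as follows. Since the text does not specify how Saturation(η, ξ) is computed, an HSV-style saturation over an RGB image is assumed here:

```python
def phylogenetic_signal(image):
    """R_ph: average colour saturation over an image.

    `image` is assumed to be a 2-D list (eta x xi) of (r, g, b) tuples with
    channels in [0, 1]; saturation follows the HSV convention
    (max - min) / max, with black pixels counted as unsaturated.
    """
    total, n = 0.0, 0
    for row in image:
        for (r, g, b) in row:
            mx, mn = max(r, g, b), min(r, g, b)
            total += (mx - mn) / mx if mx > 0 else 0.0
            n += 1
    return total / n

gray = [[(0.5, 0.5, 0.5)] * 4] * 4   # colourless stimulus: saturation 0
red = [[(1.0, 0.0, 0.0)] * 4] * 4    # fully saturated stimulus
print(phylogenetic_signal(gray), phylogenetic_signal(red))  # 0.0 1.0
```

Consistently with the text, a colourless stimulus produces a null phylogenetic signal regardless of its shape, while a saturated target maximizes it.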


Fig. 9. The three stages of ontogenetic development.

In Pavlov's experiment, there are two hardwired receptors for two different kinds of stimuli (the sound of a bell and meat powder): one is a neural structure capable of recognizing the presence of food and the other is a neural structure capable of recognizing the ring of a bell. Before the conditioning process, the behavioural response (salivation) was connected only with the presence of food. During the training, the conditioned response became stronger: more drops of saliva were secreted. The learning consisted in the creation of a connection between the conditioned stimulus and the response.

In our case, the conditioned stimulus does not exist before the conditioning process. The machine is not capable of recognizing the unconditioned stimulus (the shape of an object). It only recognizes coloured objects. At first sight, our experiment might recall Pavlov's experiment. It could be argued that the Phylogenetic Stimulus corresponds to the Unconditioned Stimulus, the Ontogenetic Stimulus corresponds to the Conditioned Stimulus, and the Developmental Signal might correspond to the Response (first Unconditioned and then Conditioned). This is not the case. In the described circumstances, since the colour was presented conjointly with the shape of an object, a new ontogenetic stimulus (the shape) is added to the machine's repertoire of stimuli.

A useful concept is that of the Umwelt of a subject [33,34]: the set of all events which can interact with a subject given its sensory/motor/cognitive capabilities. In the case of the ontogenetic development of new motivations, the Umwelt of the machine is increased and enlarged to a new kind of event. Two things have happened: (i) the machine has learned to recognize something which was previously unknown to it; (ii) the machine has linked such a new stimulus to a given motor behaviour. Pavlov's experiment highlighted the fact that the dog had learnt a relation between an already assessed stimulus and a motor response. The goal of our experiment is to create the capability of recognizing new stimuli.

4.4. Experimental results

We presented different sets of visual stimuli to the system. A first set consisted of a series of colourless geometrical figures, as shown in Fig. 10a on the left. The frequency with which the system was looking at different points was measured. The system spent more time on stimuli corresponding to its motivations by reducing the amplitude of its saccades. At the beginning the system was looking around completely randomly with large saccades, since its Ontogenetic Module was unable to catch anything relevant and the Phylogenetic Module was programmed to look for very saturated coloured objects, which were absent in the first set.

Subsequently we presented a different stimulus: a series of coloured figures (Fig. 10b on the left). The difference is shown in Fig. 10b. The head spent more time on the coloured shapes than on the white background because of the phylogenetically implanted rule.


Fig. 10. Experimental results.

Finally we presented the initial stimulus again (the set of colourless shapes). The system spent more time on the colourless shapes than on the background (Fig. 10c). The behaviour of the system changed, since the system had added a new motivation (shapes) to the previous ones (saturated colours).

In order to measure the different behaviour of the system, the time spent by the system on each shape was measured in two different ways: a qualitative one (the middle column) and a quantitative one (the right column).

To get a qualitative visual description of how much time was spent by the system on each point of its field of view, we assigned to each point of the visual field an intensity value proportional to the normalized time the system's gaze spent on it. The images in the centre of Fig. 10a–c were generated after 10³ saccades (equivalent to about 500 s). The field of view of the head was divided into a 64 × 64 array. For each point (i, j) in the visual field, the amount of time the gaze of the head was directed at it was computed:

\( t_{i,j} \) = total time spent looking at point (i, j).

The intensity of the point was then set proportional to a normalized value of \( t_{i,j} \). With the first set of visual stimuli, the resulting image is in Fig. 10a (middle). The system does not show any polarization towards a specific part of the field of view. The behaviour of the system is completely different in its response to the coloured stimulus: there are three definite centres of interest (Fig. 10b, middle). However, after the


interaction with the star had shaped a new goal, that goal became part of the system's behaviour: in Fig. 10c (middle) the original grey stimuli produce a completely different response, and the grey star became a centre of interest.

To get the quantitative measure (right column), we measured the time spent by the head inside the circular areas, shown in the left column, surrounding the stimuli: a rough indicator of the time spent looking at a certain shape. The regions of interest were named according to the following notation: the coloured figures (R1), grey star (R2), grey cross (R3), grey waves (R4), grey circle (R5). In the graphs of Fig. 10 on the right, for each region, the normalized time was computed according to the following formula:

\[ c_k = 100 \cdot \frac{\sum_{(i,j) \in A_k} t_{i,j}}{\sum t_{i,j}}, \]

where \( t_{i,j} \) is the same as in the previous formula, and \( A_k \) corresponds to a set of regions: \( A_1 \) corresponds, for each group of stimuli, to what is not occupied by the stimuli; \( A_2 \) corresponds to the union of the three areas occupied by the three coloured stars (R1); while \( A_{2,3,4,5} \) correspond, respectively, to the four regions occupied by the grey shapes (R2,3,4,5).
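The normalization \( c_k \) can be sketched as follows; the gaze-time map and the region masks below are toy data, not the measured ones:

```python
def region_times(t, regions):
    """c_k = 100 * (sum of t[i][j] over A_k) / (total gaze time): the
    percentage of time spent inside each region A_k. `regions` maps a
    region name to a set of (i, j) cells of the gaze-time array."""
    total = sum(sum(row) for row in t)
    return {name: 100.0 * sum(t[i][j] for (i, j) in cells) / total
            for name, cells in regions.items()}

# Toy 4x4 gaze-time map: most of the time is spent in the lower-right
# 2x2 block, standing in for a region of interest such as the grey star.
t = [[1, 1, 1, 1],
     [1, 1, 1, 1],
     [1, 1, 10, 10],
     [1, 1, 10, 10]]
star = {(2, 2), (2, 3), (3, 2), (3, 3)}
regions = {
    "R2": star,
    "background": {(i, j) for i in range(4) for j in range(4)} - star,
}
shares = region_times(t, regions)
print(round(shares["R2"], 1))  # 76.9
```

Since the regions partition the field of view, the percentages sum to 100, which makes the bars of the right column of Fig. 10 directly comparable across stimulus sets.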

In order to test the efficacy of the architecture presented, the experiment was repeated in a simulated environment. In this way it was possible to check its soundness and generalize its software implementation. In the simulated version of the experiment, similar stimuli were presented and a simulated gaze was directed towards different points of the image. The images used were 1024 × 768 pixels; the artificial retina had a 64-pixel diameter. All the other parameters exactly match the EuroHead. In Fig. 11, the experimental results are visible. Instead of computing a frequency density value at each point of the field of view, a collection of 10³ fixation points is displayed for each of the presented stimuli. From a qualitative point of view, the relevant changes in the behavioural and motivational properties of the system are clearly visible.

In the future, we are planning to implement this architecture in more complex robotic setups and in more realistic environments. However, we believe that the general principle is already clearly illustrated by these simplified experiments.

Fig. 11. A simulated version of the experiment: on the left the artificial stimuli, on the right the measured fixation points.

5. Conclusions

Ever since Grey Walter wrote about his turtles, the history of robots has chronicled their efforts to establish a relationship with the environment. The transition from deliberative robots to reactive robots, then to behaviour-based robots, bears witness to this trend. The recent appearance on the market of entertainment robots sheds new light on motivation-based robots.

Environment-driven motivations provide the internal criteria for the development of artificial beings and supply their means and goals: how they do what they do.

Ontogenetic development allows the artificial being to elaborate the criteria on which it can associate external stimuli. In the aforementioned experiment the visual stimuli were associated first on the basis of a phylogenetic criterion (colour saturation), then on the basis of an ontogenetic criterion derived from the system's experience (shape). The ontogenetic architecture allows different stimuli (the colour and the shape) to be self-associated on the basis of the interaction with the environment. A new motivation (looking for a shape) is the product of the individual history of the architecture in a given environment. Recently, self-associative learning has been identified as a possible key to the development of consciousness [5]. It follows that an ontogenetic architecture based on environment-derived motivations might provide the basis for the development of an artificial conscious robot.

In this paper we have used the intentionalistic-mentalistic vocabulary to introduce intentional concepts such as 'motivations' and 'experience'. A correct definition of these terms applied to artificial beings should be free from any ontological or linguistic commitment. This tenet is evident when, instead of biological beings, we have to deal with artificial systems, since it is not clear whether they possess intentional properties or not. For instance, if we are dealing with human beings, it is safe to use words like 'intentions', 'motivations', 'experience'. However, if we are dealing with robots or other kinds of artificial systems, it is ambiguous to use the same terminology. Edelman and Tononi wrote that: "to understand the mental we may have to invent further ways of looking at brains. We may even have to synthesize artifacts resembling brains connected to bodily functions in order fully to understand those processes. Although the day when we shall be able to create such conscious artifacts is far off, we may have to make them before we deeply understand the processes of thought itself" [35].

References

[1] R.C. Arkin, Behavior-based Robotics, MIT Press, Cambridge,MA, 1999.

[2] D. McFarland, T. Bosser, Intelligent Behavior in Animals and Robots, MIT Press, Cambridge, MA, 1993.

[3] A. Lansner, Motivation based reinforcement learning in autonomous robot, SANS (1994).

[4] P.M. Merikle, T.A.W. Visser, Conscious and unconscious processes: the effects of motivation, Consciousness and Cognition 8 (1999) 94–113.

[5] P. Perruchet, A. Vinter, The self-organizing consciousness, Behavioral and Brain Sciences (2002).

[6] J. Zlatev, The epigenesis of meaning in human beings, and possibly in robots, Minds and Machines 11 (2001) 155–195.

[7] R.A. Brooks, et al., The Cog project: building a humanoid robot, in: Nehaniv (Ed.), Computation for Metaphors, Analogy, and Agents, Springer-Verlag, Berlin, 1999.

[8] R.A. Brooks, A.M. Flynn, Robot beings, in: Proceedings of the IEEE Intelligent Robots and Systems Conference, Tsukuba, 1989.

[9] E.M.A. Ronald, R. Poli, Surprise versus unsurprise: implications of emergence in robotics, Robotics and Autonomous Systems 37 (1) (2001) 19–24.

[10] S.J. Gould, Ontogeny and Phylogeny, Harvard University Press, Cambridge, MA, 1977.

[11] J.L. Elman, et al., Rethinking Innateness: A Connectionist Perspective on Development, MIT Press, Cambridge, MA, 2001.

[12] S. Oyama, The Ontogeny of Information, Duke University Press, Durham, 1985/2000.

[13] E.C. Tolman, Prediction of the vicarious trial and error by means of the schematic sowbug, Psychological Review 23 (1939) 318–336.

[14] V. Braitenberg, Vehicles: Experiments in Synthetic Psychology, MIT Press, Cambridge, MA, 1984.

[15] R.A. Brooks, Intelligence without representation, Artificial Intelligence Journal 47 (1991) 139–159.

[16] R.A. Brooks, New approaches to robotics, Science 253 (1991) 1227–1232.

[17] R.S. Sutton, A.G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.

[18] M. Kuperstein, INFANT neural controller for adaptive sensory-motor coordination, Neural Networks 4 (1991) 131–145.

[19] G. Metta, G. Sandini, J. Konczak, A developmental approach to visually guided reaching in artificial systems, Neural Networks 12 (1999) 1413–1427.

[20] R. Manzotti, et al., Disparity estimation in log polar images and vergence control, Computer Vision and Image Understanding 83 (2001) 97–117.

[21] C. Ferrell, C. Kemp, An ontogenetic perspective to scaling sensorimotor intelligence, in: Proceedings of the 1996 AAAI Fall Symposium on Embodied Cognition and Action, 1996.

[22] D. Gachet, et al., A software architecture for behavioural control strategies of autonomous systems, in: Proceedings of the Conference, IECON'92, San Diego, CA, 1992.

[23] J.J. Weng, Cresceptron, SHOSLIF: toward comprehensive visual learning, in: S.K. Nayar, T. Poggio (Eds.), Early Visual Learning, Oxford University Press, New York, 1996.

[24] K. Fukushima, S. Miyake, T. Ito, Neocognitron: a neural network model for a mechanism of visual pattern recognition, IEEE Transactions on Systems, Man, and Cybernetics 13 (1983) 826–834.

[25] K. Fukushima, M. Okada, K. Hiroshige, Neocognitron with dual C-cell layers, Neural Networks 7 (1) (1994) 41–47.

[26] T. Togawa, K. Otsuka, A model for cortical neural network structure, Biocybernetics and Biomedical Engineering 20 (3) (2000) 5–20.

[27] S. Grossberg, The link between brain learning, attention, and consciousness, Consciousness and Cognition 8 (1999) 1–44.

[28] M. Vincze, et al., A system to navigate a robot into a ship structure, Machine Vision and Applications 14 (2003).

[29] G. Sandini, V. Tagliasco, An anthropomorphic retina-like structure for scene analysis, Computer Vision Graphics and Image Processing 14 (1980) 365–372.

[30] G. Metta, et al., Development: is it the right way towards humanoid robotics? in: Proceedings of the Conference, IAS-6, IOS Press, Venezia, 2000.

[31] R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., Wiley, New York, 2001.

[32] I.P. Pavlov, Selected Works, University Press of the Pacific, Honolulu, Hawaii, 1955/2001.


[33] J.V. Uexküll, A Stroll through the Worlds of Animals and Men: A Picture Book of Invisible Worlds, International University Press, New York, 1934.

[34] T. Ziemke, The construction of 'reality' in the robot: constructivist perspectives on situated artificial intelligence and adaptive robotics, Foundations of Science 6 (1–3) (2001) 163–233.

[35] G.M. Edelman, G. Tononi, A Universe of Consciousness: How Matter Becomes Imagination, Allen Lane, London, 2000.

Riccardo Manzotti was born in Parma, Italy, in 1969. He received a Laurea degree in Electronic Engineering in 1994, a degree in Philosophy in 2004, and a Ph.D. in Robotics in 2000 from the University of Genoa. He is currently Assistant Professor in Psychology at the Institute of Human and Environmental Sciences (IULM University, Milan). His present research activities focus on the areas of artificial intelligence, consciousness and robotics.

Vincenzo Tagliasco was born in Savona, Italy, in 1941. He is currently a Full Professor of Bioengineering at the University of Genoa, Italy. His main field of work includes vision and motor systems in human beings. At present he teaches and carries out research work in the area of artificial consciousness.