
Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


A biologically inspired architecture for an autonomous and social robot

María Malfaz, Álvaro Castro-González, Ramón Barber, Miguel A. Salichs

Abstract—In recent years, a lot of effort has been put into the construction of robots able to live among humans. This fact has favored the development of personal or social robots, which are expected to behave in a natural way. This implies that these robots should meet certain requirements, for example: to be able to decide their own actions (autonomy), to be able to make deliberative plans (reasoning), or to be able to exhibit emotional behavior in order to facilitate human-robot interaction.

In this paper, the authors present a bio-inspired control architecture for an autonomous and social robot, which tries to accomplish some of these features. In order to develop this new architecture, the authors have used as a base a prior hybrid control architecture (AD) that is also biologically inspired. Nevertheless, in the latter, the task to be accomplished at each moment is determined by a fixed sequence processed by the Main Sequencer. Therefore, the Main Sequencer of the architecture coordinates the previously programmed sequence of skills that must be executed. In the new architecture, the Main Sequencer is replaced by a decision making system based on drives, motivations, emotions, and self-learning, which decides the proper action at every moment according to the robot's state. Consequently, the robot improves its autonomy, since the added decision making system determines the goal and, consequently, the skills to be executed. A basic version of this new architecture has been implemented on a real robotic platform. Some experiments are shown at the end of the paper.

Index Terms—Cognitive robotics, control architectures, autonomy, decision making systems, motivations, emotions.

I. INTRODUCTION

IN the Nineties, the term “cognitive robotics” was first introduced by Ray Reiter and his colleagues, who have a research group on this topic at the University of Toronto. According to them, cognitive robotics is concerned with endowing robotic or software agents with higher level cognitive functions that involve reasoning about goals, perception, actions, mental states of other agents, collaborative task execution, etc.

Since the Seventies, robotics has evolved trying to provide useful services to humans. Today, robots which carry out dangerous [1], assistance [2], or transportation tasks [3], among others, are a reality. Traditionally, robotic research has been centred on control architectures, planning, navigation, etc. Nevertheless, during the last few years, the interest in robots which are integrated in our everyday environment, personal robots, has increased [4]. Human-robot interaction is one of the main characteristics of these robots.

The authors are with the RoboticsLab at the Carlos III University of Madrid, 28911, Leganés, Madrid, Spain. E-mail: [email protected]; [email protected]; [email protected]; [email protected].

Some researchers in cognitive robotics have begun to use their old architectures as a base for their cognitive robotics programs. The idea is to extend these architectures in order to implement some of the high level cognitive functions. Some examples of high level cognitive functions already implemented on robots are shown in [5], [6]. The first control architecture developed by the authors, a hybrid architecture named AD (Automatic/Deliberative), took as a base the theories of modern psychology expressed by Shiffrin and Schneider [7], [8]. According to these authors, two mechanisms of processing information are established: automatic processes and controlled ones. Therefore, we can differentiate between two levels of activity in human beings: automatic and deliberative.

Moreover, at the beginning of the Sixties, the artificial intelligence precursor Herbert Simon was convinced that it was necessary to include emotions in the cognitive model in order to approximate the human mind [9]. Later, near the mid Nineties, Antonio Damasio published Descartes' Error [10]. His studies proved that damage to the brain's emotional system caused the patient to make poor judgments despite intact logical reasoning skills. As a consequence, the positive role of human emotions in cognition started to gain prominence among a group of researchers from the scientific community. Later, other studies showed that emotions have influence on many cognitive mechanisms, such as memory, attention, perception, and reasoning [11], [12], [13], [14]. Besides, emotions play a very important role in survival, social interaction, and the learning of new behaviors [15], [16], [17].

Therefore, in recent years, the role of emotional mechanisms in natural and artificial cognitive architectures, and in particular in cognitive robotics, has been considered. According to [18], in relation to the main question, do robots need emotions?, many researchers have answered positively, mainly considering the two aspects of emotion: the external (social) one and the internal (individual) one. It seems obvious that in human-robot social interaction, the expression of emotions helps to make interaction more natural [19]. On the other hand, the internal aspects of emotion, i.e., its role in the behavioral organization of an individual cognitive agent, are essential for the autonomy issue, and this is the main concern of many researchers.

The concept of autonomy has been treated by several authors such as Arkin [20], Gadanho [12], Bellman [16], or Cañamero [17]. In general, they state that an autonomous agent must be self-sustained, which implies a decision making system. Moreover, it must have some goals and motivations, and these must be oriented to maintaining its internal equilibrium (homeostasis).


In fact, in recent years, several authors have argued that a truly biologically inspired and truly cognitive robotics would need to take into account homeostatic/emotional dynamics, i.e., the interplay between the constitutive and interactive aspects of autonomy; for example, the need to keep essential system-internal variables within certain viability ranges [21].

The work presented in this paper tries to consider all the previous requisites in order to design a new biologically inspired architecture for an autonomous and social robot. Although the current context of this robot is a laboratory, the proposed architecture will be implemented on social robots living with human beings and sharing common spaces with complex configurations. In these situations, autonomy and friendly human-robot interaction are essential. Therefore, as previously stated, in order to implement those features in our robot, this new bio-inspired architecture is required. In order to fulfill this goal, we started our approach using the previous hybrid control architecture developed by the authors, the AD architecture [22]. This architecture has been modified, replacing the Main Sequencer that manages the global behavior of the robot with a decision making system based on drives, motivations, emotions, and self-learning [23]. Following a homeostatic approach, the goal of the robot can be to satisfy its needs, maintaining them within an acceptable range. The learning process is carried out using a well-known reinforcement learning algorithm, Q-learning [24]. By using this algorithm, the robot learns the value of every state-action pair through its interaction with the environment; that is, it learns the value that every action has in every possible state. The highest value indicates that the corresponding action is the best one to select in that state. At the beginning of the learning process these values, called the q-values, can all be set to zero, or some of them can be fixed to another value. In the first case, the robot will learn from scratch; in the second, the robot has some kind of previous information about the behavior selection. These initial values are updated during the learning process.
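To make this scheme concrete, the following minimal sketch shows a tabular Q-learning update with zero-initialized q-values. The class, method names, and parameter values are ours, chosen for illustration; they are not the authors' implementation.

```python
# Minimal sketch of tabular Q-learning as described above.
# All names and parameter values are illustrative.
import random
from collections import defaultdict

class QLearner:
    def __init__(self, actions, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate
        # q-values start at zero, so the robot learns from scratch;
        # seeding selected entries would encode a priori knowledge.
        self.q = defaultdict(float)

    def choose(self, state):
        """Epsilon-greedy: usually pick the action with the highest q-value."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning backup after one interaction step."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```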

Before implementing this system on a real robot, as a previous step, this work was successfully implemented on virtual agents [25], [26], [27]. Those works show how the agent learned, with no previous knowledge, the right action to execute in every situation presented in the environment by using reinforcement learning algorithms.

In this paper, we present the first results of our work by showing the implementation, on a real robotic platform, of a simplified version of this new architecture, with neither emotions nor self-learning. The robot is working with a decision making system based on drives and motivations, which sends the information about what to do at every moment to the AD architecture.

The paper is organized as follows. The next section, section II, introduces some basic concepts needed for the decision making system. Next, in section III, the question why do robots need emotions? is answered from a functional point of view. In section IV, a brief review of biologically inspired control architectures is given, some of them based on traditional mechanisms of processing information, and others based on emotions and motivations. Section V shows the approach proposed in this paper: the AD architecture, modified by adding a decision making system. In this section, both the architecture and the system are described. Next, section VI shows how a basic version of this new architecture is being implemented on a real robotic platform. Moreover, this platform is also briefly described. Section VII presents some results of the experiments carried out. Finally, the conclusions and future work are summarized in section VIII.


II. HOMEOSTASIS, DRIVES, AND MOTIVATIONS

In this section we give a brief review of the basic concepts related to the decision making system. As stated previously, we follow a homeostatic approach when designing this system, and terms such as drives and motivations must be introduced.

Homeostasis was discovered by Claude Bernard in the middle of the nineteenth century, when he observed that bodily variations had as their objective the restoration of the body's stability. In other words, we could say that homeostasis means maintaining a stable internal state, see [28]. According to the homeostatic approach, human behavior is oriented to the maintenance of the internal equilibrium.

One of the oldest theories about drives was proposed by Hull in [29]. Hull suggested that privation induces an aversive state in the organism, which is termed drive. According to his theory, drives increase the general excitation level of an animal and are considered properties of deficit states which motivate behavior.

The word motivation derives from the Latin word motus and indicates the dynamic root of behavior, that is, those internal, rather than external, factors that urge to action [30]. In other words, the motivational state is a tendency to correct the error (the drive) through the execution of behaviors.

Many drive theories of motivation between 1930 and 1970 posited that drive reduction is the chief mechanism of reward. If motivation is due to drive, then the reduction of deficit signals should satisfy this drive and could essentially be the goal of the entire motivation [28].

Hull [29] also proposed the idea that motivation is determined by two factors. The first factor is the drive. The second one is the incentive, that is, the presence of an external stimulus that predicts the future reduction of the need. For example, the presence of food constitutes an incentive for a hungry animal.

In our work, the robot has certain needs (drives) that must be satisfied, and motivations. Following the homeostatic approach, the proposed decision making system is oriented to maintaining those needs within an acceptable range. These needs are not limited to physical ones (as stated in the classical view of homeostasis), but include psychological and social necessities too. In the next section, the reasons why emotions should be included in this decision making system are analyzed.


III. WHY DO ROBOTS NEED EMOTIONS?

Several authors have expounded their reasons for including emotions in robots, besides their importance in human-robot interaction. Moreover, others have studied their generation as well as the optimal number that should be implemented on virtual agents or real robots. In this section, a brief review of these ideas is given.

According to Arkin, motivations/emotions provide two potentially crucial roles for robotics: survival and interaction [15]. Cañamero considers that emotions, or at least a sub-group of them, are one of the mechanisms found in biological agents to confront their environment, easing autonomy and adaptation. For this reason she considers, similarly to Arkin, that it could be useful to exploit this role of emotions to design mechanisms for an autonomous agent [17].

On the other hand, Ortony explains that robots need emotions for the same reason as humans do: one of the fundamental functions of emotions is that they are a requisite for establishing long-term memories. The second function is that emotions provide opportunities for learning, from simple forms of reinforcement learning to conscious and complex planning [31].

In the same line, Bellman [16], Fellows [32], and Kelley [33] state that, since emotions allow animals to survive better than others that lack them, robots should be provided with features related to emotions in a functional way.

Different models of emotion systems have been proposed for implementation in artificial agents. One of the main differences among them is the mechanism used to generate emotions.

Currently, most experts agree that emotions are produced by an appraisal of the situation of the agent in its relation with the world. Therefore, different emotions are associated with different situations. Many researchers think that the relation between situations and emotions is mediated by a set of intermediate variables. These variables act as dimensions of an affective space, and each emotion is associated with a different zone of the affective space [19], [34]. Lazarus [35], on the other hand, considers emotions as discrete categories. In the discrete emotional approach, dimensions of emotional intensity are still employed, but these are applied within each emotional category.

In our work, we follow a discrete emotional approach and we consider that the relation between situations and emotions is specific for each emotion. Therefore, each emotion requires a particular study to establish this relationship.

According to Spinola and Queiroz [36], another important issue related to the implementation of artificial emotions in robots is: how many and which emotions must be selected? In that work many different approaches are described, from authors that defended the idea of implementing a varying number of “basic” or “primary” emotions, from 4 to 22 [37], [38], [39], to others that decided to implement just one or two emotions [40], [41]. Finally, a very different point of view is presented by Cañamero in [42]: “Do not put more emotion in your system than what is required by the complexity of the system-environment interaction”.

Following this last point of view, our research currently focuses on three emotions: happiness, sadness, and fear. Until now, the implementation of emotions has been done on virtual agents which live in a simple environment. In the near future, our intention is to define new emotions when the complete control architecture is implemented on a real robot. Therefore, as the functionality of our robot and its environment become more complex, it will have to cope with new situations, and maybe new emotions, or a redefinition of the existing ones, will be needed.

IV. RELATED WORK

A. Classical control architectures

Classical architectures are mainly focused on navigation tasks. Early robots, in the Sixties and Seventies, used planning-based architectures [43], [44]. Any movement of these robots had to be planned in advance. Planners needed models to predict the results of each action. The main goal of these robots was motion, and therefore the models were maps of the environment and the planners were motion planners.

The poor results of planning-based architectures forced the search for other alternatives. In the mid Eighties, reactive architectures began to be developed [45], [46], [47]. In reactive architectures, the use of planners and models was minimized. In fact, there were neither maps nor planners in most robots with reactive architectures. Decisions were based on real-time information from sensors, making the creation of maps with that information unnecessary. At the time, this approach produced very good results in comparison with planning-based architectures. For instance, robots were able to move quite fast in dynamic environments, avoiding obstacles.

Reactive architectures meant a significant advance in the development of robots, although not everything about them was positive; they also have some drawbacks. Behaviors of robots with reactive architectures usually do not include the achievement of an explicit goal, as planning-based architectures do.

In sum, both approaches, reactive and planning-based, offer some advantages, but they also show some drawbacks. Trying to get the best of both, in the mid Nineties hybrid architectures began to appear. These architectures usually adopt a reactive approach at the low level (the modules closer to sensors and actuators), and a planning-based approach at the high level. That means that motion control loops are closed at the low level, producing different behaviors, and at the same time it is possible to reach planned decisions based on models. Reactive modules make short term decisions in local areas (e.g. immediate movements in the area close to the robot) and planning modules make mid and long term decisions over global areas (e.g. future movements to distant areas).

Among hybrid architectures, the following can be highlighted. Firby establishes three levels: a planner which makes plans according to the goal to be reached, a controller which interacts with the environment, and an executor (RAP) that links the planner and the controller, giving the detailed information which the controller requests from the planner's information [48], [49]. Bonasso, in the 3T architecture, considers a layer consisting of reactive skills, a sequencer that enables or disables the skills, and a deliberative planner capable of guiding the robot to the target goal [50], [51]. Gat, in the ATLANTIS architecture, distinguishes among a controller of reactive primitive activities, a sequencer that manages those primitive activities according to the deliberative computations, and a deliberator which is in charge of the planning [52], [53], [54]. Lastly, Chatila considers a functional level which includes perceptive and motor capacities, an execution level, without reaction capacity, which controls them, and a decisional level in which the planning and supervision are included [55].

The three layer architectures mentioned above have the sequencing layer between the deliberative and reactive ones. This fact leads to a rigidity in the planning-sequencing-acting paradigm.

On the other hand, our AD (Automatic/Deliberative) architecture [56] was designed trying to avoid the rigidity of the mentioned planning-sequencing-acting paradigm. It is composed of only two levels: one for deliberative activities and a second one for automatic activities. The sequencing processes are distributed between the Deliberative and Automatic levels, providing more flexibility to the hybrid architecture.

B. Control architectures based on motivations and emotions

More recently, as previously stated, some authors have implemented cognitive-related concepts in their control architectures, such as motivations, emotions, learning, etc. In this section, we present a review of the works that have inspired our research.

The work developed by Lola Cañamero is one of the first research efforts in this area [57], [58], [17]. The original idea was that the behaviors of an autonomous agent are directed by its motivational states and its basic emotions. As said before, motivations can be viewed as homeostatic processes that maintain inner variables within a certain range. An error detector generates an error signal, the drive, when the value of a variable is not equal to its ideal value. Each motivation is modelled as a function of its related drive and an external or incentive stimulus. The motivation with the highest value becomes active and organizes the behavior of the agent in order to satisfy the drive. Emotions in this approach influence the decision making process in two ways. First, they can modify the intensity of the current motivation and, as a consequence, the intensity of the related behavior; in extreme cases, they can even prevent the execution of the behavior. Second, they can modify the reading of the sensors that monitor the variables affected by emotions; therefore, they can alter the perception of the state of the body. The implemented emotions work as monitoring mechanisms to cope with important situations related to survival.

Another interesting approach is the one presented by Gadanho [12], [59]. In this work, the research is focused on how artificial emotions can improve the behavior of an autonomous robot. In her approach, the robot adapts to its environment using an adaptive controller adjusted by reinforcement learning. Emotions are used to influence perception, as Cañamero does, and to provide a reinforcement function. This is because, according to the author, it is frequently assumed that the human decision making process consists of maximizing the positive emotions and minimizing the negative ones. In later works [60], the emotional system was substituted by a goal system. This system is based on a set of homeostatic variables which must be maintained within a certain range. The goals are explicitly associated with the homeostatic variables.

Another approach was developed by Velásquez [61], [62], who proposed an architecture called Cathexis. This architecture was developed for autonomous agents and contains an emotion generation model. Moreover, it also has simple models for other motivations and a decision making algorithm. Later, this architecture was completed with a drive system in order to develop a decision making model based on emotions. In this model, the emotional system is the main motivation of the agent. The drive system even exploits its influence in order to select specific behaviors. For example, the Hunger drive and the Distress caused by it motivate the agent to obtain food. In this model, the behaviors compete among each other to take control; therefore, only one behavior is active at a time.

Currently, this work has been continued by Cynthia Breazeal, whose main research interest is the study of human-robot interaction. The robots developed, Kismet and Leonardo, have a cognitive and an emotional system. The cognitive system is formed by the perception, attention, drive, and behavior systems. The behaviors are selected based on the values of the drives and the external stimuli. These behaviors are also related to every drive, and they compete to determine which need must be satisfied. The role of the emotional system is to influence the cognitive system to promote appropriate and flexible decision making, and to communicate the robot's internal states, see [63], [64], and [65].

Nowadays, much effort has also been put into autonomous agents acting as characters in computer games. Sevin, in [66], developed a motivational model of action selection for virtual humans. The model chooses the appropriate behavior according to the motivations and the environmental information. In other words, there are actions associated with motivations; therefore, the actions related to the highest motivation become active. Most of the time, the action receiving activity from the highest internal variable is the most active one, and is then chosen by the action selection mechanism.

Up to this point, most of the presented works use a motivational system in order to select the behaviors and, in some of them, emotions are only used to influence this decision making in one way or another. Nevertheless, there are other approaches that use emotions as the central aspect of the decision making system. This is the case, for example, of the work presented by Hirth et al. [67]. They propose an emotion-based control architecture which consists of three main parts: behavior, emotion, and cognition. All possible movements of the robot, from simple reflexes up to high level motor skills, are located in the behavior group. These behaviors are activated in different ways, e.g., directly depending on sensor data, depending on the emotional state, or deliberately by the cognition part. In this architecture, the high level behaviors are mostly activated by the emotions and especially by the cognitive part, whereas low level behaviors are activated directly by the sensor input.

In [34], Hollinger et al. present another robot using emotion-based decision mechanisms. These mechanisms are based on the Mehrabian PAD (pleasure, arousal, and dominance) scale, which determines emotions using an affective space. The robot state is translated into a particular set of sound and movement responses. In this approach, the emotional state of the robot varies according to its interaction with people; in fact, it is modified when the robot sees shirts of different colors.

Finally, another approach is the one presented by Lisseti and Marpaung in [68], where the behavior of the robot is selected according to its current emotional state. They generate this emotional state based on the data received from the input sensors of the robot. In fact, each emotion is related to certain external events, e.g., the parameter of the Sad emotion is increased if the door is closed or the robot does not recognize someone. Once the emotional state is determined, the robot executes the proper action tendency, i.e., the robot identifies the most appropriate action (or set of actions) to be taken from that emotional state.

The presented work has been inspired mainly by Cañamero's, Gadanho's, and Velásquez's works. As will be shown in the next section, we use homeostatic drives that are related to motivations, as those authors do. In our approach, the motivations, and not the behaviors (as in Velásquez's and Breazeal's approaches), compete among each other, following the point of view of Cañamero. Nevertheless, in her approach, the winner motivation has a related behavior that satisfies the associated need, as Sevin also proposes.

In fact, one of the main differences between our work and other authors' approaches is that the behaviors are not necessarily linked in advance with a need or an emotion. This means that there are no motivational or emotional behaviors. The agent/robot can learn, using a reinforcement learning algorithm, which behavior to select in order to satisfy the most urgent drive. In Cañamero's and Sevin's works, it is assumed that there is only one behavior able to satisfy each need. This fact can be seen as a disadvantage, since it limits the flexibility of the decision making system. It could happen, as in our approach, that several behaviors satisfy the same need. This point of view seems to be more bio-inspired since, in nature, in order to satisfy hunger, for example, we can eat something, but drinking some water can also reduce this need.

The second difference is the way each emotion is defined in the architecture. This means that emotions are not defined as a whole, as most authors do. As can be observed, there are two points of view in relation to the role of emotions in the decision making process. Cañamero, Gadanho, Velásquez, and Breazeal used emotions to influence the decision making process, not to select the behavior directly according to them. On the contrary, others, such as Hirth et al., Hollinger et al., and Lisseti and Marpaung, consider emotions as the central aspect of their decision making system, so, in some cases, the behavior is selected according to the current emotional state. In our approach, we do not limit the role of emotions to one of these, but exploit both points of view. On one hand, some emotions are used as the reinforcement function in the learning process, as Gadanho also proposed, not directly determining the action selection. On the other hand, other emotions are defined as motivations, so the behaviors will be completely oriented to cope with the situation that generated those emotions.

V. OUR APPROACH: AD ARCHITECTURE WITH A BIOLOGICALLY INSPIRED DECISION MAKING SYSTEM

A. AD architecture

As stated in section I, the previous control architecture developed by the authors is the AD architecture. This biologically inspired architecture is based on the ideas of modern psychology expressed by Shiffrin and Schneider [7], [8], so it considers two levels, the automatic and the deliberative levels, as shown in Figure 1.

Fig. 1. AD architecture levels

In the AD architecture [56], both levels are formed by skills, which endow the robot with different sensory and motor capacities, and process information. Skills can be coordinated by sequencers, and the Main Sequencer manages the deliberative skills according to a predefined sequence. This sequencer will be explained later in more detail.

1) Deliberative level: In the natural world, humans' deliberative activities are characterized by the fact that they are carried out in a conscious form. Moreover, the temporal dimension is an important property: deliberative processes require a large quantity of time to be dedicated to the analysis. These activities are carried out sequentially, that is, one after another, and it is not possible to carry out more than one deliberative activity at a time.

In our AD architecture implementation, deliberative skills are based on these activities, and the authors consider that only one deliberative skill can be activated at a time.

2) Automatic level: Living beings' automatic activities are characterized by the fact that their actions and perceptions are carried out without the necessity of being conscious of the processes responsible for controlling those activities. Examples of this would be the heart beat, the hand movement when writing, or that of the legs when walking. An automatic activity can be carried out in parallel with other automatic activities and with a deliberative activity. For example, a person can drive a vehicle and maintain a conversation simultaneously. The level of complexity of automatic activities can vary widely, from the "simplicity" of moving a finger to the complexity of playing a previously memorized sonata on the piano.

In the AD implementation, the automatic level [69] is mainly formed by skills which are related to sensors and actuators. Automatic skills can be performed in parallel, and they can be merged in order to achieve more complex skills.

3) AD Memories: One of the main characteristics of human beings is their ability to acquire and store information from the world and from their own experiences. Memory can be defined as the capacity to recall past experience or information in the present [70].

Based on the memory model proposed by Atkinson and Shiffrin [71], the AD architecture considers two different memories: the Short-Term Memory and the Long-Term Memory, see Figure 1. In our architecture, the Short-Term Memory is defined as a temporary memory. This memory is regarded as a working memory where temporal information is shared among processes and skills. On the other hand, the Long-Term Memory is a permanent repository of durable knowledge. This knowledge can come from learning, from processing the information stored in the Short-Term Memory, or it can be given a priori. In the AD architecture this memory refers to a permanent memory where stable information is available only to deliberative skills.

4) The Main Sequencer: The Main Sequencer, as shown in Figure 1, is the element in charge of coordinating deliberative skills in order for the robot to fulfil a task. The Main Sequencer performs a sequence of skills that must be carried out by the robot. This sequence is a fixed script where all possible situations that the robot can face are considered. This means that this script has been programmed in advance and is exclusive to certain objectives.

A relevant feature of this architecture is, as already stated, that all possible options must be considered in the sequence a priori. Depending on the definition of autonomy, this can be considered a negative factor since, in bio-inspired systems, it is assumed that the agent/robot itself must decide its own objectives. Therefore, since this is our objective, the Main Sequencer has been replaced with a decision making system based on drives, motivations, emotions, and self-learning. This system is described in the next section.

B. Adding the biologically inspired decision making system

As shown in Figure 2, the decision making system has a bidirectional communication with the AD architecture. On one side, the decision making system selects the behavior the robot must execute according to its state. This behavior is carried out by the AD architecture by activating the corresponding skill or skills (deliberative or automatic). On the other side, the decision making system needs information in order to update the internal and external state of the robot.

The general idea of the proposed decision making system is shown in Figure 3. As explained in section II, the term homeostasis means maintaining a stable internal state [28].

Fig. 2. AD architecture with the decision making system

This internal state can be configured by several variables, which must be at an ideal level. When the value of these variables differs from the ideal one, an error signal occurs: the drive [57].

Fig. 3. The decision making system

In our approach, the autonomous robot has certain needs (drives) and motivations, and, following the ideas of Hull [29] and Balkenius [72], [73], the intensities of the motivations of the robot are modeled as a function of its drives and some external stimuli. For this purpose we used Lorenz's hydraulic model of motivation as an inspiration [74]. In Lorenz's model, the internal drive strength interacts with the external stimulus strength. If the drive is low, then a strong stimulus is needed to trigger a motivated behavior. If the drive is high, then a mild stimulus is sufficient [28]. The general idea is that we are motivated to eat when we are hungry, and also when we have food in front of us, although we do not really need it. Therefore, the intensities of the motivations are calculated as shown in (1):

If D_i < L_d then M_i = 0; if D_i ≥ L_d then M_i = D_i + w_i, (1)

where M_i are the motivations, D_i are the related drives, w_i are the related external stimuli, and L_d is called the activation level.

According to Balkenius [72], [73], not all excited motivational states can be allowed to direct the robot at once, since this would generate incoherent behaviors. In his opinion, this problem cannot be handled solely by behavioral competition but must be resolved at an earlier stage of processing. The solution proposed is a motivational competition, as Cañamero also proposed in [57]. Therefore, in our approach, once the intensity of each motivation is calculated, they compete among themselves for being the dominant one, and this one determines the inner state of the robot. It can happen that, if none of the drives is greater than the activation level L_d, there is no dominant motivation.
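As an illustration, the sketch below implements equation (1) and the motivational competition just described. The drive names, numeric values, and activation level are invented for the example; they are not taken from the authors' implementation.

```python
# Sketch of equation (1) and the motivational competition.
# Drive names, values, and the activation level are illustrative.

def motivation_intensity(drive, stimulus, activation_level):
    """Eq. (1): below the activation level the motivation stays at zero;
    above it, drive and external stimulus add up."""
    if drive < activation_level:
        return 0.0
    return drive + stimulus

def dominant_motivation(drives, stimuli, activation_level):
    """All motivations compete; the strongest defines the inner state.
    Returns None when no drive exceeds the activation level."""
    intensities = {
        name: motivation_intensity(value, stimuli.get(name, 0.0), activation_level)
        for name, value in drives.items()
    }
    name, intensity = max(intensities.items(), key=lambda kv: kv[1])
    return name if intensity > 0 else None

# Example: the social drive is well above the activation level and wins.
drives = {"social": 7.0, "recreational": 2.0, "survival": 4.0}
stimuli = {"social": 1.5}
print(dominant_motivation(drives, stimuli, activation_level=3.0))  # -> social
```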

As stated in previous sections, in this decision making system there are no motivational behaviors. This means that the robot does not necessarily know in advance which behaviors to select in order to satisfy the drive related to the dominant motivation. There is a repertory of behaviors, and they can be executed depending on the relation of the robot with its environment, i.e., the external state. For example, the robot will be able to interact with people as long as it is accompanied by someone.

The objective of this decision making system is to have the robot learn how to behave in order to maintain its needs within an acceptable range. For this purpose, it uses the Q-learning algorithm to learn from its bad and good experiences. As previously stated, the autonomous robot can learn, from scratch or using some a priori information about some q-values of the state-action pairs, the proper behavior to select in every state through its interaction with the environment.

Besides, as shown in Figure 3, happiness and sadness are used in the learning process as the reinforcement function, and they are related to the wellbeing of the robot. We justify this decision below, but first let us introduce this concept: the wellbeing of the robot is defined as a function of its drives, and it measures the degree of satisfaction of its internal needs:

Wb = Wb_ideal − Σ_i α_i · D_i, (2)

where α_i is the set of personality factors that weight the importance of each drive on the wellbeing of the robot, and Wb_ideal is the ideal value of the robot's wellbeing. As observed, as the values of the needs of the robot increase, its wellbeing decreases.
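A matching sketch of the wellbeing function of equation (2) follows; the ideal wellbeing value and the personality factors are illustrative choices, not the robot's actual parameters.

```python
# Sketch of the wellbeing function, eq. (2). All values are illustrative.

WB_IDEAL = 100.0  # assumed ideal wellbeing value

def wellbeing(drives, personality):
    """Wb = Wb_ideal - sum_i alpha_i * D_i: the larger the unsatisfied
    needs, the lower the wellbeing."""
    return WB_IDEAL - sum(personality[name] * value
                          for name, value in drives.items())

drives = {"social": 7.0, "recreational": 2.0, "survival": 4.0}
personality = {"social": 1.0, "recreational": 0.5, "survival": 2.0}
print(wellbeing(drives, personality))  # 100 - (7.0 + 1.0 + 8.0) = 84.0
```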

In order to define happiness and sadness, we took into account the definition of emotion given by Ortony [38]. In his opinion, emotions occur due to an appraised reaction (positive or negative) to events. According to this point of view, in [75], Ortony proposes that happiness occurs because something good happens to the agent. On the contrary, sadness appears when something bad happens. In our system, this can be translated into the fact that happiness and sadness are related to the positive and negative variations of the wellbeing of the robot.

On the other hand, the role of happiness and sadness as the reinforcement function was inspired by Gadanho's works, as shown in section IV-B, but also by Rolls [76]. He proposes that emotions are states elicited by reinforcements (rewards or punishments), so our actions are oriented to obtaining rewards and avoiding punishments. Following this point of view, in the proposed decision making system, happiness and sadness are used as the positive and negative reinforcement functions during the learning process, respectively. Moreover, this approach seems consistent with the drive reduction theory introduced in section II, according to which drive reduction is the chief mechanism of reward.

In summary, the decision making process is cyclic and can be described in the following points:
1. Updating the drives and motivation intensities.
2. Motivation competition and selection of the inner state.
3. Determining the external state.
4. Updating the wellbeing function.
5. Generating the reinforcement function (happiness/sadness).
6. State-action evaluation (reinforcement learning).
7. Behavior selection.
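As a rough illustration, one pass of this cycle can be wired together from the earlier sketches (dominant_motivation, wellbeing, and QLearner). The state encoding and glue code below are ours, under the assumption that the reward is the variation of wellbeing; this is not the robot's actual implementation.

```python
# One pass of the seven-step cycle, reusing the earlier sketches
# (dominant_motivation, wellbeing, QLearner). Illustrative glue code.

def decision_cycle(drives, stimuli, personality, learner,
                   prev, activation_level=3.0):
    """prev is (state, action, wellbeing) from the last cycle, or None."""
    dominant = dominant_motivation(drives, stimuli, activation_level)      # 1-2
    external = "accompanied" if stimuli.get("social", 0) > 0 else "alone"  # 3
    wb = wellbeing(drives, personality)                                    # 4
    state = (dominant, external)
    if prev is not None:
        prev_state, prev_action, prev_wb = prev
        reward = wb - prev_wb            # 5: happiness/sadness as Delta Wb
        learner.update(prev_state, prev_action, reward, state)             # 6
    action = learner.choose(state)                                         # 7
    return state, action, wb
```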

As said at the beginning of this paper, this decision making system has been successfully implemented on autonomous virtual agents [25], [26], [27]. These agents live in a virtual environment created using a text-based game available online called CoffeMud [77]. Next, in this section, we give a brief review of this implementation and the results obtained.

The environment where the agent has to live is a simple rooms-corridor stage. In these rooms, it can find several objects, such as food, water, etc., which are needed in order to satisfy the drives of the agent. Moreover, the agent has a limited set of actions related to every object, for example, “to eat food” or “to take water”. In this implementation, all the initial q-values are set to zero; therefore, the agent does not have any previous information about the behavior selection. It is important to note again that the actions are not related to motivations. This means that the agent does not know in advance that, for example, it must eat in order to satisfy its hunger. The drives and motivations implemented are: Hunger, Thirst, Weakness, Loneliness, and Fear. Hunger, Thirst, and Weakness are related to the consumption of food, water, and medicine respectively. Loneliness is related to social interaction and, in order to satisfy it, the agent must interact with other agents that share the same environment. Moreover, those agents (opponents) are able to behave badly or kindly with our agent.

In relation to the emotions, happiness and sadness are the reinforcement function, as previously explained, and the emotion fear, based on some theories that state that emotions can motivate behaviors [14], [19], [78], is defined as a motivation. Therefore, according to our decision making process, fear can be the dominant motivation and, in that case, the agent would be “scared”. When this happens, the agent must learn the right action to execute in order to cope with the situation that caused this inner state.

The results obtained showed that the agent, using this decision making system, is able to learn the right sequence of actions in order to satisfy its needs by maximizing its wellbeing. This means, for example, that in the case the agent is hungry, it learns that it must go where the food is, then take it, and finally eat it. These results may seem obvious, but it was the right selection of the reinforcement function which allowed the agent to learn properly, without any previous knowledge about which action to select at every moment. Another important result is the one obtained with the emotion fear. In [27] it is shown how the agent is able to generate a “run-away” behavior that was not previously programmed. Moreover, the agent is able to identify the situation that scared it. This fact is quite important since most authors use an emotional releaser, as for example being in the presence of an enemy, but in our case the agent, after several trials, learns to identify that dangerous situation.

Fig. 4. Our social robot Maggie interacting with children

VI. IMPLEMENTATION ON A SOCIAL ROBOT

In this section, the developed system is presented. First, the robotic platform is briefly introduced. Then, the general elements in the architecture are presented. Later on, the decision making module is shown, explaining how it interacts with the architecture. As already stated, the decision making system implemented on the robot is a basic version, and it is currently being improved and extended. In this first approach, neither emotions nor learning have been implemented on the robot.

A. Framework

The presented work has been implemented on the research robotic platform named Maggie [79]. Maggie is a social and personal robot intended for performing research on human-robot interaction and improving robot autonomy (Figure 4). It was conceived for personal assistance, for entertainment, to help handicapped people, to keep people company, etc. Its friendly external look facilitates its social robot task. Both software and hardware have been developed by the RoboticsLab research group at the University Carlos III of Madrid.

In relation to its hardware, Maggie is a computer-controlled system with a wheeled base which allows the robot to move through the environment. Its arm, neck, and eyelid movements show signs of life. The vision system uses a camera in the head and, thanks to it, Maggie can recognize people and play several games. A laser telemeter and ultrasound sensors are used by the navigation system to avoid collisions. By means of an infrared emitter/receiver, Maggie also operates different home appliances, such as televisions. Touch sensors on the surface of the body and a touch screen situated on the breast are used for direct interaction with people. Inside the head, an RFID antenna is placed for identifying objects. In order to provide verbal interaction, our robot is equipped with a text-to-speech module and an automatic speech recognition system.

The required energy for all devices is supplied by two batteries which provide a power supply of 25 V. During its working life, the robot needs at least 20 V. The purpose is to achieve a robot that works continuously, in a never-ending working life. This means that the battery level should always be over this threshold.

B. AD architecture

Considering the ideas previously stated, the software is based on the two levels of the Automatic-Deliberative architecture [22], [56], previously described in section V. The automatic level is linked to modules that communicate with hardware, sensors, and motors. At the deliberative level, reasoning processes are placed. As shown in Figure 2, the communication between both levels is bidirectional and is carried out by the Short-Term Memory and events [80].

Events are the mechanism used by the architecture for working in a cooperative way. An event is an asynchronous signal for coordinating processes by being emitted and captured. The design is accomplished by the implementation of the publisher/subscriber design pattern, so that an element that generates events does not know whether these events are received and processed by others or not.
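A minimal sketch of such an event mechanism might look as follows. The class and method names are ours, and the real system involves inter-process signalling not shown here; this single-process version only illustrates the decoupling between emitters and subscribers.

```python
# Minimal publisher/subscriber sketch: emitters broadcast events
# without knowing who (if anyone) receives them. Names are ours.
from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable) -> None:
        self._subscribers[event_name].append(handler)

    def emit(self, event_name: str, payload=None) -> None:
        # The emitter is decoupled: it simply broadcasts to whoever listens.
        for handler in self._subscribers[event_name]:
            handler(payload)

bus = EventBus()
bus.subscribe("person_detected", lambda p: print("greeting", p))
bus.emit("person_detected", "visitor")  # -> greeting visitor
```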

The Short-Term Memory is a memory area which can be accessed by different processes, where the most important data is stored. Different data types can be distributed and are available to all elements of the AD architecture. The current and the previous value, as well as the date of the data capture, are stored. Therefore, when writing new data, the previous data is not eliminated; it is stored as a previous version. The Short-Term Memory allows registering and eliminating data structures, and reading and writing particular data, and several skills can share the same data. It is based on the blackboard pattern.
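The following single-process sketch captures this blackboard idea of versioned, timestamped shared data; a real Short-Term Memory would also need synchronization across processes, which is omitted here.

```python
# Blackboard-style shared memory sketch: each entry keeps the current
# value, the previous value, and the timestamp of the last write.
import time

class ShortTermMemory:
    def __init__(self):
        self._store = {}  # name -> (current, previous, timestamp)

    def register(self, name, value=None):
        self._store[name] = (value, None, time.time())

    def unregister(self, name):
        del self._store[name]

    def write(self, name, value):
        current, _, _ = self._store[name]
        # Writing does not discard old data: it becomes the previous version.
        self._store[name] = (value, current, time.time())

    def read(self, name):
        return self._store[name]  # (current, previous, timestamp)
```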

On the other hand, the Long-Term Memory has been implemented as a database and files which contain information such as data about the world, the skills, and grammars for the automatic speech recognition module.

As already stated, the essential component in the AD architecture is the skill [80], and it is located in both levels. In terms of software engineering, a skill is a class that hides data and processes and describes the global behavior of a robot task or action. The core of a skill is the control loop, which can be running (skill activated) or not (skill blocked).

Skills can be activated by other skills, by a sequencer, or by the decision making system. They can give data or events back to the activating element or to other skills interested in them. Skills are characterized by:

Copyright (c) 2011 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing [email protected].

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

JOURNAL OF LATEX CLASS FILES, VOL. 6, NO. 1, JANUARY 2007 9

• They have three states: ready (just instantiated), activated (running the control loop), and locked (not running the control loop).
• They have three working modes: continuous, periodic, and by events.
• Each skill is a process. Communication among processes is achieved by the Short-Term Memory and events.
• A skill represents one or more tasks or a combination of several skills.
• Each skill has to be subscribed to at least one event, and it has to define its behavior when the event arises.

The AD architecture allows the generation of complex skills from atomic skills (indivisible skills). Moreover, a skill can be used by different complex skills, and this allows the definition of a flexible architecture.
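The skill structure described above might be sketched as follows, reusing the EventBus sketch from earlier. The threading details and loop period are our assumptions for illustration; only the three states and the event-driven behavior come from the text.

```python
# Sketch of a skill: three states, an event subscription, and a control
# loop that runs only while the skill is activated. Illustrative only.
import threading
import time

READY, ACTIVATED, LOCKED = "ready", "activated", "locked"

class Skill:
    def __init__(self, bus, event_name):
        self.state = READY  # just instantiated
        bus.subscribe(event_name, self.on_event)

    def activate(self):
        self.state = ACTIVATED
        threading.Thread(target=self._control_loop, daemon=True).start()

    def block(self):
        self.state = LOCKED  # stops the control loop

    def _control_loop(self):
        while self.state == ACTIVATED:
            self.step()
            time.sleep(0.05)  # loop period assumed for this sketch

    def step(self):
        """One iteration of the skill's task; subclasses override this."""

    def on_event(self, payload):
        """Each skill defines how it reacts to the events it subscribes to."""
```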

C. The decision making system

The decision making system proposed in the preceding sections is intended to achieve a fully autonomous robot. Therefore, the decision making module is in charge of selecting the most appropriate skill at each moment in order to maximize the robot's wellbeing. Choosing the right skill depends on the values of the motivations, on previous experiences, and on the relationship with the environment. All these elements have been modelled so they can be processed by the implemented decision making module.

The whole process can be summarized in the following steps:
1) Selecting the dominant motivation.
2) Determining the state of the world.
3) Selecting the feasible skills and executing the best one.
These steps are explained in the following sections. All the parameters set in this implementation shape a specific personality for the robot; by changing these parameters, the robot will exhibit new personalities. The performance with different personalities will be studied in the future.

1) Which drives and motivations?: As expressed by equation (1), each motivation is represented by an integer value and is affected by two factors: internal needs and external stimuli. Internal needs are the drives, and their values depend on inner parameters. External stimuli are the objects situated in the environment that alter the robot's motivations. In addition, each drive has an activation level: below it, the motivation value is set to zero and hence it will not be considered as a candidate for the dominant motivation.
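Under our reading of equation (1), the value of a motivation can be sketched as follows; the function name and the numbers are illustrative, and we assume the activation threshold is checked against the drive value, as the text above suggests.

    def motivation_value(drive_value, external_stimuli, activation_level):
        # Motivation = internal need (drive) + external stimuli.
        # Below the activation level the motivation is zeroed, so it
        # cannot become the dominant motivation.
        if drive_value < activation_level:
            return 0
        return drive_value + sum(external_stimuli)

    # With the paper's values: social motivation, accompanied state (+5).
    print(motivation_value(drive_value=12, external_stimuli=[5], activation_level=10))  # 17
    print(motivation_value(drive_value=4, external_stimuli=[5], activation_level=10))   # 0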

As mentioned, the internal needs, the drives, represent an internal value. Each motivation is connected to a drive. The choice of which drives (and consequently which motivations) to implement was made at design time. Since the system has to run on a robot intended to interact with people, some social motivation is needed to "push" the robot into human-robot interaction. Moreover, the authors want the robot to be endowed with play-oriented aspects; hence, the robot requires a recreational nature. Nevertheless, the first primitive drive of any entity is to survive and, in our case, it is translated into the need for energy.

Fig. 5. Temporal evolution of all drives (loneliness, boredom, and energy) over time.

Therefore, the selected drives are:
• loneliness: the need for company.

• boredom: the need for "fun" or entertainment.
• energy: this drive is necessary for survival.

Based on the previous drives, the following non-conventional motivations have been defined:
• social: the need for interaction with a human; its drive is loneliness.
• recreational: this motivation is related to entertainment purposes; its associated drive is boredom.
• survival: it refers to the energy dependence; this motivation is connected to the energy need.

All drives, and consequently all motivations, evolve temporally from their initial values. In our implementation (Figure 5), each drive can have a satisfaction time: the period of time the drive remains at its initial value after it has been satisfied (see the beginning of the loneliness drive in Figure 5). During this time the drive does not evolve. After the satisfaction time, the loneliness and boredom drives increase linearly, but with different parameters: as time goes by, these drives become bigger and bigger, and so do the corresponding motivations. Loneliness is the fastest drive, and boredom evolves more slowly. This is because in social robots such as ours, interaction with people is one of the most relevant aims; hence its satisfaction time is very short and it is likely that the social motivation will become the dominant motivation.

The energy drive is significantly different. It is the most relevant inner need, due to the implicit necessity of survival: if we want to achieve a fully autonomous robot, power autonomy is the first step. Therefore, this drive keeps its initial value until a low battery level is detected; at that point, its value rises drastically.

In order to avoid an unbounded increase in the value of any of the motivations, a saturation level is defined for each one: once a motivation has reached its saturation value, it does not grow any further. Different motivations have different saturation values, which determine the priority among dominant motivations in case of a never-ending expansion of the motivations. In our implementation, survival saturates highest, followed by the social and recreational motivations.
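This temporal evolution (satisfaction time, linear growth, saturation) can be sketched as below. The parameter values are invented for illustration, and for simplicity the saturation cap is applied directly to the drive, whereas the paper defines it on the motivations.

    def drive_value(t, satisfaction_time, rate, initial=0.0, saturation=100.0):
        # Flat during the satisfaction time, then linear growth,
        # capped at the saturation level.
        if t <= satisfaction_time:
            return initial
        return min(initial + rate * (t - satisfaction_time), saturation)

    # Loneliness grows fast (short satisfaction time); boredom more slowly.
    for t in (0, 10, 60, 600):
        print(t,
              drive_value(t, satisfaction_time=5, rate=0.5),    # loneliness-like
              drive_value(t, satisfaction_time=30, rate=0.1))   # boredom-like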

2) Sensing the world: The world is perceived by the robot in terms of objects and the states related to these objects (the external state).


Fig. 6. States and actions for items: (a) TV, (b) person, and (c) docking station. The dashed arrows represent the skills that monitor the states, the continuous ones the actions executed with the objects, and the circles the states related to each item.

As a first approach, the world where Maggie lives is limited to the laboratory. In this environment three objects have been defined: the people living around the robot, a television/radio appliance, and the docking station for supplying energy.

In Figure 6, the states related to each object, the actions, and the transitions from one state to another are shown. Dashed arrows represent the skills that monitor the states, continuous ones the actions executed with the objects, and the circles the states related to each item. If an action does not appear at one state, it means that it is incoherent to execute it from that state; e.g., Maggie cannot play music if it is far from the TV, and it cannot follow a person if it is alone. Figures 6.a, 6.b, and 6.c represent the TV, person, and docking station objects, and their states and actions, respectively.

Television object
Since Maggie can control the TV appliance by means of an infrared interface [81], the robot must be placed at a certain distance from, and facing, the appliance in order to operate it. Therefore, two states related to the television item are defined: close and far, which symbolize whether or not the robot is in a position to command the TV (Figure 6.a). These states are monitored by a skill that reads information from the navigation system, which knows the TV's position. From the far state, the only possible action is getting closer to the TV; the robot is then able to operate the appliance.

Person object
For the person item, three states are determined: alone, accompanied, and commanded (Figure 6.b). Alone represents when no people are around the robot, and it is the initial state with respect to people; accompanied means that someone is around Maggie but no direct interaction exists; and, finally, the commanded state corresponds to a direct interaction between a person and Maggie, where the user asks the robot to do something. The transitions from one state to another are detected by two skills integrated in the control architecture: a face detection skill and a speech recognition skill. The face detection skill notifies whether the robot is alone or accompanied by searching for faces. The speech recognition skill listens for words through a microphone and distinguishes whether the dialogue is directed to Maggie or not, determining if the robot is accompanied or commanded. If no words are received and no faces are found for a long time, the robot changes its state related to person to alone. In the experiments, this time is set to 10 seconds.

It needs to be mentioned that the actions at the commanded state, obey and disobey, are somewhat special. These functions are designed either to execute what a person has ordered or to reject the user's command. This implies that Maggie can disobey an instruction from the user, and the robot will inform the user about it, which is why a direct dialogue between Maggie and the user is needed. Therefore, the commanded state is required for these actions.
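The person-state logic described above can be summarized in a small monitor; the structure and names are illustrative assumptions, with only the 10-second timeout taken from the experiments.

    import time

    class PersonStateMonitor:
        # Tracks the person item: alone -> accompanied -> commanded.
        TIMEOUT = 10.0  # seconds without faces or speech -> back to alone

        def __init__(self):
            self.state = "alone"  # initial state with respect to people
            self._last_evidence = time.time()

        def on_face_detected(self):
            # Fed by the face detection skill.
            self._last_evidence = time.time()
            if self.state == "alone":
                self.state = "accompanied"

        def on_speech(self, directed_to_robot):
            # Fed by the speech recognition skill.
            self._last_evidence = time.time()
            self.state = "commanded" if directed_to_robot else "accompanied"

        def tick(self):
            # Called periodically by the monitoring skill.
            if time.time() - self._last_evidence > self.TIMEOUT:
                self.state = "alone"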

Docking station object
In relation to the docking station, Maggie has two states: charging and discharging. If Maggie is connected to the station, the battery level is increasing, so it is charging. Otherwise, if the robot is unplugged, it is discharging and the battery level decreases. This information is read by the battery sensor skill. When the robot is discharging, it can just go to the charger. After that, it is plugged in and, once in the charging state, just one action is possible: to charge the batteries. If a skill that moves the robot around is selected while the robot is charging, the robot will leave the docking station and discharging will be the new state.

External stimuli
Just as human beings can feel thirst when they see water, the motivations can be influenced by some objects present in the environment. These are called the external stimuli or incentives. These stimuli may have more or less influence: their values depend on the states related to the objects (that is, whether they are near or far from the robot). In our implementation, all external stimuli values have been fixed empirically; these values are shown in Table I.

Due to the fact that Maggie is a very friendly robot that loves people, the social motivation is affected when a person is near the robot, i.e., when the state of Maggie with respect to the person item is accompanied or commanded. In this case, the total social motivation value is increased by five units. Since our robot likes playing with people and dancing while listening to music from the TV, the recreational motivation is increased by 10 units when people are around the robot or when it is close to the TV.


TABLE I
EXTERNAL STIMULI, THE OBJECT STATES LINKED TO THEM, THEIR VALUES, AND THE AFFECTED MOTIVATIONS

Motivation   | Ext. stimulus | State related to ext. stimulus | Value
-------------|---------------|--------------------------------|------
social       | person        | accompanied                    | 5
             |               | commanded                      | 5
recreational | person        | accompanied                    | 10
             | TV            | near                           | 10
survival     | -             | -                              | -

Up to now, the survival motivation does not have any external stimuli, but in the future, when the docking station is seen by the robot, it could feel the need for energy.

3) Acting in the world: Maggie interacts with the world through the objects and their potential actions. These actions are implemented as skills in the AD architecture. The possible actions with the person item are:

• following a person: Maggie will move following the closest person to it.

• self-introduction: the robot will introduce itself, informing about its history and abilities.

• playing with: our robot will play several games with the user, such as tic-tac-toe, hangman, and animal-trivial [82].

• obey: Maggie will comply with the user's request. The user can ask the robot to execute one of the previous actions.

• disobey: Maggie will deny the user's request and will inform the user about it.

About the TV appliance, its actions are:

• getting closer to: the robot moves towards the TV.
• play music: Maggie turns on the TV and changes to a music channel.
• stop music: the music is stopped and the TV is switched off.

In relation to the docking station, the possible actions are:

• go to: the robot plugs itself into the station.
• charge: Maggie keeps connected until the batteries are full.

The actions cause effects on the drives. When an action has ended, i.e., when the associated skill has been blocked because it reached its goal, its effects are applied. If an error occurs during a skill execution, or it is not successful, this situation is notified and its effect is not applied. In the experiments presented in the last section, most of the effects alter one or more drives, which become zero, or decrease or increase their value.

All effects are presented in Table II. These effects have been defined by the designer, and other values could have been selected. As shown, the loneliness drive is satisfied after the follow and play with actions are executed, because both actions imply a bidirectional interaction between the robot and a person. However, self-introduction could be accomplished without any response from the person, so its effect only decreases the loneliness drive by 10 units. Our robot's hobbies are playing and listening to music; therefore, play with and play music satisfy the boredom drive. Moreover, stop music increases boredom by five units. Other actions where a person is required (follow and self-introduction) affect it by a lower factor (minus five and minus one, respectively). Besides, loneliness is influenced by obey and disobey as well.

TABLE II
ACTION EFFECTS

Action            | Object          | Effect   | Drive
------------------|-----------------|----------|-----------
follow            | person          | set to 0 | loneliness
                  |                 | -5       | boredom
self-introduction | person          | -10      | loneliness
                  |                 | -1       | boredom
play with         | person          | set to 0 | loneliness
                  |                 | set to 0 | boredom
obey              | person          | -15      | loneliness
disobey           | person          | +15      | loneliness
get closer to     | TV              | -        | -
play music        | TV              | set to 0 | boredom
stop music        | TV              | +5       | boredom
go to             | docking station | -        | -
charge            | docking station | set to 0 | energy
leave             | docking station | -        | -

When the robot complies with a user's request, loneliness is reduced by fifteen units; on the other hand, disobeying an instruction raises loneliness by fifteen units. Finally, the energy drive is satisfied when the batteries are recharged, which happens when the robot is in the charging state.
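The application of these effects can be sketched as follows; the (kind, amount) encoding and the function name are our illustrative assumptions, with the values taken from Table II.

    # A subset of the effects from Table II, per (action, drive).
    EFFECTS = {
        "follow":    {"loneliness": ("set", 0), "boredom": ("delta", -5)},
        "play with": {"loneliness": ("set", 0), "boredom": ("set", 0)},
        "obey":      {"loneliness": ("delta", -15)},
        "disobey":   {"loneliness": ("delta", +15)},
        "charge":    {"energy": ("set", 0)},
    }

    def apply_effects(action, drives, succeeded):
        # Effects are applied only when the skill reached its goal.
        if not succeeded:
            return drives
        for drive, (kind, amount) in EFFECTS.get(action, {}).items():
            drives[drive] = amount if kind == "set" else drives[drive] + amount
        return drives

    drives = {"loneliness": 40, "boredom": 20, "energy": 0}
    print(apply_effects("play with", drives, succeeded=True))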

4) What does Maggie do now?: Once the world has been presented, the operation of the decision making system can be explained. First of all, when the system starts, the drives begin to evolve independently from their initial values, and the skills start monitoring the states related to the items. When a new state transition is detected, a specific event is emitted and the states are written into the Short-Term Memory. The decision making module receives this event and updates the state data. Throughout the robot's lifetime, the action selection loop is executed in order to determine the next skill to be activated. At each iteration, the dominant motivation is computed as the maximum motivation whose value (internal needs plus external stimuli) is over the activation level. This parameter has been fixed to 10 for every motivation. Using the dominant motivation, the current states related to the objects, and the learnt values, the next action is chosen.
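The dominant-motivation step can be sketched as follows, with the activation level fixed to 10 as stated above; the names are illustrative.

    ACTIVATION_LEVEL = 10  # fixed to 10 for every motivation here

    def dominant_motivation(motivations):
        # Highest motivation whose value (internal needs plus external
        # stimuli) is over the activation level; None if all are satisfied.
        candidates = {m: v for m, v in motivations.items() if v > ACTIVATION_LEVEL}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    print(dominant_motivation({"social": 17, "recreational": 12, "survival": 0}))  # social
    print(dominant_motivation({"social": 4, "recreational": 7, "survival": 2}))    # None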

As briefly described in section V, this approach has already been implemented on virtual agents. During these simulations, and using the Q-learning reinforcement learning algorithm [24], the agent learnt the right q-values for maximizing its wellbeing. Taking those values as an inspiration for the current implementation on Maggie, we propose a set of initial q-values that represent the best possible actions at each world configuration (the dominant motivation plus the state related to each object). The tuple formed by the dominant motivation, the object, the state related to the object, and the action with the highest q-value decides the selected action. In future implementations, the learning process will be carried out on-line by the robot itself, and the initial q-values will be updated through the learning process.
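For reference, the standard Q-learning update [24] that such an on-line process would apply is sketched below; the learning rate and discount factor are placeholder values.

    def q_update(q, state, action, reward, next_state, actions,
                 alpha=0.1, gamma=0.9):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(q.get((next_state, a), 0.0) for a in actions)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        return q

    q = {}
    q = q_update(q, state=("social", "accompanied"), action="play with",
                 reward=10, next_state=("none", "accompanied"),
                 actions=["follow", "self-introduction", "play with"])
    print(q)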

As already stated, the available actions at each state depend on the state itself. Hence, each object-state pair is associated with different actions. For example, for playing with people, Maggie has to be accompanied by a person.


Therefore, the play with action cannot be activated when the robot is alone: such combinations do not exist and, consequently, will never be selected for execution.

VII. EXPERIMENTS

In this section, some preliminary experiments are presented, showing the performance of our system.

A. Policy of behavior: action selection

As previously mentioned, in this first implementation, the robot uses a predefined set of initial q-values for maintaining its internal needs within a determined range. In the real world, each action is connected to a skill. Therefore, depending on the configuration of the world, Maggie executes one action or skill: it chooses the right action from all possible ones according to a softmax action selection rule [83]. The softmax algorithm endows the decision making system with a certain randomness: the bigger the q-value of an action, the more likely this action is to be selected. Again, in the future, when learning is done on-line, the q-values will change during the robot's working life.
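A minimal softmax (Boltzmann) selection consistent with [83] is sketched below; the temperature is an illustrative placeholder that controls how much randomness is added.

    import math
    import random

    def softmax_select(q_values, temperature=1.0):
        # Pick an action with probability proportional to exp(Q/T):
        # the bigger the q-value, the more likely the action.
        weights = [math.exp(q / temperature) for _, q in q_values]
        r = random.uniform(0, sum(weights))
        for (action, _), w in zip(q_values, weights):
            r -= w
            if r <= 0:
                return action
        return q_values[-1][0]

    # In the accompanied state with social dominant, all 'fun' actions tie.
    print(softmax_select([("follow", 10), ("self-introduction", 10), ("play with", 10)]))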

In Table III, all the initial q-values are presented. These values represent the value of each action at each state. The actions in the table are the possible actions according to the object states. Depending on the states, some actions are not shown because they are not feasible: following a person is not possible if the robot is alone, and music cannot be played if the robot is far from the TV. For that reason, such actions will never be chosen.

If social is the dominant motivation, all actions related to the person item have high values. Actions connected to the TV will be executed just when the robot is alone, since then no actions with persons are possible.

Focusing on the recreational motivation, "fun" skills will likely be executed whenever a person is near the robot or Maggie is near the TV. These fun skills, defined by the authors, are: follow, self-introduction, play with, and play music.

Concerning obedience, obey will probably be executed just in the commanded state. In other cases, all possible actions have the same probability of being run.

Sensibly, if the dominant motivation is survival, actions concerning the docking station item will probably be selected: if the robot is discharging, it will go to the docking station and afterwards, once charging, it will keep plugged in until its batteries are charged, i.e., the charge skill.

The robot must also consider what to do when all its needs are satisfied. In our case, when no dominant motivation comes up, the most reasonable skills, depending on the state, are to charge its batteries or to go to the docking station.

B. Evolution on Maggie

This experiment presents an example of how the motivation values change with time during Maggie's lifetime (see Figure 7). Most of the motivations grow uniformly but, sometimes, jumps appear.

TABLE III
PROPOSED Q-VALUES

Dominant motivation | Object      | State       | Action            | Initial q-value
--------------------|-------------|-------------|-------------------|----------------
social              | person      | alone       | -                 | -
                    |             | accompanied | follow            | 10
                    |             |             | self-introduction | 10
                    |             |             | play with         | 10
                    |             | commanded   | obey              | 10
                    |             |             | disobey           | 10
                    | TV          | near        | play music        | 5
                    |             |             | stop music        | 5
                    |             | far         | get closer        | 5
                    | docking st. | charging    | charge            | 5
                    |             | discharging | go to             | 5
recreational        | person      | alone       | -                 | -
                    |             | accompanied | follow            | 10
                    |             |             | self-introduction | 10
                    |             |             | play with         | 10
                    |             | commanded   | obey              | 5
                    |             |             | disobey           | 5
                    | TV          | near        | play music        | 10
                    |             |             | stop music        | 5
                    |             | far         | get closer        | 10
                    | docking st. | charging    | charge            | 5
                    |             | discharging | go to             | 5
survival            | person      | alone       | -                 | -
                    |             | accompanied | follow            | 1
                    |             |             | self-introduction | 1
                    |             |             | play with         | 1
                    |             | commanded   | obey              | 1
                    |             |             | disobey           | 1
                    | TV          | near        | play music        | 1
                    |             |             | stop music        | 1
                    |             | far         | get closer        | 1
                    | docking st. | charging    | charge            | 10
                    |             | discharging | go to             | 10
none                | person      | alone       | -                 | -
                    |             | accompanied | follow            | 0
                    |             |             | self-introduction | 0
                    |             |             | play with         | 0
                    |             | commanded   | obey              | 0
                    |             |             | disobey           | 0
                    | TV          | near        | play music        | 0
                    |             |             | stop music        | 0
                    |             | far         | get closer        | 0
                    | docking st. | charging    | charge            | 5
                    |             | discharging | go to             | 5

These jumps are due to the presence of external stimuli, as well as to the effects of the actions on the drives (the numbers located on top of Figure 7 represent the executed actions).

For example, focusing on the recreational motivation, we can observe a small increase at the beginning of the lifetime. This happens because Maggie changed its state to near the TV, which is an external stimulus of this motivation, so the recreational value is increased by ten units.

During the execution of action number four, the social and recreational motivations rise at the same time. This is because the robot has detected the presence of some people (the accompanied state).

Looking at the middle of the graph, the social and recreational motivations jump down quickly. At this point, Maggie was


accompanied by a person and the executed action was play with. The effects of this action set the loneliness and boredom drives to zero, and hence the social and recreational motivations fall, respectively. These motivations are not zero at this point because of the external state of the robot: the accompanied state adds five points to the social motivation and ten to the recreational one. When the robot changes to the commanded state, the external stimuli for these motivations disappear, but an equivalent new one appears for the social motivation, so no jump shows up in that motivation.

Besides, we would like to point out the final part of the graph. It shows how the transition to the far state affects just the recreational motivation, because this transition makes its external stimulus disappear. Afterwards, the effect of the follow action is marked: it reduces the recreational motivation and satisfies the loneliness drive.

As shown, an action or a state transition can affect several motivations, which means that effects are not attached to just one of them.

According to our initial goals, it is easy to appreciate that social is the fastest motivation.

In the multicolored band indicating the dominant motivation (top of Figure 7), a short black band stands out several times. This marks the periods when all motivations are satisfied: all drives are below their activation limits, and there is no dominant motivation. At these times, the go to action is executed because it is the most likely one.

Finally, in relation to the survival motivation: since most of this experiment was executed with full batteries, this motivation is stable and very low almost all the time. At the end, however, the battery level crosses the low-battery limit and the survival motivation becomes the dominant one.

According to the initial q-values, Maggie knows that the best possible actions are go to and charge. Since Maggie is charging at that moment, the most probable action is charge, which is finally selected and executed.

Moreover, the satisfaction time for the social motivation is marked at the end of the graph.

VIII. CONCLUSIONS AND FUTURE WORKS

In recent years, due to the increasing interest in social robots, cognitive systems have served as an inspiration for a new design of control architectures. In this work, we have presented a biologically inspired control architecture in which the main decisions are made based on motivations and emotions. This architecture is an evolution of a previous one, the AD architecture, where the current goal of the robot was decided by an external operator and predefined sequences coordinated the robot's behavior.

In the presented work, the new control architecture has been endowed with a decision making system based on biologically inspired concepts such as drives, motivations, emotions, and self-learning. Those concepts are included in order to improve the autonomy of the robot, as well as to imitate the behavior of a living being. Up to now, the complete decision making system has been successfully tested on virtual agents.

This paper shows the implementation, on a real robotic platform, of a basic version of this architecture, in which neither emotions nor self-learning are included in the decision making system.

Fig. 7. Temporal evolution of motivations. Numbers on top represent the executed actions: (1) go to the docking station, (2) get closer to TV, (3) play music on TV, (4) stop music on TV, (5) play with a person, (6) disobey, (7) obey, (8) follow, and (9) charge. The vertical white-grey bands in the background correspond to the execution time of each action. The upper colored band indicates the dominant motivation. Some action effects and changes of state are marked.

The experiments carried out on Maggie, a social robot designed for interaction with humans, show that the robot is able to select the most appropriate skill autonomously, based on its own drives and motivations.

The next step in our work is to implement the learning process on Maggie, so that skill selection is learnt by the robot through its interaction with the environment. The initial q-values could all be set to zero, or they could be used to provide some prior knowledge; the latter could shorten the learning time, much as inherited knowledge does in living beings.

Moreover, the learning process will take into account the role of some emotions (happiness, sadness) as the reinforcement function. In addition, fear will be implemented as another motivation of the robot. Nevertheless, it is expected that, due to the complexity of a real environment, the definition of new emotions, or the re-definition of the old ones, will be needed.

ACKNOWLEDGMENT

The authors gratefully acknowledge the funds provided by the Spanish Government through the project "Peer to Peer Robot-Human Interaction" (R2H), of MEC (Ministry of Science and Education), and the project "A new approach to social robotics" (AROS), of MICINN (Ministry of Science and Innovation).

REFERENCES

[1] F. Littmann and J. Riviere, "A remote-operated system for interventions on explosives," in Proceedings of the ANS Seventh Topical Meeting on Robotics and Remote Systems, vol. 2, 1997, pp. 1038–42.
[2] A. Casals, R. Merchan, E. Portell, X. Cufí, and J. Contijoch, "CAPDI: a robotized kitchen for the disabled and elderly," in Proceedings of Assistive Technology on the Threshold of the New Millennium (AAATE 99), Düsseldorf, Germany, 1999, pp. 346–351.


[3] K. Schilling, M. Mellado, J. Garbajosa, and R. Mayerhofer, "Design of flexible autonomous transport robots for industrial production," in Proceedings of the IEEE International Symposium on Industrial Electronics (ISIE '97), vol. 3, 1997.
[4] N. Kubota, Y. Nojima, N. Baba, F. Kojima, and T. Fukuda, "Evolving pet robot with emotional model," in Proceedings of the 2000 Congress on Evolutionary Computation, 2000.
[5] W. Maier and E. Steinbach, "A probabilistic appearance representation and its application to surprise detection in cognitive robots," IEEE Transactions on Autonomous Mental Development, vol. 2, no. 4, pp. 267–281, December 2010.
[6] Y. Zhang and J. Weng, "Spatio-temporal multimodal developmental learning," IEEE Transactions on Autonomous Mental Development, vol. 2, no. 3, pp. 149–166, September 2010.
[7] R. M. Shiffrin, "Attention," in Stevens' Handbook of Experimental Psychology, 2nd ed., vol. 2, 1988.
[8] R. M. Shiffrin and W. Schneider, "Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory," Psychological Review, pp. 127–190, 1977.
[9] J. LeDoux, El cerebro emocional. Ariel/Planeta, 1996.
[10] A. Damasio, Descartes' Error: Emotion, Reason and the Human Brain. Picador, London, 1994.
[11] S. C. Lewis, "Computational models of emotion and affect," Ph.D. dissertation, University of Hull, 2004.
[12] S. Gadanho, "Reinforcement learning in autonomous robots: An empirical investigation of the role of emotions," Ph.D. dissertation, University of Edinburgh, 1999.
[13] R. W. Picard, Los ordenadores emocionales. Ed. Ariel S.A., 1998.
[14] E. Rolls, Emotion Explained. Oxford University Press, 2005.
[15] R. C. Arkin, Who Needs Emotions? The Brain Meets the Robot. Oxford University Press, 2004, ch. Moving up the food chain: Motivation and Emotion in behavior-based robots.
[16] K. L. Bellman, Emotions in Humans and Artifacts. MIT Press, 2003, ch. Emotions: Meaningful mappings between the individual and its world.
[17] L. Cañamero, Emotions in Humans and Artifacts. MIT Press, 2003, ch. Designing emotions for activity selection in autonomous agents.
[18] T. Ziemke and R. Lowe, "On the role of emotion in embodied cognitive architectures: From organisms to robots," Cognitive Computation, vol. 1, no. 1, pp. 104–117, 2009.
[19] C. Breazeal, Designing Sociable Robots. The MIT Press, 2002.
[20] R. C. Arkin, "Homeostatic control for a mobile robot: Dynamic replanning in hazardous environments," in SPIE Conference on Mobile Robots, Cambridge, MA, 1988.
[21] T. Ziemke, "On the role of emotion in biological and robotic autonomy," BioSystems, vol. 91, no. 2, pp. 401–408, 2008.
[22] R. Barber, "Desarrollo de una arquitectura para robots móviles autónomos. Aplicación a un sistema de navegación topológica," Ph.D. dissertation, Universidad Carlos III de Madrid, 2000.
[23] M. Malfaz, "Decision making system for autonomous social agents based on emotions and self-learning," Ph.D. dissertation, Carlos III University of Madrid, 2007.
[24] C. J. Watkins, "Models of delayed reinforcement learning," Ph.D. dissertation, Cambridge University, Cambridge, UK, 1989.
[25] M. Malfaz and M. Salichs, "Learning behaviour-selection algorithms for autonomous social agents living in a role-playing game," in Proceedings of AISB'06: Adaptation in Artificial and Biological Systems, University of Bristol, Bristol, England, April 2006.
[26] ——, "Emotion-based learning of intrinsically motivated autonomous agents living in a social world," in Proceedings of ICDL 5: The Fifth International Conference on Development and Learning, Bloomington, Indiana, June 2006.
[27] ——, "The use of emotions in an autonomous agent's decision making process," in Ninth International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems (EpiRob09), Venice, Italy, 2009.
[28] K. C. Berridge, "Motivation concepts in behavioural neuroscience," Physiology and Behavior, no. 81, pp. 179–209, 2004.
[29] C. L. Hull, Principles of Behavior. New York: Appleton-Century-Crofts, 1943.
[30] J. Santa-Cruz, J. M. Tobal, A. C. Vindel, and E. G. Fernández, "Introducción a la psicología," Facultad de Psicología, Universidad Complutense de Madrid, 1989.
[31] A. Ortony, D. A. Norman, and W. Revelle, Who Needs Emotions? The Brain Meets the Machine (J. M. Fellous and M. A. Arbib, eds.), 2005, ch. Affect and proto-affect in effective functioning.
[32] J. Fellous, "From human emotions to robot emotions," AAAI 2004 Spring Symposium on Architectures for Modeling Emotion: Cross-Disciplinary Foundations, SS-04-02, AAAI Press, Tech. Rep., 2004.
[33] A. Kelley, Who Needs Emotions? The Brain Meets the Robot. Oxford University Press, 2005, ch. Neurochemical networks encoding emotion and motivation: an evolutionary perspective.
[34] G. Hollinger, Y. Georgiev, A. Manfredi, B. Maxwell, Z. Pezzementi, and B. Mitchell, "Design of a social mobile robot using emotion-based decision mechanisms," in 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2007, pp. 3093–3098.
[35] R. S. Lazarus, Appraisal Processes in Emotion: Theory, Methods, Research. New York: Oxford University Press, 2001, ch. Relational meaning and discrete emotions, pp. 37–67.
[36] J. S. de Freitas and J. Queiroz, Advances in Artificial Life, 2007, ch. Artificial Emotions: Are We Ready for Them?, pp. 223–232.
[37] S. Gadanho, "Learning behavior-selection by emotions and cognition in a multi-goal robot task," The Journal of Machine Learning Research, no. 4, pp. 385–412, 2003.
[38] A. Ortony, G. L. Clore, and A. Collins, The Cognitive Structure of Emotions. Cambridge University Press, Cambridge, UK, 1988.
[39] R. Plutchik, Emotion: A Psychoevolutionary Synthesis. Harper & Row, New York, 1980.
[40] M. Scheutz, "Useful roles of emotions in artificial agents: a case study from artificial life," in AAAI 2004. AAAI Press, Menlo Park, 2004, pp. 42–48.
[41] C. Delgado-Mata and R. Aylett, "Emotion and action selection: Regulating the collective behaviour of agents in virtual environments," in 3rd International Joint Conference on Autonomous Agents and Multiagent Systems, 2004.
[42] D. Cañamero, "Emotions and adaptation in autonomous agents: a design perspective," Cybernetics and Systems: International Journal, vol. 32, pp. 507–529, 2001.
[43] J. S. Albus, C. R. McLean, A. J. Barbera, and M. L. Fitzgerald, "Hierarchical control for robots and teleoperators," in Proceedings of the Workshop on Intelligent Control, 1985.
[44] G. Saridis, "Control performance as an entropy: an integrated theory for intelligent machines," Robotics and Automation Laboratory, New York, Tech. Rep., 1983.
[45] R. A. Brooks, "A robust layered control system for a mobile robot," IEEE Journal of Robotics and Automation, pp. 14–24, April 1986.
[46] ——, "Intelligence without reason," Artificial Intelligence, pp. 139–159, 1991.
[47] M. Mataric, "Behaviour-based architectures for intelligent control," in Workshop on Intelligent Autonomous Control Systems, November 1992.
[48] J. R. Firby, "Building symbolic primitives with continuous control routines," in First International Conference on AI Planning Systems, College Park, MD, June 1992.
[49] ——, "Task networks for controlling continuous processes," in Second International Conference on AI Planning Systems, K. Hammond, Ed. AAAI Press, 1994, pp. 49–54.
[50] R. P. Bonasso, J. Firby, E. Gat, D. Kortenkamp, D. P. Miller, and M. G. Slack, "Experiences with an architecture for intelligent, reactive agents," Journal of Experimental and Theoretical Artificial Intelligence, vol. 9, pp. 237–256, 1997.
[51] D. Schreckenghost, P. Bonasso, D. Kortenkamp, and D. Ryan, "Three tier architecture for controlling space life support systems," in IEEE Symposium on Intelligence in Automation and Robotics, May 1998, pp. 195–201.
[52] E. Gat, "ALFA: A language for programming reactive robotic control systems," in IEEE Conference on Robotics and Automation. Los Alamitos, CA: IEEE Computer Society Press, 1991.
[53] ——, "Integrating planning and reacting in a heterogeneous asynchronous architecture for controlling real-world mobile robots," in Tenth National Conference on Artificial Intelligence (AAAI), San Jose, CA, July 1992.
[54] ——, "ESL: A language for supporting robust plan execution in embedded autonomous agents," in Proceedings of the IEEE Aerospace Conference. Los Alamitos, CA: IEEE Computer Society Press, 1997.
[55] R. Alami, R. Chatila, S. Fleury, M. Ghallab, and F. Ingrand, "An architecture for autonomy," The International Journal of Robotics Research, vol. 17, no. 4, pp. 315–337, 1998.
[56] M. A. Salichs and R. Barber, "A new human based architecture for intelligent autonomous robots," in 4th IFAC Symposium on Intelligent Autonomous Vehicles, 2001, pp. 85–90.
[57] L. Cañamero, "Modeling motivations and emotions as a basis for intelligent behavior," in First International Symposium on Autonomous Agents (Agents'97). New York, NY: ACM Press, 1997, pp. 148–155.


[58] ——, "Designing emotions for activity selection," Dept. of Computer Science Technical Report DAIMI PB 545, University of Aarhus, Denmark, Tech. Rep., 2000.
[59] S. Gadanho and J. Hallam, "Emotion-triggered learning in autonomous robot control," Cybernetics and Systems, vol. 32, no. 5, pp. 531–559, 2001.
[60] S. Gadanho and L. Custodio, "Asynchronous learning by emotions and cognition," in From Animals to Animats VII, Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (SAB'02), Edinburgh, UK, 2002.
[61] J. Velásquez, "Modeling emotions and other motivations in synthetic agents," in Fourteenth National Conference on Artificial Intelligence, 1997.
[62] ——, "Modelling emotion-based decision-making," in 1998 AAAI Fall Symposium Emotional and Intelligent: The Tangled Knot of Cognition, 1998.
[63] C. Breazeal and L. Aryananda, "Recognition of affective communicative intent in robot-directed speech," Autonomous Robots, vol. 12, pp. 83–104, 2002.
[64] C. Breazeal and R. Brooks, Who Needs Emotions: The Brain Meets the Robot. MIT Press, 2004, ch. Robot Emotion: A Functional Perspective.
[65] C. Breazeal, D. Buchsbaum, J. Gray, D. Gatenby, and B. Blumberg, "Learning from and about others: Towards using imitation to bootstrap the social understanding of others by robots," Artificial Life, vol. 11, pp. 1–32, 2005.
[66] E. D. Sevin and D. Thalmann, "A motivational model of action selection for virtual humans," in Proceedings of Computer Graphics International, 2005, pp. 213–220.
[67] J. Hirth, T. Braun, and K. Berns, KI 2007: Advances in Artificial Intelligence, 2007, ch. Emotion Based Control Architecture for Robotics Applications, pp. 464–467.
[68] C. L. Lisetti and A. Marpaung, KI 2006: Advances in Artificial Intelligence, 2007, ch. Affective Cognitive Modeling for Autonomous Agents Based on Scherer's Emotion Theory, pp. 19–32.
[69] M. J. L. Boada, R. Barber, and M. A. Salichs, "Visual approach skill for a mobile robot using learning and fusion of simple skills," Robotics and Autonomous Systems, vol. 38, pp. 157–170, March 2002.
[70] E. N. Zalta, Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/entries/memory/. First published March 11, 2003; substantive revision February 3, 2010.
[71] R. C. Atkinson and R. M. Shiffrin, The Psychology of Learning and Motivation (K. W. Spence and J. T. Spence, eds.). New York: Academic Press, 1968, vol. 2, ch. Human Memory: A Proposed System and Its Control Processes, pp. 89–195.
[72] C. Balkenius, "Motivation and attention in an autonomous agent," in Workshop on Architectures Underlying Motivation and Emotion (WAUME 93), University of Birmingham, 1993.
[73] ——, "Natural intelligence in artificial creatures," Ph.D. dissertation, Lund University Cognitive Studies 37, 1995.
[74] K. Lorenz and P. Leyhausen, Motivation of Human and Animal Behaviour: An Ethological View. New York: Van Nostrand-Reinhold, 1973.
[75] A. Ortony, Emotions in Humans and Artifacts. MIT Press, 2003, ch. On making Believable Emotional Agents Believable, pp. 188–211.
[76] E. Rolls, Emotions in Humans and Artifacts. MIT Press, 2003, ch. Theory of emotion, its functions, and its adaptive value.
[77] B. Zimmerman, http://www.coffeemud.org/, 2007.
[78] L. Cañamero, "Emotion understanding from the perspective of autonomous robots research," Neural Networks, vol. 18, pp. 445–455, 2005.
[79] M. A. Salichs, R. Barber, A. M. Khamis, M. Malfaz, J. F. Gorostiza, R. Pacheco, R. Rivas, A. Corrales, and E. Delgado, "Maggie: A robotic platform for human-robot social interaction," in IEEE International Conference on Robotics, Automation and Mechatronics (RAM 2006), Bangkok, Thailand, 2006.
[80] R. Rivas, A. Corrales, R. Barber, and M. A. Salichs, "Robot skill abstraction for AD architecture," in 6th IFAC Symposium on Intelligent Autonomous Vehicles, 2007.
[81] J. Salichs, A. Castro-González, and M. A. Salichs, "Infrared remote control with a social robot," in FIRA RoboWorld Congress 2009, Incheon, Korea: Springer, August 2009.
[82] Á. Castro-González, A. A. Ramey, M. Yébenes, F. Alonso-Martín, V. González-Pacheco, and M. A. Salichs, "Using a social robot as a gaming platform," in International Conference on Social Robotics, Singapore, November 2010.
[83] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (A Bradford Book), 1998.

Dr. María Malfaz received her degree in Physics from La Laguna University in 1999. In October 2001 she obtained an MSc in Control Systems at Imperial College London, and since then she has been an Assistant Professor at the Department of Systems Engineering and Automation of the University Carlos III of Madrid. She received the Ph.D. degree in Industrial Engineering in 2007 with the thesis "Decision Making System for Autonomous Social Agents Based on Emotions and Self-learning". Her current research follows the line carried out in her thesis.

Álvaro Castro-González obtained his degree in Computer Engineering from the University of León, Spain, in 2005. He then obtained an MSc in Robotics and Automation in 2008 at the University Carlos III of Madrid, Spain, where he is currently a Ph.D. candidate and a member of the RoboticsLab research group. He has been a teaching assistant for several courses at the Department of Systems Engineering and Automation of the Carlos III University of Madrid. His research interests include personal and social robots, human-robot interaction, decision making systems, and emotions and motivations in robots.

Dr. Ramón Barber is an Associate Professor in the Systems Engineering and Automation Department at the University Carlos III of Madrid, Spain. He received the B.Sc. degree in Industrial Engineering from the Polytechnic University of Madrid in 1994 and the Ph.D. degree in Industrial Technologies from the University Carlos III. In 2000 he developed a new control architecture for mobile robots based on topological navigation, and he continues his research in these topics. He is a member of the International Federation of Automatic Control (IFAC).

Prof. Dr. Miguel Angel Salichs is a full professor in the Systems Engineering and Automation Department at the Carlos III University of Madrid. He received the Electrical Engineering and Ph.D. degrees from the Polytechnic University of Madrid. His research interests include autonomous social robots, multimodal human-robot interaction, mind models, and cognitive architectures. He was a member of the Policy Committee of the International Federation of Automatic Control (IFAC), chairman of the IFAC Technical Committee on Intelligent Autonomous Vehicles, head of the Spanish National Research Program on Industrial Design and Production, President of the Spanish Society on Automation and Control (CEA), and the Spanish representative at the European Robotics Research Network (EURON).