Integrated Computer-Aided Engineering 17 (2010) 243–259
DOI 10.3233/ICA-2010-0341
IOS Press

Talking Agents: A distributed architecture for interactive artistic installations

José M. Fernández and Juan Pavón∗
Departamento de Ingeniería del Software e Inteligencia Artificial, Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain

Abstract. Recent advances in Artificial Intelligence imply challenges and opportunities to explore new kinds of artistic experiences in interaction with the spectator of an artistic installation. In this context, a wide range of elements, such as sensors and speech recognition and synthesis, has been considered to establish an intelligent environment with an artistic purpose. The coordination of these elements to create an interactive environment where the spectator may feel faced with some kind of human-like behavior (the use of speech contributes to this goal) requires the integration of different Artificial Intelligence techniques. The definition of Talking Agents as reusable components for managing coordination of this diversity of elements and the interaction with the spectator facilitates building and setting different artistic scenarios. This paper describes the architecture of Talking Agents and their implementation in a multi-agent framework. This has been used in an experiment with an artistic installation called ORACULOS, and the results have been the basis for considering the evolution and application of the Talking Agent architecture in Ambient Assisted Living scenarios.

Keywords: Talking Agent, ambient assisted living, agent architecture

1. Introduction

The context of this work is the development of reusable components for building artistic installations that are able to interact with the spectator using different media, especially speech but also including different types of sensors and actuators in an intelligent environment. These components should also be able to integrate different Artificial Intelligence techniques for performing a series of interactions with the spectator throughout a sequence of situations, each one a product of a specific context in which the spectator participates. In order to facilitate such integration in a distributed setting, an agent-oriented approach has been followed. The artistic installation is conceived as a set of basic elements, each of them with a physical representation (a golden head sculpture, for instance), and a computational entity, which is called a Talking Agent.

∗Corresponding author: Dr. Juan Pavón, Universidad Complutense de Madrid, Facultad de Informática, 28040 Madrid, Spain. E-mail: [email protected].

A Talking Agent is a reusable software component, which is characterized as having autonomous and social behavior. It has the ability to interact with humans using speech or other communication channels, and to work in a distributed computing environment where it can communicate with other agents and manage resources. In most scenarios, Talking Agents use heterogeneous resources, which vary depending on the needs of each particular artistic installation. For example, in the ORACULOS installation, which is described later, speech recognition, processing and synthesis resources are needed, as well as a variety of sensors in the environment, in order to obtain knowledge of what the spectator is saying and doing.

Talking Agents in an installation are organized as a society of agents that can cooperate and interact among themselves. Depending on the installation, the spectator will interact with each Talking Agent one by one, or with several at the same time. Communication among Talking Agents can help them to obtain more information about the context (e.g., what the spectator is doing and has said) and help each other to better manage the interaction with the spectator.

Also, each Talking Agent may implement a different strategy for conversation with the spectator, and the result of the spectator experience will emerge from the combination of the responses given by the Talking Agents participating in the installation. It is important also, from a conceptual viewpoint, to explore social issues (Talking Agents as a society with emergent behavior) and the role of the human spectator in such virtual societies.

The main focus of this research is not the development of the actual resources used in each scenario, such as speech recognition and synthesis techniques, which have been implemented using existing software components. Instead, research has been done in the design and implementation of the agent architecture that facilitates the evolution of Talking Agent capabilities, since they should be easily configurable to use different kinds of resources as needed for each specific scenario and to integrate various Artificial Intelligence techniques. In this sense, the Talking Agents are reusable components for building different artistic scenarios.

Several agent frameworks were considered for implementing Talking Agents, and finally ICARO [30] was adopted. It is an open source project that promotes the development of distributed applications as organizations of agents and resources. In comparison to more popular agent frameworks (e.g. Jade, Jack), ICARO has the advantage of promoting an organization-based view of the multi-agent system, being flexible in the communication mechanisms and protocols used among agents (not forcing, for instance, the use of FIPA standards). In addition, ICARO provides support for managing a multi-agent system configuration in a similar way to component-based frameworks, which is useful in terms of its ability to cope with our requirement on the flexibility of resource configuration of Talking Agents. Another interesting facility of ICARO is the provision of two agent patterns, one for reactive agents, based on finite state machines, and another for cognitive agents, to facilitate the specification of agents with reasoning and planning capabilities. The first pattern was used as a basis to define the Talking Agent, which is itself a kind of pattern, as it can be configured for different systems.

The rest of the paper is structured as follows. Section 2 discusses the similarities of Talking Agents and dialogue systems, by taking into consideration the trends in this field and explaining the activities that are involved in speech interaction. This establishes the ground for the architecture of a Talking Agent that is described in Section 3. Section 4 presents the multi-agent architecture setting with the ICARO platform, and identifies other types of agents that help Talking Agents in their configuration and coordination. This architecture has been used in an experiment with the ORACULOS installation, which is described in Section 5. The last section presents some conclusions derived from the experimentation and considers the evolution and application of the Talking Agent architecture for Ambient Assisted Living scenarios.

2. Dialogue systems and Talking Agents

The main concern of the research with Talking Agents is the development of a flexible multi-agent system architecture that facilitates different kinds of interaction with humans, taking advantage of previous agents' experience and information from the environment (e.g., from sensors). The research is thus similar in several aspects to the multimodal dialogue systems field. This section briefly reviews developments in this area, with a focus on representative dialogue system architectures, and finishes with a summary of the main concepts and associated technologies. This will serve to explain the motivation and main challenges addressed by the Talking Agent architecture and its possible implementations with respect to other solutions.

2.1. Evolution of dialogue system architectures

Earlier dialogue systems consisted of simple information pipelining through different abstraction levels, without explicit treatment of the dialogue context. The main phases of this pipeline were: input perception, input analysis to determine the speech action, management and decision about the interaction, response generation, and response execution. This is basically the approach of Eckert et al. [19], which is often used as an example because of its simplicity. It was extended in [3] with new elements such as a discourse context manager, a reference manager, and a content planner, and later improved in [4], which makes the organization clearer by distinguishing three sections: interpretation, generation and behavior.

A further step in structuring dialogue systems is the explicit distinction between interaction-level communication and content-level communication [28]. The former refers to the communication itself and is used to manage certain parameters such as talking speed, turn, and form of the communication. Content-level communication is the main communication and refers to the objective of the communication.

Other approaches take the pipeline architecture as a basis and introduce improvements in the different phases for some specific processing. For instance, Cassell [15] proposes managing the interaction-level communication using an improved interpretation module that is able to discriminate between interaction and content events.

Other architectures take into account more aspects than dialogue management, such as the distribution of components so that they can be used as services. This idea is discussed in [23] and it is considered in Talking Agents as well.

The Olympus architecture [10] also follows the pipeline pattern, with components for processing the information in each phase, which are independent of each other and are developed separately. This separation of phases and the independence of the modules are similarities with the Talking Agents architecture. However, Olympus focuses on verbal and text-based communication, leaving other types of interaction to internal applications. With Talking Agents, the interpretation of the inputs may vary depending on many types of stimuli, including speech, so they should all be considered together in the pipeline. Another difference with Talking Agents is that Olympus does not consider collaboration among distinct agents in the system.

The TRINDI architecture [27] follows the Information State Update paradigm, which considers the definition of rule modules (application components) of two different types: the Dialogue Move Engine modules, and the others. All the rule modules can read information from the global information state and write their results there. The Dialogue Move Engine is responsible for triggering the corresponding update rules of each other module when the necessary information is available, in order to coordinate the processing of the modules and decide on the actions. This approach permits the addition of new components dynamically, and it is not as focused on verbal and textual communication as Olympus, since it permits the addition of arbitrary new modules that interact with the global state, and the corresponding Dialogue Move Engine modules to coordinate them, without altering the rest of the system. However, one aspect that is not considered in this architecture is collaboration among different agents in a greater system, which the Talking Agents system does consider.

2.2. Associated concepts

When reviewing the different dialogue system architectures, there are common activities for information processing. Each of these can exploit distinct technologies, depending on the actual requirements of the system. These activities are the following:

2.2.1. Input gathering

This activity processes the signals coming from the environment and generates messages or events to the system. These are examples of technologies used:

Speech recognition: MS Speech API [18], Java Speech API [36], Sphinx-4 [14], TalkingJava [16].
Visual speech recognition: several experimental results are described in [40] and [29].
Graphical user interfaces: Java Swing, AWT, acm.gui [1].
Physical-magnitude sensors: Phidgets API [33].

2.2.2. Input processing

This activity processes the information that results from the previous activity and determines a domain-specific meaning for the input or sequence of inputs, possibly making use of the context of the interaction, created through previous inputs. These are examples of technologies used:

Natural language analyzers: FreeLing [5], Stanford Parser [26] (only analyzes the syntax). Phoenix [39] and SPIN [21] are able to work with semantics within a certain domain.
Confidence annotation of recognition results: Helios [8].
Case-based reasoning for previously identified input cases: JColibri2 [2].
Rule-based reasoning for mapping inputs into acts: Drools [6].

2.2.3. Control

This activity is present during the whole process. It manages the information flow during the interaction. Its definition is one of the main differences among different architectures.

Dialogue system middleware: DIPPER [11].

2.2.4. Dialogue management

A dialogue is a prolonged interaction in which references to previous information and context are made. In this activity the system decides the answers depending on the dialogue being maintained. To make this decision, information from the previous interactions is accessed and updated according to the new situations. This activity also contributes to the differences among different architectures. These are examples of paradigms used:

Information state update (ISU): TRINDI [27].
Expectation agenda: RavenClaw [9].

2.2.5. Output generation

This activity takes the text generated by the dialogue manager and decides how to coordinate the available resources in order to produce the response. These are examples of technologies used:

Case-based reasoning: JColibri2 [2]. In [20] a way of using this type of technology is described for dialogue systems, and in [17] for deliberative agents.
Rule-based reasoning: Drools [6] is a rule execution engine. TRINDI [27] is an example of the use of this technology.
Task decomposition: RavenClaw [9].
Context-aware processing: used for adaptive services [22,31].

2.2.6. Output handling

This activity makes the specific requests to the resources in order to produce the outputs. These are examples of technologies used:

Speech synthesizers: Microsoft Speech API [18], Java Speech API [36], FreeTTS [35], TextSound [12].
Graphical interfaces: Java Swing, AWT, acm.gui [1].

Taking into account the diversity of technologies a dialogue system can include, it is important that the software architecture be flexible enough for its components to be changed. Furthermore, some of these components are demanding of computing resources, so it is interesting to consider the ability to distribute the parts of the dialogue system to adapt to the capabilities of devices.

3. The Talking Agents architecture

Talking Agents are not isolated entities, as they are part of a multi-agent system. This is one of the main characteristics with respect to typical dialogue systems, as Talking Agents cooperate together and with other agents to achieve their goals. The organization of the multi-agent system determines the global system architecture, which is described in Section 4. Also, each individual agent has its own architecture, with well-defined component interfaces. This section describes the architecture of the Talking Agent, first by showing the static view of the system, i.e., the relationships among its components, and next the dynamic view, to show the interactions among components in order to achieve the Talking Agent objectives.

3.1. The Talking Agents structure

Figure 1 is a simple diagram that shows the components that make up a single Talking Agent as a conversational agent: perception, interpretation, planning and execution components, and the Talking Agent core. The diagram shows the components used in the ORACULOS scenario (including speech recognition, sensors and speech synthesis). Each component implements a well-defined generic interface, depending on the previously identified activity it performs, thus enabling the Talking Agent to manage an interaction of any kind (multimodal), in a generic manner, by customizing the respective components. The Talking Agent core is responsible for deciding (using a rule engine) the appropriate intention given certain knowledge, and for coordinating the available resources, using a reactive control model (from the ICARO platform, as described later).

3.1.1. Perception (input gathering)

A perception component is a thread that generates events with some information, either when it detects certain situations for which it is prepared or in a periodic manner, depending on the type of stimuli and the way it expects to obtain the information.

The first type is called eventual perception, where a single state change provides meaningful information. On-demand perception (i.e., information that is accessed when the agent makes a request for it, in the execution phase) and communications from external agents are considered to be of this type.

The second type is called periodical perception, where the perception does not offer enough information in just one state change, so it is necessary to observe several state changes during a certain period in order to have enough information to process. The list of state changes notified over time is considered a block for interpretation purposes.

An example of an eventual perception element used in the ORACULOS installation is the Speech Recognition, which is implemented in Talking Agents using TalkingJava [16]; it notifies an event when it recognizes speech, attaching the recognized text to it. An example of a periodical perception element is a Sonar Sensor, based on the Phidgets API [33], which periodically reports the detected value, generating sequences of values that enable the system to determine if someone is approaching or moving away.

Fig. 1. Talking Agent overview.

Any perception component must implement an interface that has the following methods (only the main operations are shown):

AddSubscription(Subscription): adds a subscription object to the component. This object contains the information necessary for the component to notify a subscriber component when a change has occurred in a certain subject in the context of the perception component.
RemoveSubscription(SubscriptionId): removes a subscription object from the component.
NotifySubscribers(Subject, Info): notifies the subscribers when a change has been perceived in a certain subject, with associated information, by sending the AcceptPerceptionEvent signal to the subscribers, as will be described later.

The Subscription object contains the following information:

SubscriptionId: a code identifying the subscription.
SubscriberId: a code identifying the subscriber component, in order to be able to send it the event object.
Subject: a code representing the subject for which the subscription is made.
NotificationPeriod: the amount of time between notifications, in the case of periodical perception.
WindowSize: the number of grouped events sent at the same time, in the case of periodical perception (used to interpret sequences of events).
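As an illustration, the following is a minimal Java sketch of this contract. The method and field names follow the description above; all types, signatures, and the use of String identifiers are assumptions, not the actual ICARO/Talking Agents code.

```java
// Minimal sketch of the perception interface described above. Method and
// field names follow the text; all types and signatures are assumptions.
public interface PerceptionComponent {
    void addSubscription(Subscription subscription);
    void removeSubscription(String subscriptionId);
    // Sends an AcceptPerceptionEvent signal to each subscriber of the subject.
    void notifySubscribers(String subject, Object info);
}

class Subscription {
    final String subscriptionId;   // identifies this subscription
    final String subscriberId;     // identifies the subscriber, to route events to it
    final String subject;          // subject the subscriber is interested in
    final long notificationPeriod; // ms between notifications (periodical perception only)
    final int windowSize;          // number of grouped events per notification (periodical only)

    Subscription(String subscriptionId, String subscriberId, String subject,
                 long notificationPeriod, int windowSize) {
        this.subscriptionId = subscriptionId;
        this.subscriberId = subscriberId;
        this.subject = subject;
        this.notificationPeriod = notificationPeriod;
        this.windowSize = windowSize;
    }
}
```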

3.1.2. Interpretation (input processing)

The interpretation elements take the events generated by the perception elements and try to extract the meaning of the information. The interpretation process is divided into different dimensions: specialization and memory. In terms of specialization, interpretations can be:

Local: interprets events coming from a specific source, ignoring the rest.
Global: interprets the local interpretations from various sources (i.e. constructs a higher-level interpretation based on lower ones).

And, in terms of memory, interpretations can consider:

Current events: interprets only the current events in the system, ignoring previous ones. This interpretation is made for eventual perception events.
Sequence of events: interprets a sequence of events within a time window. This interpretation is made for periodical perception events.
Previous interpretations: takes into consideration interpretations of previous events or, in general, the current mental state of the agent (i.e. the information representing the knowledge available to the agent and the objectives it is currently pursuing).

The result of the interpretation is a certain amount of knowledge the agent can use in its decision process. The knowledge is represented in a symbolic manner so that it can be processed by a reasoning engine, such as a rule-based reasoning engine.

The interpretation of a heterogeneous set of situations fits well with the idea of case-based reasoning, because this kind of technology can identify previous cases and associated interpretations from complex amounts of information, and adapt those interpretations to the current case. For this reason, a generic pattern for global interpretation using previous interpretations has been developed using this paradigm. This pattern is implemented using the JColibri2 tool [2] and is used in the ORACULOS implementation.

The local interpretations for current events or sequences of events must be implemented using specific processing for each kind of perception element. For instance, events in the Speech Recognition resource are interpreted using natural language understanding technologies like FreeLing [5], Phoenix [39], or SPIN [21].

Any interpretation component must implement an interface that has the following methods. Alternatively, it is possible to customize the existing JColibri2-based [2] pattern with the appropriate cases in order to adapt its behavior to the desired global interpretation.

UpdateMentalState(MentalState, AgentId): updates the mental state associated with an agent in the interpreter component, in order to include the last perceived events and other available knowledge, and triggers a new interpretation based on the new information.
SendInterpretation(AgentId): sends the UpdateMentalState signal to the agent if the interpretation process has changed the MentalState object, or the NoMentalStateChange signal otherwise.

The MentalState object contains the following information:

EventsBuffer: a buffer containing the unprocessed perception events.
Beliefs: the collection of the agent's beliefs, including certain facts (e.g., configuration parameters). These objects are used to trigger the decision rules.
Intentions: the set of the agent's intentions that are not yet carried out.
ActionsBuffer: a buffer containing the pending actions.
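A hedged Java sketch of this interface and the MentalState container follows. Element types are modeled as Object because the concrete classes (PerceptionEvent, Belief, Intention, Action) are not detailed in the text; all signatures are assumptions.

```java
// Sketch of the interpretation interface and the MentalState container
// described above. Types and signatures are assumptions.
import java.util.ArrayList;
import java.util.List;

public interface InterpretationComponent {
    // Merges the latest perceived events and knowledge into the agent's
    // mental state and triggers a new interpretation.
    void updateMentalState(MentalState mentalState, String agentId);
    // Sends UpdateMentalState to the agent if the interpretation changed
    // the mental state, or NoMentalStateChange otherwise.
    void sendInterpretation(String agentId);
}

class MentalState {
    final List<Object> eventsBuffer = new ArrayList<>();  // unprocessed perception events
    final List<Object> beliefs = new ArrayList<>();       // beliefs and facts; trigger decision rules
    final List<Object> intentions = new ArrayList<>();    // intentions not yet carried out
    final List<Object> actionsBuffer = new ArrayList<>(); // pending actions
}
```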

3.1.3. Decision: Talking Agent core

This is the part in which the agent coordinates its managed modules and its functional cycle, and decides which immediate objectives to pursue, depending on the knowledge of its own state and the environment (the agent's beliefs). The decision is performed at two different levels: control and intention.

At the control level, the agent only decides which phase of the information flow is going to be processed next. This decision process is modeled as a state machine following the reactive agent behavior model (from the ICARO framework), which controls each of the functions performed by the agent at every moment. This kind of state machine reacts to certain received signals, which are generated by the resources, indicating the completion of each processing step. Although it is recommended that the provided state machine be used to implement the Talking Agent behavior, one may modify it to address certain concrete requirements.

The main signals accepted by this state machine are the following:

AcceptPerceptionEvent(PerceptionEvent): in the perception phase, includes the new perception event in the events buffer of the MentalState object and calls the UpdateMentalState method of the interpretation component. Then changes to the interpretation phase.
AcceptInterpretation(NewMentalState): in the interpretation phase, updates the mental state of the agent and runs the rule engine, as we will see later. Then changes to the decision phase.
DecisionFinalized: in the decision phase, this signal indicates that the rule engine has ended its processing. The UpdateMentalState and GeneratePlan methods of the planning components are called. Then changes to the planning phase.
AcceptPlan(Action): in the planning phase, includes the new action in the actions buffer of the MentalState object and then calls the Execute method of the action, as we will see later. Then changes to the execution phase.
ActionFinalized: in the execution phase, this signal indicates that the action has been executed. The ObtainResults method of the action is called, updating the mental state. Then changes to the perception phase.
NoMentalStateChange: in any phase, changes to the perception phase.
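To make the cycle concrete, the sketch below hand-writes the phase transitions these signals imply. ICARO's reactive agent pattern provides the real state machine; this dispatcher is only an illustration of the control level.

```java
// Illustrative sketch of the phase transitions driven by the signals above.
// ICARO's reactive agent pattern provides the actual state machine.
public class TalkingAgentControl {
    enum Phase { PERCEPTION, INTERPRETATION, DECISION, PLANNING, EXECUTION }

    private Phase phase = Phase.PERCEPTION;

    public void onSignal(String signal) {
        switch (signal) {
            case "AcceptPerceptionEvent": // buffer the event, ask the interpreter to update
                phase = Phase.INTERPRETATION; break;
            case "AcceptInterpretation":  // update the mental state, run the rule engine
                phase = Phase.DECISION; break;
            case "DecisionFinalized":     // rules done; hand the intentions to the planner
                phase = Phase.PLANNING; break;
            case "AcceptPlan":            // buffer the action and execute it
                phase = Phase.EXECUTION; break;
            case "ActionFinalized":       // collect the action's results
            case "NoMentalStateChange":   // nothing changed; go back to sensing
                phase = Phase.PERCEPTION; break;
        }
    }
}
```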

At the intention level, the agent decides its next immediate objectives, or intentions. The knowledge used for the process obviates irrelevant low-level details, such as the concrete words used by the user, the frequency of his voice, etc. Similarly, the generated decision also obviates those details, only expressing the agent's intentions from an abstract point of view. The concrete details depend on the lower levels. This is implemented using the Drools [6] rule engine, which produces a set of intentions based on a set of input beliefs, corresponding to the specified rules that model the agent behavior. An example of the definition of these rules is shown in the Appendix. With this formalization, the Belief objects (which include Beliefs and Facts), obtained from interpretations and distinguished by the code field, are put into the when part, thus triggering the then part at the moment they are identified, executing the creation of certain Intention objects, which will determine the next plans.

After each execution of the rule engine, a DecisionFinalized signal is sent to the state machine of the agent if there has been any change in the mental state, or a NoMentalStateChange signal otherwise.
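The rule definitions themselves are given in the Appendix; as a hedged illustration of the mechanics only, the sketch below feeds Belief objects into a Drools session and lets the fired rules emit Intention objects through a session global. It uses the current KIE API, whereas the 2010-era Drools API differed; the session name and the "intentions" global are assumed names.

```java
// Hedged sketch: run the decision rules over the agent's beliefs.
// "decisionSession" and the "intentions" global are assumed names; the
// actual rule bases are defined in the paper's Appendix.
import org.kie.api.KieServices;
import org.kie.api.runtime.KieContainer;
import org.kie.api.runtime.KieSession;

public class DecisionStep {
    public void decide(MentalState mentalState) {
        KieContainer container = KieServices.Factory.get().getKieClasspathContainer();
        KieSession session = container.newKieSession("decisionSession");
        try {
            // Rule consequences append Intention objects to this list.
            session.setGlobal("intentions", mentalState.intentions);
            for (Object belief : mentalState.beliefs) {
                session.insert(belief); // beliefs match the "when" part of the rules
            }
            session.fireAllRules();
        } finally {
            session.dispose();
        }
    }
}
```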

3.1.4. Planning (output generation)

The planning elements take the intentions generated in the previous phase and create a set of concrete actions to be carried out.

Basically, the intentions are broken down into actions through several iterations, starting from the more abstract, generic actions and moving to the more specific, concrete actions, which will trigger operations on the associated execution resources.

A general planner makes use of more specific planners that plan the sequence of actions for a particular actuator. This tree-like process fits well with the idea of rule-based reasoning, so a generic planner has been implemented for the Talking Agents architecture using the Drools [6] rule-based reasoning engine. An example of the definition of these rules is shown in the Appendix. With this formalization, the Intention objects are put into the when part, thus triggering the creation of Action objects, which will execute the plan.

Any planner component must implement an interface with the following methods:

UpdateMentalState(MentalState, AgentId): updates the mental state associated with an agent in the planner component, in order to include the last generated intentions and other available knowledge.
GeneratePlan(AgentId): generates a list of actions associated with the specified agent's mental state and sends the results to the agent.

The actions generated are objects of any of the following types:

Atomic actions: actions that call operations on execution resources directly. Sending a signal to an external agent is considered an atomic action.
Specialized actions: actions that create a Helper Agent as a result. A Helper Agent is a special kind of resource, which has its own initiative and can communicate the progress of its task to the parent.
Script of actions: any arbitrary script composed of actions of the other types.

An action object has the following methods:

Execute: either triggers the execution of an operation on an associated execution resource, creates a Helper Agent, or executes a sequence of actions, depending on what type of action the instance is.
ObtainResults: retrieves the initial feedback of the action.
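A minimal Java sketch of this action hierarchy follows. The class shapes are assumptions; a SpecializedAction that spawns a Helper Agent is omitted for brevity.

```java
// Sketch of the action types described above; class shapes are assumptions.
import java.util.List;

public abstract class Action {
    public abstract void execute();         // trigger the effect of the action
    public abstract Object obtainResults(); // initial feedback for the decision process
}

// Atomic action: calls an operation on an execution resource directly.
class AtomicAction extends Action {
    private final Runnable operation; // stands in for a call on an execution resource
    private boolean done;

    AtomicAction(Runnable operation) { this.operation = operation; }

    @Override public void execute() { operation.run(); done = true; }
    @Override public Object obtainResults() { return done; }
}

// Script of actions: runs an arbitrary sequence of other actions.
class ScriptAction extends Action {
    private final List<Action> steps;

    ScriptAction(List<Action> steps) { this.steps = steps; }

    @Override public void execute() { steps.forEach(Action::execute); }
    @Override public Object obtainResults() {
        return steps.isEmpty() ? null : steps.get(steps.size() - 1).obtainResults();
    }
}
```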

3.1.5. Execution (output handling)

This part is responsible for performing the previously generated actions. The immediate results of the execution of the actions generate initial feedback that can be used as knowledge in the decision process.

An example of an execution element used is the Speech Synthesis, which is implemented using TextSound [12], as explained in Section 4.2.3.

An execution component may implement any arbitrary operations, which will be called by a corresponding Action object.
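For instance, an execution resource for speech synthesis could look like the sketch below. The real system wraps TextSound, whose API is not described in the paper, so TextToSpeechEngine here is a hypothetical stand-in.

```java
// Hypothetical execution resource; TextToSpeechEngine is an assumed stand-in
// for the TextSound engine used by the real system.
public class SpeechSynthesisResource {

    public interface TextToSpeechEngine {
        void say(String text);
    }

    private final TextToSpeechEngine engine;

    public SpeechSynthesisResource(TextToSpeechEngine engine) {
        this.engine = engine;
    }

    // Operation invoked by an AtomicAction's execute(); the ActionFinalized
    // signal would be sent to the agent once the utterance completes.
    public void speak(String text) {
        engine.say(text);
    }
}
```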

3.1.5.1. Helper Agents

Sometimes, certain actions require continuous reception of feedback from the environment and successive tuning, or some special treatment that would consume the attention of the agent. For example: handling a complex communication protocol with another agent, or a specific sub-interaction with a user.

For these cases, Helper Agents can be used. Helper Agents are like normal agents, with two differences:

– Their objectives are very concrete: they come into life with a very specific mission that must be accomplished.
– Their life is limited: when their objective is achieved, they self-destruct, informing their creators.

Fig. 2. System information flow.

An example of a Helper Agent that is being developed is one whose objective is to guide a spectator toward a certain point of the room in order to begin the interaction. This agent would use the sensors of the room to guess the user's position and the speech synthesis to give instructions until the user reaches the correct position.

3.2. Dynamic view

As can be deduced from the previous section, the information flow established between the system and the user runs through different abstraction levels, as shown in Fig. 2. The information exchange generated is controlled by the Decision component at the control level, using signal sending. The abstraction levels are named as:

Physical: the physical form of the environment information: sound waves or electromagnetic signals.
Machine: the information handled by certain software relative to input and output events.
Knowledge: the machine information condensed to represent certain concepts useful for decision-making.

These abstraction levels are crossed in both directions, depending on the direction of the information flow. When an agent makes the decision to communicate something, it generates the knowledge in the form of an intention. That intention only contains information on what to do, but not on how to do it. The planning module transforms the intention into a set of actions, which specifies the way to achieve the immediate objective. Finally, the execution module executes the sequence of actions in the world.

In the other direction, when an agent receives physical information, e.g., sound waves, a module takes that information and transforms it into machine information, which can be handled by the agent. Later, the interpretation module obtains the relevant knowledge from the machine information. That knowledge can now be used in the decision-making process of the agent.

Figure 3 shows the complete processing sequence performed by the Talking Agent, which coordinates the information flow through the different resources.

Fig. 3. Processing sequence of a Talking Agent.

4. Multi-agent system architecture

Talking Agents collaborate among themselves and with other agents in the ICARO agent framework [30], which provides communication and management services that facilitate the distribution, configuration and monitoring of the multi-agent system. The use of this framework implies some constraints in the way Talking Agents are built and deployed, with the advantage of relieving the developer of implementing certain distribution and management concerns. In order to understand the organization of the Talking Agents system, we first present the ICARO framework and then how Talking Agents are part of an ICARO agent organization.

4.1. The ICARO framework

ICARO is a framework for building distributed applications, which are conceived as organizations of two types of entities: agents and resources. ICARO provides services for such an organization, in order to facilitate its distribution, configuration and monitoring. A system implemented with ICARO consists of the following types of entities:

Agent: represents an entity that can manage information flows. It can send commands to resources, receive their information and distribute it to other agents and resources. Each agent has a behavior model, which defines the way it will act in each circumstance. Each behavior model provides a pattern for the specification and control of an agent's behavior. Currently there is a reactive model, which is based on a finite-state automaton, and a cognitive model, based on the definition of a set of rules and a knowledge base.
Resource: represents an entity that performs operations on demand by some other entity, an agent or another resource. These operations usually satisfy the functional requirements of the application.
Information: represents an entity that is part of the application's domain model. Information entities are used either to structure information, to store information, or to perform certain business tasks. They are managed by both agents and resources and they basically make up the information flow of the system.
Description: represents an entity that describes how the agents and resources are organized in the system, i.e., their dependencies and deployment.

In addition, ICARO provides management facilities, which establish that every agent or resource within the system is a manageable element. A manageable element offers an interface to perform management operations on it, such as start, shutdown, pause or testing. These operations are used to maintain the system's integrity and have nothing to do with the application's functional requirements. Because of this, the management operations are usually handled by a set of pre-defined agents called managers. These agents have the responsibility of putting the system to work in the first place, i.e. allocating necessary resources and instantiating agents and resources; checking and maintaining the integrity of the system; and possibly stopping the system.
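A minimal sketch of the management interface this implies is given below. The method names mirror the operations listed in the text; the exact signatures in ICARO are assumptions.

```java
// Sketch of the management interface implied above; every agent and resource
// would implement it. Exact ICARO signatures are assumptions.
public interface ManageableElement {
    void start();
    void shutdown();
    void pause();
    boolean test(); // self-test used by the managers to check integrity
}
```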

4.2. Talking Agents in an ICARO agent organization

The Talking Agents system, as a society composed of several autonomous entities, has been designed as a multi-agent system in which each agent has the capability of interacting with humans, as a conversational agent, but also with the other agents, as in a typical multi-agent system. This inter-agent communication is necessary for coordination purposes, since agents may collaborate to achieve certain objectives. In the use case considered, the objective is to create the illusion for the spectator that they are within a society.

The Talking Agents architecture follows the ICARO organization paradigm, which already manages the distribution of components and their dependencies. With this paradigm, each component is considered an agent if it has a certain level of autonomy, or a resource if it only reacts to operation requests. Following this approach, the Perception, Interpretation, Planning and Execution modules are considered resources, and the coordination and orchestration of these resources are done in the Talking Agents, which use Decision modules. In addition, these components coexist with others that are predefined within the framework, the above-mentioned managers, which are responsible for management tasks in the system. The system view for the ORACULOS scenario is shown in Fig. 4. The dependencies among the components are specified using organization description artifacts from the ICARO framework.

The architecture can be divided into three types of entities, which are explained below.

4.2.1. Manager agents

Organization manager: this agent is responsible for configuring, initializing and monitoring the whole system. It is able to detect problems in the initialization process due to poor functioning of components or configuration. It delegates to the Agent manager to manage the agents and to the Resource manager to manage the resources.
Agent manager: this agent is responsible for configuring, initializing and monitoring the application agents. It is able to detect problems in these processes and to notify the Organization manager of them.
Resource manager: this agent is responsible for configuring, initializing and monitoring the application resources. It is able to detect problems in these processes and to notify the Organization manager of them.

4.2.2. Application agents

Talking Agent: this is the main agent of the system. It is implemented using the reactive agent pattern defined in ICARO, so its control model is defined as a state machine, sensitive to certain signals, which coordinates the information processing through the different resources and the communication with other agents. There are three instances of this type of agent running in the ORACULOS system. Its implementation is based on the Decision component.

4.2.3. Application resources

Interface Phidget Kit Resource: this resource is able to process the continuous signals received from different sensors about the environment, in order to periodically generate the corresponding sequences of events containing the sensor values, which will be sent to the subscribing agents. It is implemented using the Phidgets API [33].
Recognition Utility Resource: this resource is able to process the sound waves detected by a sound input device that correspond to certain discourse, in order to recognize the pronounced words. It then generates the corresponding events containing that information, which will be sent to the subscribing agents. It is implemented using the Microsoft Speech [18] implementation of JSAPI [36].
Oracles Interpreter Resource: this resource consists of distinct components: those that can interpret the distinct perception events locally (sensors and discourse), obtaining partial knowledge about the interaction; and those that can consider that partial knowledge globally and, considering the previous interactions and the current state of the agent, can make the correct interpretation. This resource uses the case-based reasoning pattern implemented with JColibri2 [2].
Oracles Planner Resource: this resource is able to determine how the intentions of the agent will be performed. In this case, it constructs the text that will be said by the agent in order to express the answer it wants to communicate. This resource uses a case-based reasoning system that contains a base of discourses that are adjusted according to the specific situation of the agent. It is implemented using JColibri2 [2]. This component is specific to the concrete application, in this case to the discourses that the particular installation requires.
TextSound and SoX Resource: this resource is able to transform the text provided as input into spoken words, which are played through a particular sound output device. It is implemented using TextSound [12], which is one of the few text-to-speech engines that work properly in the Spanish language.

Fig. 4. ICARO view of the system.

4.2.4. Inter-agent communication model

The inter-agent communication capability is the main feature that distinguishes the multi-agent architecture view from the single-agent architecture view. This feature has been integrated into the latter in the following way: any agent is able to send an AcceptPerceptionEvent signal to another agent, as if it were a perception resource. This way, the receiver agent is able to process the communication in the same way it processes any other perception event from the environment. The concrete protocols used to establish collaborations are implemented when defining a concrete system using this principle.
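The sketch below illustrates this principle. The endpoint abstraction and method names are assumptions; ICARO's actual messaging API differs in detail.

```java
// Sketch of the inter-agent communication principle: an agent delivers an
// AcceptPerceptionEvent signal to a peer, which processes it like any other
// perception event. The endpoint abstraction is an assumption.
public class PeerCommunication {

    public interface AgentEndpoint {
        void sendSignal(String signalName, Object payload);
    }

    // Share an interpretation (e.g., the inferred topic) with another oracle.
    public void shareKnowledge(AgentEndpoint peer, Object interpretation) {
        peer.sendSignal("AcceptPerceptionEvent", interpretation);
    }
}
```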

5. The ORACULOS installation

ORACULOS is an interactive artistic installation that creates the conditions of an oracle receiving the spectator to give him or her advice on the future. The term oracle designates either the asked divinity, the human mediator who transmits the response, the sacred place, or the given response. Each spectator has to individually face the oracular trial. Three art elements with a golden human head, internally made up of three Talking Agent instances representing the oracles, receive the spectator (only one is allowed to enter at a time). Each element recognizes the spectator's speech (TalkingJava [16]), generates spoken responses (TextSound [12]), and can also detect his or her presence through distance sensors (Phidgets API [33]). The spectator moves from one oracle to another (in different rooms), being asked by each one to be more specific or to redefine the inquiry in order to receive a final response (see Fig. 5). Apart from the direct interaction with the spectator, the Talking Agents collaborate among themselves to achieve greater accuracy in guessing the intentions of the spectator, by sharing the information obtained with each question. Thus, each time the spectator is asked, the oracle generates new knowledge to be used by the others, generating multiple feedback stimulation.

Fig. 5. The spectator passes through different rooms, and interacts with the oracles using the resources (distance sensors, speech recognizers and speech synthesizers).

The main objectives that drove the development of this scenario, from a technical point of view, and the approaches taken to achieve them, were the following:

– Refine the architecture, by following an iterative and incremental process for the development of the Talking Agent architecture.

– Prove the flexibility of the architecture for integrating different resource types and control techniques.

– Analyze the emergent behavior of the Talking Agents, by defining a simple communication protocol among the agents so they can share their interpretations of the dialogue with a user (agents send a message with that information to other agents), in order to observe how an agent develops a behavior, due to its coexistence in a multi-agent system, that would otherwise be different.

– Fulfill the requirements of the scenario, by implementing each part of the system, especially the interpretation, decision and planning rules.

The achievement of the last three objectives would prove the success of the Talking Agent architecture. The discussion of how these objectives have been achieved is in Section 5.4.

It is important to note that the main research of this work has not been aimed at natural language processing. Consequently, the system, apart from the speech recognition and synthesis tools, performs simplistic processing of the user's discourses and system responses, using principles similar to those of chat bots, like Cleverbot [24], or certain videogames, which just detect certain keywords in the discourse to infer some basic knowledge and generate a response (in this case, the main topics of the questions). Instead, the objective of the scenario is to prove the success of the architecture by testing its capability to coordinate several different resources (speech recognizer, sensors, and speech synthesizers), as well as the communication and collaboration among the agents.

5.1. Main scenario

This scenario represents the normal functioning of the system. The intention is to make the spectator travel through the installation, and interact with three oracles one by one, in different rooms, as shown in Fig. 5. This scenario shows the conversations of a spectator with the Talking Agents (Oracles 1 to 3):

Spectator: (enters the room and activates the sensors, generating an event that is interpreted as a new spectator arrival, provided that the agent is waiting for a spectator)
Oracle 1: Speak, human! Would you like to know about health, money or love?
The knowledge of the entrance of a new spectator triggers a rule in the agent that creates the intention to say a welcome message and ask an initial question. This text is generated by the planning resource and later spoken by the speech synthesis resource.
Spectator: Umm... let me think... about health
The spectator's speech is recognized by the speech recognition resource, which generates a speech event. This event is later interpreted by the interpretation resource, which extracts the word "health", generating the knowledge of the topic of conversation.

Oracle 1: Everyone was healthy before becoming ill. Will you go ahead into the next room to find out more about your health?
Using the knowledge of the spectator's previous answer, the agent again triggers a rule to ask a question related to the topic, the text of which is again generated by a planning resource using a predefined text database, and spoken by a speech synthesis resource.
Spectator: Yeah, of course
Oracle 1: Continue, visitor
Spectator: (enters the next room and activates the entrance sensors for that room)
Oracle 2: Tell me, human, what is the problem that threatens your health?
The knowledge of the new spectator arrival triggers a rule in the agent that creates the intention of asking the other agents for knowledge, thus obtaining the information from the previous agent that the subject of the question was Health.
Spectator: It's... my back, it's killing me!
Oracle 2: Your body is an imperfect machine. You are not yet prepared to hear our advice. Continue to the next room.
Spectator: (reaches the last room and activates the sensors)
Oracle 3: (after receiving information from the previous Oracle that the problem is "BackProblem", as in the last user discourse) You wasted the opportunity of gaining insight just asking about backs, did you? What is the motivation for that question?
Spectator: Umm...
The interpretation of the last discourse does not give any valuable knowledge, so the "UnknownTopic" knowledge is generated, which will trigger a response that will only consider information from previous discourses.
Oracle 3: You don't even know the reason for your visit. Here is my oracle: you have to feel not only your own pain but also the suffering of everything around you. Now, leave.
The last answer is the "final oracle". It is a sentence from the related topic chosen from a database. Actually, the correct topic is "BackProblem" and the sentence should have been about that, but since there are no entries in the text generation database for that topic and intention, the planning resource generates a text related to the super-concept instead. In this case it is "Health" again.
Spectator: (leaves the room).

5.2. Secondary scenario

In this scenario, the spectator does not pass the first oracle, as the Talking Agent is not able to recognize what the spectator is talking about. This can occur for different reasons: the spectator does not speak correctly, he uses very strange terms, or he simply does not collaborate with the oracles. The Talking Agent therefore generates automatic sentences to ask the spectator to reformulate his question a limited number of times. In the following example, the number of attempts the spectator can make is limited to two. The intention is to prevent a single spectator from monopolizing the installation and to allow other spectators to experience it.

Spectator: (enters the room and activates the sensors, notifying the agent of his presence)
Oracle 1: Speak, human! What is the main topic that worries you?
Spectator: Umm... let me think... hmm... I don't know
In this case, the agent doesn't give a set of options. The spectator could use any word related with health, money or love in order to determine the topic, but he had initial doubts. With his answer, the interpretation resource can't extract useful knowledge, so the agent decides to ask again.
Oracle 1: Your ideas are confusing. Try to reformulate your answer. Are you worried about health, money or love?
Spectator: It's complicated, what am I supposed to say?
Once again, the answer is not useful. The agent decides that it won't give more opportunities, so it finalizes the conversation.
Oracle 1: The art of questioning is not as easy as it seems. It is necessary to know many things to be able to question what is not known... Leave and come back when you are ready.
Spectator: (leaves the room).

5.3. Alternative scenario

This scenario represents another possible deployment of the Talking Agents and their resources, which is interesting for observing the emergent behavior of the system. In this case, there are three instances of Talking Agents, but only one instance of each resource: distance sensor, speech recognizer and speech synthesizer. Thus, there is an agent that can only "see", another that can only "listen", and another that can only "speak". Consequently, the agents must collaborate in order to create a real interaction with the spectator.


Spectator: (enters the room and activates the sensor, generating an event that is sent to the "seeing agent")

Seeing agent: (receives the sensor event and interprets a new spectator arrival; then it sends this interpretation to the other agents)

Speaking agent: Speak, human! Are you worried about health, money or love? (communicates end of speech to the other agents)

Since the Speaking agent has received the interpretation of the arrival of the spectator from the Seeing agent, it is able to act accordingly and generate the welcome discourse.

Spectator: I am completely in love with my neighbor...

Listening agent: (receives the speech of the spectator and interprets the "Love" topic; then communicates it to the other agents)

Speaking agent: Love is a beautiful flower that lies at the edge of a precipice. Now, leave. (communicates end of speech to the other agents)

The knowledge obtained by the Listening agent has been used by the Speaking agent in order to create the corresponding answer for the spectator.

Spectator: (leaves the room).
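The collaboration pattern of this scenario, where each single-capability agent broadcasts its interpretations so the others can act on them, can be sketched as follows. The listener interface and in-process peer list are simplifications of our own; in the real system the messages travel through the agent platform:

    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    interface InterpretationListener {
        void onInterpretation(String source, String interpretation);
    }

    class SingleCapabilityAgent implements InterpretationListener {
        private final String name;
        private final List<InterpretationListener> peers = new CopyOnWriteArrayList<>();

        SingleCapabilityAgent(String name) { this.name = name; }

        void addPeer(InterpretationListener peer) { peers.add(peer); }

        // A local perception event (sensor or speech) is interpreted and the
        // resulting knowledge is shared with every other agent in the society.
        void perceive(String interpretation) {
            for (InterpretationListener peer : peers) {
                peer.onInterpretation(name, interpretation);
            }
        }

        @Override
        public void onInterpretation(String source, String interpretation) {
            // The "speaking" agent reacts to "SpectatorArrived" received from
            // the "seeing" agent even though it has no sensor of its own.
            System.out.println(name + " learned from " + source + ": " + interpretation);
        }
    }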

5.4. Results discussion

The development, execution, and results analysis of the scenarios have contributed to the achievement of the previously identified objectives in the following ways:

Refine the architecture: The architecture definition, and especially its implementation, have been improved in the iterative process of development. The specific details of this process are not of interest in this paper.

Prove the flexibility of the architecture: Speech recognition, distance sensors and speech synthesis resources are uniformly integrated in the system as perception and execution components, since they implement the required interfaces. Considering this, it is possible to change these resources, for example replacing them all with a GUI, without changing any other part, and thus to obtain different input and output types. A text input element in the interface could act as the speech recognition resource, sending the same events; a simple slider could act as the distance sensor; and a textual output could act as the speech synthesis. As it is possible to do this without altering the rest of the system, the flexibility is proven (see the sketch below).
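A sketch of this substitution, assuming a hypothetical listener-based perception interface (the paper only states that resources implement the required interfaces, so these names are illustrative):

    import java.util.function.Consumer;

    interface PerceptionResource {
        void setListener(Consumer<String> listener); // delivers perception events as text
    }

    // A GUI text field can stand in for the speech recognizer because it
    // emits the same kind of event through the same interface.
    class TextFieldRecognizer implements PerceptionResource {
        private Consumer<String> listener;

        public void setListener(Consumer<String> listener) {
            this.listener = listener;
        }

        // Called by the GUI when the user presses Enter.
        public void submit(String typedText) {
            if (listener != null) {
                listener.accept(typedText);
            }
        }
    }

Since the agent is wired against PerceptionResource only, swapping a speech-based implementation for TextFieldRecognizer requires no change elsewhere.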

Analyze the emergent behavior: The coexistence with other agents gives an agent an additional source of information, thus creating the possibility of taking advantage of the other agents' experience, which depends on their own environment. This way, an agent may take actions based on information that it otherwise would not be able to obtain, which is a form of emergent behavior.

In the main scenario, the absence of this information sharing would make the different rooms of the artistic installation completely independent, and it would not give the sensation of undergoing some kind of "trial", which is the intention of the scenario.

In the alternative scenario, the individual agents are completely unable to create an interaction with the spectator, but working together they obtain this new capability that none of them had before. The whole is greater than the sum of its parts.

Fulfill the requirements of the scenario: The requirements of the scenario determine the following characteristics:

– The event types the perception resources have to be able to detect. In this case, speech perception events and distance sensor events are detected.

– How a collection of events has to be interpreted by the agent in order to identify a certain interaction type. In this case, the case-based interpretation pattern is customized in order to detect each combination of beliefs and events and determine the correct interpretation.

– How the agent must act, depending on the interpretation of the environment. The Decision module is implemented with the rules that derive the intentions from the beliefs (see the sketch after this list).

– How the intentions of the agents are broken down into actions in order to obtain the desired effect. The Planning module is implemented with the rules to achieve this.

Considering the previous characteristics of the system, it is possible to fulfill the requirements of the desired scenario in the context of the architecture.
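The belief-to-intention mapping of the Decision module can be sketched in plain Java, although the actual system delegates this to a rule engine. The fact types and rule bodies below are illustrative assumptions:

    import java.util.ArrayList;
    import java.util.List;

    record Belief(String type, String value) {}
    record Intention(String action, String topic) {}

    class DecisionModule {
        // Each branch plays the role of one decision rule: a pattern over
        // the current beliefs produces the corresponding intention.
        List<Intention> decide(List<Belief> beliefs) {
            List<Intention> intentions = new ArrayList<>();
            for (Belief b : beliefs) {
                switch (b.type()) {
                    case "SpectatorArrived" -> intentions.add(new Intention("Welcome", null));
                    case "TopicDetected" -> intentions.add(new Intention("AskQuestion", b.value()));
                    case "UnknownTopic" -> intentions.add(new Intention("AskAgain", null));
                }
            }
            return intentions;
        }
    }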

It is important, considering the previous point, to clarify that the main application of the Talking Agents architecture is the development of intelligent environments where there is a need to process arbitrary interaction types. Although the proposed scenario does not consider the existence of many concurrent interactions, special attention to synchronization issues may be necessary for other scenarios, as discussed at the end of the conclusions.


6. Conclusions and future work

Although the initial purpose of the Talking Agents architecture is the development of interactive artistic installations, its generic definition also makes it suitable for developing other types of distributed systems that include user interaction through different channels. The different technologies of each phase of the interaction can be integrated in a generic way, so that each component may be replaced or extended in order to change the system behavior. This was one of the main issues to validate with the installation art scenario.

The ORACULOS scenario is defined with the intention of making use of several simple resources and of the capability of sharing information among the agents, in this case to share the conversation topic detected during the interactions of the spectators with the different Talking Agents. This information can be used to improve the interpretation made by resources such as speech recognition software. In this sense, the scenario shows the interest of a collaborative architecture. The existence of this inter-agent communication, apart from human-agent communication, expands the possibilities of multimodal dialogue systems up to a fully qualified multi-agent system, in which agents have special skills to interact with humans outside of their own "virtual world".

Experimentation results have been the basis for considering the evolution and application of the Talking Agent architecture for Ambient Assisted Living (AAL) scenarios [37]. AAL scenarios face issues similar to those addressed here for the interactive art installation: both are electronic environments that are sensitive and responsive to the presence of people, and this is achieved by integrating a variety of sensing, reasoning, acting, communication and interaction means. Talking Agents provide a solution for such integration. A distinguishing feature with respect to typical AAL implementations that has already been explored in the architecture is the interaction with users through speech, as well as the control of different sensors and actuators. This takes advantage of the ability of agents to communicate information on the context that is derived from previous interactions with the user.

Some issues require further work to gain flexibility, for instance the coordination of concurrent interactions (in the ORACULOS installation the user goes sequentially from one Talking Agent to another) and a systematic management of real-time design issues [25]. In this sense, we are currently working with a group of Talking Agents, each one implementing a different strategy, that cooperate at the same time to interpret the user's acts and speech and provide an agreed answer. This answer is selected from one agent in the group following some strategy (e.g., voting, or the first ready). This can support the satisfaction of timeliness and robustness requirements that are stronger in AAL than in an art installation scenario.
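The two selection strategies mentioned (voting and first-ready) can be sketched as follows; the types and method names are hypothetical, not the implementation under development:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;

    class AnswerSelector {

        // Voting: the answer proposed by the most agents wins.
        static String byVote(List<String> proposals) {
            Map<String, Integer> votes = new HashMap<>();
            for (String p : proposals) {
                votes.merge(p, 1, Integer::sum);
            }
            return votes.entrySet().stream()
                    .max(Map.Entry.comparingByValue())
                    .map(Map.Entry::getKey)
                    .orElse(null);
        }

        // First-ready: the first agent to produce an answer wins, which
        // favors timeliness over consensus; the remaining tasks are cancelled.
        static String firstReady(List<Callable<String>> agents, ExecutorService pool)
                throws Exception {
            return pool.invokeAny(agents);
        }
    }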

Acknowledgements

This work has been done in the context of the project Agent-based Modeling and Simulation of Complex Social Systems (SiCoSSys), supported by the Spanish Council for Science and Innovation, with grant TIN2008-06464-C03-01. In addition, we acknowledge support from the Programa de Creación y Consolidación de Grupos de Investigación UCM-Banco Santander, with reference GR58/08.

Appendix

Example of definition of Planner

Example of definition of Intention Level


References

[1] ACM Java Task Force, acm.gui Package Web Site: http://jtf.acm.org/rationale/gui-package.html.

[2] B.D. Agudo, P.A. Gonzalez-Calero, J.A. Recio-García and A.A. Sanchez-Ruiz, Building CBR systems with jCOLIBRI, in: Special Issue on Experimental Software and Toolkits, Science of Computer Programming 69 (2007), pp. 1–3.

[3] J. Allen, D. Byron, M. Dzikovska, G. Ferguson, L. Galescu and A. Stent, An Architecture for a Generic Dialogue Shell, in: Journal of Natural Language Engineering, special issue on Best Practices in Spoken Language Dialogue Systems Engineering 6(3) (2000), pp. 1–16.

[4] J. Allen, G. Ferguson and A. Stent, An architecture for more realistic conversational systems, in: Proceedings of Intelligent User Interfaces (IUI-01), Santa Fe, NM, 2001, pp. 14–17.

[5] J. Atserias, B. Casas, E. Comelles, M. Gonzalez, L. Padro and M. Padro, FreeLing 1.3: Syntactic and semantic services in an open-source NLP library, in: Proceedings of the fifth international conference on Language Resources and Evaluation (LREC 2006), Genova, Italy, 2006.

[6] M. Bali, Drools JBoss Rules 5.0 Developer's Guide, Packt, 2009.

[7] F. Bellifemine, A. Poggi and G. Rimassa, JADE – A FIPA-compliant agent framework, in: Proceedings of PAAM'99, London, 1999, pp. 97–108.

[8] D. Bohus and A. Rudnicky, Integrating multiple knowledge sources for utterance-level confidence annotation in the CMU Communicator spoken dialog system (Tech. Rep. No. CMU-CS-02-190), School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, 2002.

[9] D. Bohus and A. Rudnicky, RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda, in: Proceedings of the European Conference on Speech, Communication and Technology, 2003, pp. 597–600.

[10] D. Bohus, A. Raux, T. Harris, M. Eskenazi and A. Rudnicky, Olympus: an open-source framework for conversational spoken language interface research, in: Bridging the Gap: Academic and Industrial Research in Dialog Technology, workshop at HLT/NAACL, 2007.

[11] J. Bos, E. Klein, O. Lemon and T. Oka, DIPPER: Description and Formalisation of an Information-State Update Dialogue System Architecture, in: 4th SIGdial Workshop on Discourse and Dialogue, Sapporo, 2003.

[12] ByteCool, TextSound Web Site: http://www.bytecool.com/textsnd.htm.

[13] Carnegie Mellon University (CMU), School of Computer Science Web Page: http://www.cs.cmu.edu/.

[14] Carnegie Mellon University (CMU), Sphinx-4 Web Site:http://cmusphinx.sourceforge.net/sphinx4/.

[15] J. Cassell, Embodied conversational agents: representation and intelligence in user interfaces, AI Magazine 22(4) (2001), pp. 67–83.

[16] CloudGarden, TalkingJava Web Site: http://www.cloudgarden.com/JSAPI/.

[17] J.M. Corchado, M. Glez-Bedia, Y. de Paz, J. Bajo and J.F. de Paz, Replanning mechanism for deliberative agents in dynamic changing environments, Computational Intelligence 24(2) (2008), 77–107.

[18] M.D. Dunn, Pro Microsoft Speech Server 2007: Developing Speech Enabled Applications with .NET, Apress, 2007.

[19] W. Eckert, T. Kuhn, H. Niemann, S. Rieck, A. Scheuer and E.G. Schukat-Talamazzini, A spoken dialogue system for German intercity train timetable inquiries, in: Proc. European Conf. on Speech Technology, 1993, pp. 1871–1874.

[20] K. Eliasson, A case-based approach to dialogue systems, Journal of Experimental & Theoretical Artificial Intelligence, 2009.

[21] R. Engel, SPIN: A Semantic Parser for Spoken Dialog Systems, in: Proceedings of the 5th Slovenian and 1st International Language Technologies Conference, 2006.

[22] D. Griol, N. Sanchez-Pi, J. Carbo and J.M. Molina, A Context-Aware Architecture to Provide Adaptive Services by means of Spoken Dialogue Interaction, in: International Conference on Artificial Intelligence (ICAI'09), at WORLDCOMP 2009, vol. II, pp. 912–918.

[23] J. Gustafson, Developing multimodal spoken dialogue systems: Empirical studies of spoken human-computer interaction, PhD Dissertation, Department of Speech, Music and Hearing, KTH, Stockholm, 2002.

[24] Icogno Ltd, Cleverbot Web Site: http://www.cleverbot.com/.

[25] V. Julian and V. Botti, Developing real-time multi-agent systems, Integrated Computer-Aided Engineering 11(2) (2004), 135–149.

[26] D. Klein and C.D. Manning, Fast Exact Inference with a Factored Model for Natural Language Parsing, in: Advances in Neural Information Processing Systems 15 (NIPS 2002), MIT Press, Cambridge, MA, 2002, pp. 3–10.

[27] S. Larsson and D. Traum, Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit, in: Natural Language Engineering, Special Issue on Best Practice in Spoken Language Dialogue Systems Engineering, Cambridge University Press, U.K., 2000, pp. 323–340.

[28] O. Lemon, Managing Dialogue Interaction: A multi-layered approach, in: Proceedings of the 4th SIGdial Workshop on Discourse and Dialogue, 2003, pp. 168–177.

[29] W. Mahdi, S. Werda and A.B. Hamadou, A Hybrid Approach for Automatic Lip Localization and Viseme Classification to Enhance Visual Speech Recognition, Integrated Computer-Aided Engineering 15(3) (2009), 253–266.

[30] Morfeo Community, ICARO Project Web Site: http://icaro.morfeo-project.org.

[31] I. Nieto, J. Botía and A. Gomez-Skarmeta, Information and Hybrid Architecture Model of the OCP Contextual Information Management System, Journal of Universal Computer Science 12(3) (2006), 357–366.

[32] A.H. Oh and A. Rudnicky, Stochastic language generation for spoken dialogue systems, in: Proceedings of the ANLP/NAACL workshop on conversational systems, 2000, pp. 27–32.

[33] Phidgets, Phidgets Web Site: http://www.phidgets.com/.

[34] S. Seneff, E. Hurley, R. Lau, C. Pao, P. Schmid and V. Zue, Galaxy-II: A Reference Architecture for Conversational System Development, in: Proc. ICSLP, 1998, pp. 931–934.

[35] Speech Integration Group of Sun Microsystems Laboratories, FreeTTS Web Site: http://freetts.sourceforge.net/docs/index.php.

[36] Sun Microsystems, Java Speech White Paper: http://java.sun.com/products/java-media/speech/reference/whitepapers/index.html.

[37] The European Ambient Assisted Living Innovation Alliance(AALIANCE Project), Ambient-Assisted Living Roadmap:http://www.aaliance.eu/public/documents/aaliance-roadmap/.

[38] D. Traum and S. Larsson, The Information State Approach to Dialogue Management, in: Current and New Directions in Discourse & Dialogue, Smith and Kuppevelt, eds, Kluwer Academic Publishers, 2003, pp. 325–353.

[39] W. Ward, Understanding spontaneous speech: the Phoenix system, in: Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP-91), 1991, pp. 365–367.

[40] W.C. Yau, D.K. Kumar and S.P. Arjunan, Visual Recognition of Speech Consonants using Facial Movement Features, Integrated Computer-Aided Engineering 14(1) (2007), 49–61.

