A case based reasoning model for multilingual language generation in dialogues

Embed Size (px)

Text of A case based reasoning model for multilingual language generation in dialogues

  • ng


    guaxpreh isanthe

    (Graesser, Chipman, Haynes, & Olney, 2005), conversational gamestesting emotional abilities (Rehm & Wis

    n a prochsmubig tant, andugh thLG pr

    drawback that it is not reusable. Templates have got a more ab-stract view generating Natural Language (NL), mixing xed textwith variable text. An example of a classical system using this ap-proach is ELIZA (Weizenbaum, 1966), which inserts part of the userinput in the system answers to simulate the process of a psycho-therapist doing a therapy. This is a more general NLG technique,

    tional agents is AIML (Wallace, 2000). It is based on stimulus

    operators joining these sentences in their parent nodes. The algo-rithm works crossing and mutating these operators, creating newsentences.

    ProtoPropp (Gervas, Diaz-Agudo, Peinado, & Hervas, 2005) is astory plot generation program which uses a CBR technique to builda story from an initial description of its plot, using Propp functionsto organize the tale and an ontology to capture the domain entitiesand to set all the relevant entities for the generation task. Cases arecomplete plot tales composed by related movements. A movementis a kind of procedure related to several Propp functions that allows

    Corresponding author.E-mail addresses: victor@decsai.ugr.es (V. Lpez Salazar), eisman@decsai.ugr.es

    (E.M. Eisman Cabeza), castro@decsai.ugr.es (J.L. Castro Pea), zurita@decsai.ugr.es

    Expert Systems with Applications 39 (2012) 73307337

    Contents lists available at

    Expert Systems w

    .e(J.M. Zurita Lpez).(Reiter & R, 1997), there is not a standard technique to do it be-cause it depends in many ways on the selected problem domain.Basically, there are three approaches to tackle the NLG problem;which are in ascending order of complexity and generality: cannedtext, templates, and symbolic approaches employing knowledgerepresentations at different linguistic levels and rules to manipu-late them. Canned text has the advantage of being a simple ap-proach; it only needs the nal text to be generated, but has the

    One proposal developing this scheme is given by Kimura andKitamura (2006) which extends the AIML language allowing toincorporate SPARQL queries to extract sentences from web pagesannotated with RDF, making the agent more dynamic. Lim andCho (2005) use a genetic programming algorithm to make the an-swers of a conversational agent more varied using Sentence PlanTrees (SPT) elements which contain the structure of the answer.SPT are binary trees containing templates in their leaves and jointagents to simulate different roles i(Kopp, Gesellensetter, Kramer, & Wacarry out, in a broad view, threeUnderstanding, Dialogue Managemeeration (NLG). For the last one, althotween the global subtasks that a N0957-4174/$ - see front matter 2012 Elsevier Ltd. Adoi:10.1016/j.eswa.2012.01.085sner, 2005), embodiedfessional environmentth, 2005). These agentssks: Natural LanguageNatural Language Gen-ere is an agreement be-ocess should carry out

    response scheme for answer generation, using pattern matchingfor recognizing the user input and templates for natural languagegeneration. The semantics of the contents of the agents answercould be specied as two simple string tags, by the topic andthat tags. This scheme is clearly insufcient to establish thecontents of the answer, because the agent cannot say what hewants but only the matching answer to a user input.ties where these systems could be employed, e.g. tutoring systemswhich give curricular advice or lessons about a particular matter

    Conversational agents usually focus their NLG methodology onthe use of templates. A well known language to develop conversa-A case based reasoning model for multili

    Vctor Lpez Salazar , Eduardo M. Eisman Cabeza, JDept. Computer Science and Articial Intelligence, ETSIIT, University of Granada, C/Perio

    a r t i c l e i n f o

    Keywords:DialogueSpeech actsConversational AgentsNatural Language Generation

    a b s t r a c t

    The process of Natural Language to its surface form eReasoning technique whicgenerates coherent phrasesmaintains a dialogue with

    1. Introduction

    The industry has got a growing interest in natural languageinterfaces which make possible that users easily interact in a nat-ural way with the devices they use. These interfaces are usuallyEmbodied Conversational Agents (ECAs) or, in general, Conversa-tional Systems. In the educational eld, there are many opportuni-

    journal homepage: wwwll rights reserved.ual language generation in dialogues

    n Luis Castro Pea, Jose Manuel Zurita LpezDaniel Saucedo Aranda, s/n, Granada, Spain

    ge Generation for a Conversational Agent translates some semantic lan-ssed in natural language. In this paper, we are going to show a Case Basedeasily extensible and adaptable to multiple domains and languages, thatd produces a natural outcome in the context of a Conversational Agent thatuser.

    2012 Elsevier Ltd. All rights reserved.

    because it does not need to pre-generate all the system answers,although these templates could not be reused in other situationsthat those for which they have been initially created. The last ap-proach usually employs linguistic knowledge as grammars or rhe-torical operators (Mann & Thompson, 2005) to describe the part ofthe language used by the system making it more generic, althoughthis leads to raising the complexity of the system.

    SciVerse ScienceDirect

    ith Applications

    lsevier .com/locate /eswa

  • s witfollowing the history until a certain point, i.e. until an objective inthe plot has been reached. The CBR module retrieves a similar casefrom the input querywhich contains a partial description of the taleand the retrieved case is adapted doing substitutions with therestrictions of the input query, searching for compatiblemovementsin the ontology to adapt the case. Later, the resulting structure ispassed to an NLG module which selects the content to be includedin the tale discarding the already presented information, structuresthe discourse establishing a presentation order between the facts ofthe tale based on priorities, makes an aggregation of facts talkingabout the same subject or the same action, selects the word torepresent the salient entities of the tale, and expresses the plot ina natural language instantiating the templates.

    Mairesse and Walker (2010) describe PERSONAGE as a psycho-logically motivated, parameterizable natural language generatorable to produce texts with different styles associated with aspectsof personality. The system is composed by several modules: a con-tent planner which selects the content of the sentences and struc-tures the discourse using rhetorical operators, a sentence plannerwhich species how the sentences have to be arranged and a reali-zationmodule which produces the nal text. Each of these modulesholds parameters controlling someaspects of the type of text to gen-erate, e.g. for the content planning, twelve parameters inuence thesize of the content plan, the content ordering, the used rhetoricalrelations, and the polarity of the expressed propositions.

    There have been more approaches related with the automaticgeneration of the language. Daniel S. Paiva carries out a survey of19 NLG systems (Paiva, 1998), most of them using symboliclanguage representations. From this survey and the other revisedliterature, we can state the following:

    Symbolic conversational systems must have a semantic level,like the entities of the ontology describing the domain of talesin the ProtoPropp system and a lexical level describing thedomain entities in natural language.

    A symbolic view of NLG using linguistic knowledge, e.g. gram-mars or any other knowledge representation about somelinguistic level like rhetorical operators to structure the dis-course, is hard to adapt from one domain to another becausethe operators used in one linguistic level for a given domaincould not be applicable to this same level of another domain.

    A symbolic view of an NLG system requires an appreciable the-oretic knowledge about linguistics and some methodology tomake it successfully applicable.

    If grammatical rules are employed in any level to capture syn-tactic structure, it is necessary to adapt the rules for eachrequired language.

    Usually, although not always, symbolic approaches based intransformations or grammars produce texts having a veryclosed style, turning these techniques hard to use in conversa-tional agents employed for dialogues that must be believable.

    In many cases, the desirable objectives for NLG in a conversa-tional agent are to keep the employed methodology as simple aspossible, without having a deep linguistic knowledge embeddedinto the system while maintaining an acceptable generality de-gree, i.e., the conversational agent should use some kind of prede-ned answers and variations of them, when the contents of theanswer and the intentions of the agent were similar to those ofa human being in the same situation. Usually, computational lin-guistics human experts are expensive or unavailable resourcesand the effort of building an NLG module for a conversationalagent should be minimized. If the employed technique is generic

    V. Lpez Salazar et al. / Expert Systemenough, this module could be reused in other agents devoted todifferent domains and languages without developing another spe-cic NLG system for each domain and language.We are interested in developing an NLG system used for dia-logues which does not need a deep linguistic knowledge and whichcould be reused in many domains without changing the systemfunctionality. Our work takes the ideas by Searle about speech acts(Searle, 1969), that have been later implemented and derived in sev-eral theories and conversational systems (Stent, 2002, 1995).S