10
Three systems that generate real-time natural lan- guage commentary on the RoboCup simulation league are presented, and their similarities, differ- ences, and directions for the future discussed. Although they emphasize different aspects of the commentary problem, all three systems take sim- ulator data as input and generate appropriate, expressive, spoken commentary in real time. H ere we present three RoboCup simula- tion league commentator systems: (1) ROCCO from DFKI, (2) BYRNE from Sony CSL, and (3) MIKE from ETL. Together, the three systems won the scientific award at RoboCup- 98 (Asada 1998) for making a significant and innovative contribution to RoboCup-related research. Soccer is an interesting test domain because it provides a dynamic, real-time environment in which it is still relatively easy for tasks to be classified, monitored, and assessed. Moreover, a commentary system has severe time restric- tions imposed by the flow of the game and is thus a good test bed for research into real-time systems. Finally, high-quality logs of simulated soccer games are already available and allow us to abstract from the intrinsically difficult task of low-level image analysis. From Visual Data to Live Commentary The automated generation of live reports on the basis of visual data constitutes a multistage transformation process. In the following sub- sections, we describe how the maintainable subtasks transform the input into the final out- put. The Input All three commentary systems concentrate on the RoboCup simulator league, which involves software agents only (as opposed to the real robot leagues). Thus, the soccer games to be commented are not observed visually. Rather, all three systems obtain their basic input data from the SOCCER SERVER (Kitano et al. 1997). All three commentator systems use as input the same information that the monitor pro- gram receives for updating its visualizations. This information consists of player location and orientation (for all players), ball location, and game score and play modes (such as throw ins, goal kicks). Game Analysis Game analysis provides an interpretation of the input data and aims at recognizing conceptual units at a higher level of abstraction. The infor- mation units resulting from such an analysis encode a deeper understanding of the time- varying scene to be described. They include spatial relations for the explicit characteriza- tion of spatial arrangements of objects as well as representations of recognized object move- ments. Scene interpretation is highly domain specific. Because we are dealing with soccer games, it is often possible to infer higher-level concepts from the observed motion patterns of the agents. Topic Control and Content Selection The next stage is to select the information to be communicated to the user. This selection depends on a number of constraints, such as the game situation, the purpose of the com- ment, the user’s information needs, and the available presentation media. Because the commentary output is basically speech, we also have to consider that only one thing can be said at a time. In addition, the whole utter- ance should be consistent and well planned. Articles SPRING 2000 57 Three RoboCup Simulation League Commentator Systems Elisabeth Andrè, Kim Binsted, Kumiko Tanaka-Ishii, Sean Luke, Gerd Herzog, and Thomas Rist Copyright © 2000, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2000 / $2.00

Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

� Three systems that generate real-time natural lan-guage commentary on the RoboCup simulationleague are presented, and their similarities, differ-ences, and directions for the future discussed.Although they emphasize different aspects of thecommentary problem, all three systems take sim-ulator data as input and generate appropriate,expressive, spoken commentary in real time.

Here we present three RoboCup simula-tion league commentator systems: (1)ROCCO from DFKI, (2) BYRNE from Sony

CSL, and (3) MIKE from ETL. Together, the threesystems won the scientific award at RoboCup-98 (Asada 1998) for making a significant andinnovative contribution to RoboCup-relatedresearch.

Soccer is an interesting test domain becauseit provides a dynamic, real-time environmentin which it is still relatively easy for tasks to beclassified, monitored, and assessed. Moreover,a commentary system has severe time restric-tions imposed by the flow of the game and isthus a good test bed for research into real-timesystems. Finally, high-quality logs of simulatedsoccer games are already available and allow usto abstract from the intrinsically difficult taskof low-level image analysis.

From Visual Data to Live Commentary

The automated generation of live reports onthe basis of visual data constitutes a multistagetransformation process. In the following sub-sections, we describe how the maintainablesubtasks transform the input into the final out-put.

The InputAll three commentary systems concentrate on

the RoboCup simulator league, which involvessoftware agents only (as opposed to the realrobot leagues). Thus, the soccer games to becommented are not observed visually. Rather,all three systems obtain their basic input datafrom the SOCCER SERVER (Kitano et al. 1997).

All three commentator systems use as inputthe same information that the monitor pro-gram receives for updating its visualizations.This information consists of player locationand orientation (for all players), ball location,and game score and play modes (such as throwins, goal kicks).

Game AnalysisGame analysis provides an interpretation of theinput data and aims at recognizing conceptualunits at a higher level of abstraction. The infor-mation units resulting from such an analysisencode a deeper understanding of the time-varying scene to be described. They includespatial relations for the explicit characteriza-tion of spatial arrangements of objects as wellas representations of recognized object move-ments. Scene interpretation is highly domainspecific. Because we are dealing with soccergames, it is often possible to infer higher-levelconcepts from the observed motion patterns ofthe agents.

Topic Control and Content SelectionThe next stage is to select the information tobe communicated to the user. This selectiondepends on a number of constraints, such asthe game situation, the purpose of the com-ment, the user’s information needs, and theavailable presentation media. Because thecommentary output is basically speech, wealso have to consider that only one thing canbe said at a time. In addition, the whole utter-ance should be consistent and well planned.

Articles

SPRING 2000 57

Three RoboCup Simulation League

Commentator SystemsElisabeth Andrè, Kim Binsted, Kumiko Tanaka-Ishii,

Sean Luke, Gerd Herzog, and Thomas Rist

Copyright © 2000, American Association for Artificial Intelligence. All rights reserved. 0738-4602-2000 / $2.00

Page 2: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

bodied speech, BYRNE uses a face as an addition-al means of communication.

ROCCO

The ROCCO commentator system is a reincarna-tion of an early research prototype—called SOC-CER—that was built by Andrè, Herzog and Rist(1988) in the late 1980s for the automatedinterpretation and natural language descrip-tion of time-varying scenes. At that time, shortsections of video recordings of soccer gameswere chosen as a major application domainbecause they offered interesting possibilities forthe automatic interpretation of visually ob-served motion patterns in a restricted domain.Later, the work on incremental scene interpre-tation (Herzog and Rohr 1995; Herzog andWazinski 1994; Herzog et al. 1989) was com-bined with an approach for plan-based multi-media presentation design (Andrè et al. 1993;Andrè and Rist 1990) to move toward the gen-eration of various multimedia reports, such astelevision-style reports and illustrated newspa-per articles (Andrè, Herzog, and Rist 1994). Partof this extension was a generic architecture formultimedia reporting systems that describesthe representation formats and the multistagetransformation process from visual data tomultimedia reports. Both systems, the earlySOCCER as well as the new ROCCO system, can beseen as instantiations of this architecture. In

Natural Language GenerationOnce the content of the next utterance is decid-ed, the text realization is performed, whichcomprises grammatical encoding, linearization,and inflection. Commentary is a real-time livereport—information must be communicatedunder time pressure and in a rapidly changingsituation. The system should choose whether tomake short telegram-style utterances or gram-matically complete and explanatory utterancesaccording to the situation. Also, it can some-times be necessary for the commentator tointerrupt itself. For example, if an importantevent (for example, a goal kick) occurs, utter-ances should be interrupted to communicatethe new event as soon as possible.

The OutputFinally, the natural language utterances arepiped to a speech synthesis module. To getmore natural speech output, these systems donot rely on the default intonation of thespeech synthesizer but generate specific syn-thesizer instructions for expressing the corre-sponding emotions. These emotions, whetherhard wired into the templates or generated inaccordance with an emotional model and thestate of the game, can also be shown in thefacial expressions of an animated commenta-tor character.

Although MIKE and ROCCO produce disem-

Articles

58 AI MAGAZINE

Figure 1. Connecting SOCCER SERVER and ROCCO.

Clients (Team A)

Clients (Team B)

IncrementalEvent

Recognition

Upcoming3D-Visualization

Systems

VerbalComments

PlusVisualization

Report Generator

Text Generator

Graphics Generator

Animation Generator

Soccer2D Monitor

RoCCo

Rep

ort

Plan

ner

MessageBoard

FieldSimulator

SoccerServer

Page 3: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

the following discussion, we briefly sketch ROC-CO’s main components and the underlyingapproaches for incremental event recognitionand report generation.

In contrast to the early SOCCER system, ROCCO

does not have to tackle the intrinsically diffi-cult task of automatic image analysis. Insteadof using real image sequences and startingfrom raw video material, it relies on the real-time game log provided by the SOCCER SERVER

(figure 1). Based on these data, ROCCO’s incre-mental event-recognition component per-forms a higher-level analysis of the scene underconsideration. The representation and auto-matic recognition of events is inspired by thegeneric approach described in Herzog andWazinski (1994), which has been adopted inROCCO according to the practical needs of theapplication context. Declarative event con-cepts represent a priori knowledge about typi-cal occurrences in a scene. These event con-cepts are organized into an abstractionhierarchy. Event concepts, such as locomotionsor kicks, are found at lower levels of the hierar-chy because they are derived through the prin-ciple of specialization and temporal decompo-sition of more complex event concepts such asa ball transfer or an attack. Simple recognitionautomata, each of which corresponds to a spe-cific concept definition, are used to recognizeevents on the basis of the underlying scenedata.

The recognized occurrences, along with theoriginal scene data, form the input for reportgeneration. To meet the specific requirementsof a live report, ROCCO’s report planner relies onan incremental discourse planning mecha-nism. Assuming a discrete-time model, at eachincrement of a time counter, the systemdecides which events have to be communicat-ed next. Thereby, it considers the salience ofevents and the time that has passed since theiroccurrence. If there are no interesting events,the system randomly selects background infor-mation from a database.

The text generator is responsible for trans-forming the selected information into spokenutterances. Because it is rather tedious to spec-ify soccer slang expressions in existing gram-mar formalisms, we decided to use a template-based generator instead of fully fledged naturallanguage generation components, as in the ear-lier SOCCER system. That is, language is generat-ed by selecting templates consisting of stringsand variables that will be instantiated with nat-ural language references to objects delivered bya nominal-phrase generator. To obtain a richrepertoire of templates, 13.5 hours of televisionsoccer reports in English have been transcribed

and annotated. Templates are selected consid-ering parameters such as available time, bias,and report style. For example, short templates,such as “Meier again,” are preferred if the sys-tem is under time pressure. Furthermore, ROCCO

maintains a discourse history to avoid repeti-tions and track the center of attention. Forexample, a phrase such as “now Miller” shouldnot be uttered if Miller is the current topic. Forthe synthesis of spoken utterances, ROCCO relieson the TRUETALK (Entropic 1995) text-to-speechsoftware. To produce more lively reports, itannotates the computed strings of words withintonational markings considering the syntac-tic structure and the content of an utterance aswell as the speaker’s emotions. Currently, itmainly varies pitch accent, pitch range, andspeed. For example, excitement is expressed bya higher talking speed and pitch range.

Figure 2 shows a screen shot of the initialJAVA-based ROCCO prototype that was taken dur-ing a typical session with the system. The mon-itor program provided with the RoboCup simu-lation environment is used to play back thepreviously recorded match in the upper rightwindow. The text output window on the left-hand side contains a transcript of the spokenmessages that have been generated for the sam-ple scene. ROCCO does not always generategrammatically complete sentences but primari-ly produces short telegram-style utterances thatare more typical of live television reports. Thetest-bed character of the ROCCO system providesthe possibility of experimenting with variousgeneration parameters, such as language style,which can be set in the lower right window.

BYRNE

Now we present early work on an animatedtalking head commentary system called BYRNE

(figure 3). This system takes the output fromthe RoboCup soccer simulator and generatesappropriate affective speech and facial expres-sions (figure 4) based on the character’s person-ality, emotional state, and the state of play(Binsted 1998).

BYRNE can use any modular game-analysissystem as its input module. For RoboCup-98,however, BYRNE used a simple but effective play-by-play game-analysis system designed for thecompetition. This input module producesremarks much faster than BYRNE is capable ofsaying them. To compensate for this overabun-dance of things to say, the input module feedsits remarks into a priority queue. Each remarkhas a birthday (the time when it was enteredinto the queue), a deadline (a time beyondwhich it is “old news”), and a priority. When

… excitement isexpressed by a highertalking speed andpitch range.

Articles

SPRING 2000 59

Page 4: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

Both emotion generation and behavior gen-eration are influenced by the static characteris-tics of the commentator character. This is a setof static facts about the character, such as itsnationality and the team it supports. It is usedto inform emotion and behavior generation,allowing a character to react in accordancewith its preferences and biases. For example, ifa character supports the team that is winning,its emotional state is likely to be quite differentthan if it supports the losing team.

Emotion structures and static characteristicsare preconditions to the activation of high-lev-el emotion-expressing behaviors. These, inturn, decompose into lower-level behaviors.The lowest-level behaviors specify how the textoutput by the text-generation system is to bemarked up.

Emotionally motivated behaviors are orga-nized in a hierarchy of mutually inconsistent

BYRNE requests a new fact to say, the queuereturns a fact using a simple priority-schedul-ing algorithm.

The information provided by the analysissystem also drives the emotional model. Theemotion-generation module contains rules thatgenerate simple emotional structures. Thesestructures consist of a type, for example, happi-ness, sadness; an intensity, scored from 1 to 10;a target (optional); a cause, that is, the factabout the world that caused the emotion tocome into being; and a decay function, whichdescribes how the intensity of the emotiondecays over time.

An emotion structure–generation rule consistsof a set of preconditions, the emotional struc-tures to be added to the emotion pool, and theemotional structures to be removed. The precon-ditions are filled by matching on the currentlytrue facts about the world and the character.

Articles

60 AI MAGAZINE

Figure 2. The Basic Windows of the Initial ROCCO System.

Page 5: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

groups. If two or more activated behaviors areinconsistent, the one with the highest activa-tion level is performed, which usually results inthe strongest emotion being expressed. Howev-er, a behavior that is motivated by several dif-ferent emotions might win out over a behaviormotivated by one strong emotion.

Emotions are expressed by adding mark up toalready-generated text. Text generation is donesimply through a set of templates. Each tem-plate has a set of preconditions that constrainthe game situations they can be used todescribe. If more than one template matchesthe chosen content, then the selection is basedon how often and how recently the templateshave been used. BYRNE’s text-generation moduledoes not generate plain text but, rather, textmarked up with SEEML (speech, expression, andemotion mark-up language). Linguisticallymotivated facial gestures and speech intonationare currently hard coded into the templates.

SEEML is a slightly supplemented superset ofthree different mark-up systems, namely FAC-SML (a variant on FACS [Ekman and Friesen1978]), SABLE (Sproat et al. 1997), and GDA

(Nagao and Hasida 1998). GDA is used to informlinguistically motivated expressive behaviorsand also to aid the speech synthesizer in gener-ating appropriate prosody. FACSML is used to addfacial behaviors, and SABLE is used to control thespeech synthesis.

The expression mark-up module adds emo-tionally motivated mark up to the alreadymarked-up text from the text-generation sys-tem. Conflicts are resolved in a simple (perhapssimplistic) manner. Unless identical, tags areassumed to be independent. If two identicaltags are assigned to a piece of text, the one withthe smaller scope is assumed to be redundantand removed. Finally, if two otherwise identi-cal tags call for a change in some parameter, itis assumed that the change is additive.

The marked-up text is then sent to the SEEML

parser. The parser interprets SEEML tags in thecontext of a style file, adds time markers andlip-synching information, and sends appropri-ate FACS to the facial-animation system andSABLE to the speech-synthesis system. The resultis an emotional, expressive talking-head com-mentary on a RoboCup simulation league soc-cer game.

MIKE

MIKE (multiagent interactions knowledgeablyexplained) is an automatic real-time commen-tary system capable of producing output inEnglish, Japanese, and French.

One of our contributions is to demonstrate

Articles

SPRING 2000 61

Figure 3. The BYRNE System Architecture.

Text Generation

Content Selection

Game Analysis

Soccer Server

Emotion Generation

Behavior Generation

Text Markup

BYRNE

SpeechSynthesis

FaceAnimation

Figure 4. BYRNE Sneering.

Page 6: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

tasks. Notably, these modules demonstrate thegeneral applicability of analyzing the focus of amultiagent system and examining the territo-ries established by individual agents.

Another technical challenge involved in live

that a collection of concurrently runninganalysis modules can be used to follow andinterpret the actions of multiple agents (Tana-ka-Ishii et al. 1998). MIKE uses six SOCCER ANALYZ-ER modules, three of which carry out high-level

Articles

62 AI MAGAZINE

Figure 5. MIKE’s Structure.

NLGenerator

InferenceEngine

TTSAdministrator

TechniquesShoot

Statistic

Voronoi Basic

Bigram

Communicator

MIKE

generatorbuffer

propositionpool

shared

memory

SoccerServer

TTS (Text-To-Speech)Software

Page 7: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

commentary is the real-time selection of con-tent to describe the complex, rapidly unfoldingsituation (Tanaka-Ishii, Hasida, and Noda1998). Soccer is a multiagent game in whichvarious events happen simultaneously in thefield. To weave a consistent and informativecommentary on such a subject, an importancescore is put on each fragment of commentarythat intuitively captures the amount of infor-mation communicated to the audience. Theinference and the content-selection modulesare both controlled by such importance scores.

From the input sent by the SOCCER SERVER,MIKE creates a commentary that can consist ofany combination of the possible repertoire ofremarks. The commentary generation is coor-dinated by the architecture shown in figure 5,where the ovals represent processes, and therectangles represent data. The repertoire isshown in figure 6; an English example of MIKE’scommentary is shown in figure 7.

There are six SOCCER ANALYZER modules, ofwhich three analyze basic events (shown in thefigure as the basic, techniques, and shootprocesses), and the other three carry out morehigh-level analysis (shown as the bigram,Voronoi, and statistic processes). These sixprocesses analyze the information posted tothe shared memory by the communicator,communicate with each other by the sharedmemory, and also post propositions to theproposition pool.

The bigram module follows and analyzes theball plays as a first-order Markov chain that is afocus point in soccer game. A 2 � 24 ball-playtransition matrix (22 players and 2 goals) isautomatically formed during the game andused to describe the activity of each player andidentify successful passwork patterns. Usingthis matrix, MIKE calculates and examines theplayers’ ball-play performance, such as passsuccess rates, winning passwork patterns, andnumber of shots.

The Voronoi module calculates Voronoi dia-grams for each team every 100 milliseconds.Using these partitions, one can determine thedefensive areas covered by players and alsoassess overall positioning. Figure 8 shows anexample of such a Voronoi diagram (+ and dia-mond indicate players of each team; box showsthe ball.).

In the future, these rich state-analysis mod-ules will be separated from the rest of the sys-tem into a “preMIKE” module so that simulationsoccer teams or other commentary systems candynamically refer to the analysis results.

All interprocess communication is done byhandling an internal representation of com-mentary fragments, represented as proposi-

tions. These propositions consists of a tag andsome attributes. For example, a kick by player5 is represented as (Kick 5), where Kick is thetag, and 5 is the attribute.

To establish the relative significance ofevents, importance scores are put on proposi-tions. After being initialized in the ANALYZER

MODULE, the score decreases over time while itremains in the pool waiting to be uttered.When the importance score of a propositionreaches zero, it is deleted from the pool.

Propositions deposited in the pool are oftentoo low level to be directly used to generatecommentary. The INFERENCE ENGINE MODULE

processes the propositions in the pool with acollection of over–forward-chaining inferencerules. The rules can be categorized into fourclasses, based on the relation of consequencesto their antecedents: (1) logical consequences,(2) logical subsumption, (3) second-order rela-tions, and (4) state change. For example, (High-PassSuccessRate player) (PassPattern playerGoal) -> (active player) is a rule to make a logi-cal consequence.

Articles

SPRING 2000 63

Figure 6. MIKE’s Commentary Repertoire.

Explanation of Complex Events: This concerns form changes, positionchanges, and advanced plays.

Evaluation of Team Plays: This concerns average forms, forms at a certainmoment, players’ locations, indications of the active or problematic play-ers, winning passwork patterns, and wasteful movements.

Suggestions for Improving Play: These concern loose defense areas and betterlocations for inactive players.

Predictions: These concern passes, game results, and shots at goal.

Set Pieces: These concern goal kicks, throw ins, kick offs, corner kicks, andfree kicks.

Passwork: This tracks basic ball-by-ball plays.

Figure 7. Example of MIKE’s Commentary.

Red3 collects the ball from Red4, Red3, Red-Team, wonderful goal! 2 to 2!Red3’s great center shot! Equal! The Red-Team’s formation is now breakingthrough enemy line from center, The Red-Team’s counter attack (Red4 nearat the center line made a long pass towards Red3 near the goal and he madea shot very swiftly.), Red3’s goal! Kick off, Yellow-Team, Red1 is very activebecause, Red1 always takes good positions, Second half of Robocup-97quater final(Some background is described while the ball is in the midfield.) . Left is Ohta Team, Japan, Right is Humboldt, Germany, Red1 takesthe ball, bad pass, (Yellow team’s play after kick off was interrupted by Readteam) Interception by the Yellow-Team, Wonderful dribble, Yellow2, Yel-low2 (Yellow6 approaches Yellow2 for guard), Yellow6’s pass, A passthrough the opponents’ defense, Red6 can take the ball,because, Yellow6 isbeing marked by Red6, .... The Red-Team’s counter attack, The Red-Team’sformation is (system’s interruption), Yellow5, Back pass of Yellow10, Won-derful pass,

Page 8: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

tance scores during the game. Thus, the con-tent selection is integrated in the surface-real-ization module, which accounts for interrup-tion, abbreviation, and so on.

Discussion and Future WorkEven though our focus has been on the auto-matic description of soccer matches, none of

Finally, the NATURAL LANGUAGE GENERATOR

selects the proposition from the propositionpool that best fits the current state of the gameand then translates the proposition into natur-al language. It broadcasts the current subject tothe analyzers, so that they assign higher initialimportance values to propositions with relatedsubjects. Content selection is made with thegoal of maximizing the total gain of impor-

Articles

64 AI MAGAZINE

7

10

Figure 8. A Voronoi Diagram Example.

Analysis Methods forTopic Control

Natural LanguageGeneration

Output

ROCCO Recognitionautomata

Ranking basedon salience,informationgain, and age ofevents

Parameterizedtemplate selection+ noun phrasegeneration in realtime

Expressivespeech

BYRNE “Observers“recognize eventsand states

Prioritized bybirthday,deadline, andpriority scores

Templates, markedup for expressionand interruption inreal time

Expressivespeech andfacialanimation

MIKE Events and states(bigrams,Voronoi)

Importancescores andinference

Templates,interruption,abbreviation usingimportance scores

Expressivespeech

Table 1. A Comparison of the Features of the Three Commentator Systems.

Page 9: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

the presented approaches is restricted to thisclass of application. For example, the group atDFKI originally addressed the dynamic descrip-tion of traffic scenes (cf. Herzog and Wazinski[1994]). To port our approach to the domain ofsoccer, we didn’t have to change a single line inthe processing algorithms. Only our declara-tive knowledge sources, such as the event-recognition automata and the natural languagetemplates, had to be adapted.

At Sony CSL, we are currently looking atadapting our systems to commentate otherdomains, in particular, other sports (for exam-ple, baseball, American football). Also, in realsports commentary, commentators usuallywork in pairs. One is the play-by-play com-mentator, and the other provides “color” (thatis, background information and statistics aboutthe players and teams). We are interested inhaving our commentator systems mimic theseroles, taking turns providing relevant informa-tion in context.

DFKI is exploring new forms of commentaryas well. We are currently working on a three-dimensional visualization component toenable situated reports from the perspective ofa particular player.

As an application of MIKE, ETL is now devel-oping a real-time navigation system for pedes-trians, especially the blind, and also for auto-mobiles. This system, like MIKE, senses theposition of the target to be navigated, analyzesthe situation using static information as maps,controls topic, and then outputs the naviga-tion as expressive speech.

ConclusionsAlthough there are some differences in empha-sis (table 1), all three systems provide real-time,expressive commentary on the RoboCup simu-lation league. All were demonstrated atRoboCup-98, and we hope that improved ver-sions will be shown at future RoboCup events.The success of these systems shows thatRoboCup is not just a robot competition—it isa challenging domain for a wide range ofresearch areas, including those related to real-time natural language commentary genera-tion.

AcknowledgmentsWe would like to thank Dirk Voelz for his workon the implementation of ROCCO.

ReferencesAndrè, E., and Rist, T. 1990. Toward a Plan-Based Syn-thesis of Illustrated Documents. Paper presented atthe Ninth European Conference on Artificial Intelli-gence, 6–8 August, Stockholm, Sweden.

Andrè, E.; Herzog, G.; and Rist, T. 1994. MultimediaPresentation of Interpreted Visual Data, Report, 103,SFB 314–Project VITRA, Universitat des Saarlandes,Saarbrüken, Germany.

Andrè, E.; Herzog, G.; and Rist, T. 1988. On theSimultaneous Interpretation of Real-World ImageSequences and Their Natural Language Description:The System SOCCER. Paper presented at the EighthEuropean Conference on Artificial Intelligence, 1–5August, Munchen, Germany.

Andrè, E.; Finkler, W.; Graf, W.; Rist, T.; Schauder, A.;and Wahlster, W. 1993. WIP: The Automatic Synthesisof Multimodal Presentations. In Intelligent Multime-dia Interfaces, ed. M. T. Maybury, 75–93. Menlo Park,Calif.: AAAI Press.

Asada, M., ed. 1998. RoboCup-98: Robot SoccerWorld Cup II. Paper presented at the SecondRoboCup Workshop, 2–3, 9 July, Paris, France.

Binsted, K. 1998. Character Design for Soccer Com-mentary. Paper presented at the Second RoboCupWorkshop, 2–3, 9 July, Paris, France.

Ekman, P., and Friesen, W. V. 1978. Manual for theFacial-Action Coding System. Palo Alto, Calif.: Consult-ing Psychologists.

Entropic. TrueTalk Reference and Programmers’ Man-ual, Version 2.0, Entropic Research Laboratory Inc.,Washington, D.C.

Herzog, G., and Rohr, K. 1995. Integrating Vision andLanguage: Toward Automatic Description of HumanMovements. In KI-95: Advances in Artificial Intelli-gence. Nineteenth Annual German Conference on Artifi-cial Intelligence, eds. I. Wachsmuth, C.-R. Rollinger,and W. Brauer, 257–268. Berlin: Springer.

Herzog, G., and Wazinski, P. 1994. Visual Translator:Linking Perceptions and Natural Language Descrip-tions. AI Review 8(2–3): 175–187.

Herzog, G.; Sung, C.-K.; Andrè, E.; Enkelmann, W.;Nagel, H.-H.; Rist, T.; Wahlster, W.; and Zimmer-mann, G. 1989. Incremental Natural LanguageDescription of Dynamic Imagery. In WissensbasierteSysteme (Knowledge-Based Systems). 3. Int. GI-Kon-gre, 153–162. Berlin: Springer.

Kitano, H.; Asada, M.; Kuniyoshi, Y.; Noda, I.; Osawa,E.; and Matsubara, H. 1997. RoboCup: A ChallengeProblem for AI. AI Magazine 18(1): 73–85.

Nagao, K., and Hasida, K. 1998. Automatic Text Sum-marization Based on the Global Document Annota-tion. Technical report SCSL-TR-98-021, Sony Com-puter Science Laboratory, Tokyo, Japan.

Sproat, R.; Taylor, P.; Tanenblatt, M.; and Isard, A.1997. A Markup Language for Text-to-Speech Snythe-sis. Paper presented at EUROSPEECH’97, 22–25 Sep-tember, Rhodes, Greece.

Tanaka-Ishii, K.; Hasida, K.; and Noda, I. 1998. Reac-tive Content Selection in the Generation of Real-Time Soccer Commentary. Paper presented at COL-ING-98, 10–14 August, Montreal, Canada.

Tanaka-Ishii, K.; Noda, I.; Frank, I.; Nakashima, H.;Hasida, K.; and Matsubara, H. 1998. MIKE: An Auto-matic Commentary System for Soccer. Paper present-ed at the 1998 International Conference on Multia-gent Systems, 4–7 July, Paris, France.

Articles

SPRING 2000 65

Page 10: Articles Three RoboCup Simulation League Commentator Systemsflint/papers/aimag00.pdf · piped to a speech synthesis module. To get more natural speech output, these systems do

Kumiko Tanaka-Ishii(1991, University ofTokyo, B.Sc., appliedmathematics; 1993, Uni-versity of Tokyo, M.Sc.,concurrent program-ming; 1997, Universityof Tokyo, Ph.D.) is cur-rently a researcher at the

Electrotechnical Laboratory, Japan, whereher main research interests are natural lan-guage interface and concurrent systems.Her e-mail address is [email protected].

Sean Luke is a researchassistant in the Depart-ment of Computer Sci-ence at the University ofMaryland. Luke receiveda B.S. in computer sci-ence from BrighamYoung University and iscurrently finishing his

Ph.D. at the University of Maryland. Hisinterests include cooperative multiagentbehavior, massively distributed knowledgerepresentation, robotics, and the evolutionof neural systems. His e-mail address [email protected].

oratory at the GermanResearch Center for Arti-ficial Intelligence (DFKI)in Saarbrüken since1996. Before joiningDFKI, he was a projectmanager in the Comput-er Science Department atUniversit des Saarlandes,

where he was engaged in the long-termbasic research project VITRA (visual transla-tor) of the SFB 314, a special collaborativeprogram on AI and knowledge-based sys-tems, funded by the German Science Foun-dation. His research interests focus onintelligent user interfaces, in particular,vision and language integration. His e-mailaddress is [email protected].

Thomas Rist is a seniorscientist in the Depart-ment of Intelligent UserInterfaces at the GermanResearch Center for Arti-ficial Intelligence, with aresearch focus on adap-tive multimedia systemsand multimodal interac-

tion. He is currently a member of the Euro-pean i3 (intelligent information interfaces)Coordination Group and is in charge ofseveral managing tasks. His e-mail addressis [email protected].

Elisabeth Andrè is aprincipal researcher atthe German ResearchCenter for ArtificialIntelligence in Saar-brüken, Germany. She isthe chair of the ACL Spe-cial Interest Group onMultimedia Language

Processing. She is on the editorial board ofArtificial Intelligence Communications and isthe area editor for Intelligent User Inter-faces of the Electronic Transactions of Arti-ficial Intelligence. Her research interestsinclude multimedia communication, syn-thetic agents, and natural language inter-faces. Her e-mail address is [email protected].

Kim Binsted (Ph.D., AI, University of Edin-burgh, 1996; B.Sc., physics, McGill Univer-sity, 1991) has been an associate researcherat the Sony Computer Science Laboratoriesin Tokyo, Japan, since January 1998. Herfocus is on responsive entertainment andexpressive characters. Previously, she was aScience and Technology Agency Postdoc-toral Fellow at the Kansai AdvancedResearch Center (KARC-CRL) in Kobe,where she worked on computationalhumor in English and Japanese.

Gerd Herzog has been affiliated with theIntelligent Multimedia User Interfaces Lab-

Articles

66 AI MAGAZINE

Expertise in Context:Human and Machine

P a u l J . F e l t o v i c h

K e n n e t h M . F o r d

R o b e r t R . H o f f m a n

AAAI Press / The MIT Press 690 pp., $40.00., ISBN 0-262-56110-7

To order call 800-356-0343 (US and Canada) or(617) 625-8569.

Distributed by The MIT Press, 5 Cambridge Center, Cambridge, MA 02142