



TelMeA: Effects of Avatar-Like Agents on Asynchronous Community Systems and Implementation in a Web-Based System

Toru Takahashi (1,2) and Hideaki Takeda (1,3)

(1) Graduate School of Information Science, Nara Institute of Science and Technology, Ikoma, 630-0101 Japan

(2) ATR Media Integration and Communications Research Laboratories, Kyoto, 619-0288 Japan

(3) National Institute of Informatics, Tokyo, 101-8430 Japan

SUMMARY

The effects of using an animated agent (avatar-like agent) as the interface of an asynchronous community system, TelMeA, a system with such an agent implemented by the authors, and the results of test-running this system are discussed in this paper. Psychological test results show that the agent not only has the effect of simply speaking for a speaker, but also of differentiating between multiple speakers. Thus, use of this agent allows easy understanding of the state of human relations within a community from conversation content and increases awareness of the conversation context. In addition, the agent can represent nonlanguage expressions such as facial expressions, gestures, approaching, finger pointing, and the like by movements on a screen or animation. TelMeA is a system supporting asynchronous conversations among multiple persons through exchanges of scripts describing the multimodal behavior of the above-mentioned expressions via agents displayed on a web page. It was found from the results of test-running TelMeA for 9 days that nonlanguage expressions accounted for 17% of the total utterances and that the functioning of TelMeA was evaluated highly by its users. © 2004 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 87(7): 58–69, 2004; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20102

Key words: community; agent; avatar; multimodal; CMC.

1. Introduction

Numerous web pages in multimedia form, combining images, moving images, music, dialogue forms, applets, and the like in addition to text and link information, are displayed by the WWW (World Wide Web) services of the Internet. It is not an overstatement that the ease of constructing, publishing, and displaying such web pages supports today's prevalence of the Internet. In addition, information is being vigorously exchanged using systems such as BBS (Bulletin Board Systems, electronic bulletin boards) and IRC (Internet Relay Chat) on the web, owing to the spread of the WWW. Web communities formed by such systems are attracting attention from various areas such as engineering, psychology, economics, education, and social activities, playing important roles in the information age [1–3].

The authors have been focusing their attention on the roles and effects of such web communities in the editing and circulation of social information. Qualitatively improving the content of information by facilitating its exchange within a community is considered most important for improving these social information editing and circulating functions of a web community.


Electronics and Communications in Japan, Part 2, Vol. 87, No. 7, 2004. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J84-D-I, No. 8, August 2001, pp. 1244–1255.


However, existing text-based community systems have functional limitations in their representations.

In this paper, an avatar-like agent is proposed as a conversation interface complementing the representational capability of text. An avatar-like agent is an interface for conducting asynchronous conversations in a web community in a form close to a face-to-face encounter. In addition, a community system that increases the usability of the web as a database of multimedia information can be constructed by increasing the affinity between web pages and web communities, which currently exist in separate locations, by means of the avatar-like agent. The definition and significance of the avatar-like agent, and TelMeA, a community system implementing this agent, are presented below.

2. The Avatar-Like Agent

2.1. Constraints of document communications

More carefully composed utterances are exchanged in an asynchronous community system such as a BBS or ML (mailing list) than in a synchronous community system such as IRC. This is because a participant in a BBS or ML has more time for elaborating the content of an utterance, for example by searching for referenced data. The history of utterances produced on a BBS or ML is usually preserved in the form of a log. In this way, a BBS or ML that makes a log of utterances available assumes a role as a source of information on the know-how and knowledge of the field of interest of the community. The authors are strongly interested in the process of cooperative and spontaneous intellectual contribution to such a community through the collection, evaluation, editing, exchange, and storage of information by each participant in such an asynchronous community.

However, communications on a BBS occur in many cases in the form of documents (text, or text with embedded images). Since representations based on text impose small loads on a network relative to their information content, and are simple and familiar to input, they are highly valuable as the conversational interface of a community system. However, documents alone lack various representation capabilities available in usual face-to-face communications, supporting only very constrained awareness of the means of representation and of the place where the community exists.

(1) Nonverbal body representations

In face-to-face communications, people communicate by exchanging nonverbal expressions such as body gestures and facial expressions. Documents, which cannot use such nonverbal body expressions, easily produce misunderstandings owing to the arbitrariness of contextual interpretation, and this results in formal, circumlocutory representations written to avoid such misunderstandings.

(2) Representation sharing and use of a “place (location)”

In actual face-to-face conversations, the participants can visibly refer to objects in the shared place via line of sight or finger pointing, or refer to material commonly visible to them. In addition, the distance between the people involved in a conversation also carries contextual information. In conversations based on documents, however, conversing using the context of such a place or location is difficult even if a record of the utterances of the other party can be quoted.

(3) Representations of subjectivity

An utterance is made on the basis of a subject* called a self. In face-to-face conversation, such subjectivity is directly represented by the body of the self. However, utterances based on documents cannot express the subjectivity of a speaker consistently, although they can generally represent the intentions of a speaker. Thus, differentiating between speakers is difficult or uncertain with only the sentence rhetoric of an utterance.

*The subject here emphasizes individuality, implementability, and embodiment, and means the self that recognizes, behaves, and evaluates.

2.2. Conversation interface allowing simulating representations

The above problems are resolved in this paper by using a medium having “simulating representations” instead of text. Media having simulating representations include avatars in a virtual space [4]. Avatars can be considered to be simulating agents used in the first person. A user of an avatar interacts with objects in a virtual space and converses with other people synchronously through the avatar, with the avatar representing the subjectivity or the social existence of the user in the virtual space.

Focusing on the nonverbal representational capability of such simulating representations and on the subjectivity of the avatar, the authors consider using this capability as a conversation interface in asynchronous communications. Thus, the authors propose the concept of the avatar-like agent in this paper. Here, the avatar-like agent is defined to be an animated interface agent, whose behavior can be described by a script, used as a conversation interface between speakers in an asynchronous conversation system. A user can send messages by having an avatar-like agent present representations of himself/herself that combine, in a script, bodily modalities consisting of movements on a screen, such as gestures, expressions, lines of sight, and finger pointing.

In the next section, the usefulness of the avatar-like agent as a conversation interface is evaluated by psychological tests.

3. Evaluation of Avatar-Like Agent

It has been reported that the social behavior of a simulated interface agent induces users to attribute social subjectivity to the agent [5]. The social behavior associated with an utterance of an avatar-like agent is also considered to be directed to the reader of the utterance, so that the avatar-like agent carries a certain social subjectivity (as a representative of the user, or its own).

In order to study the effects of the subjectivity of the avatar-like agent on readers in a community, the authors conducted tests with readers of a conversation log.

3.1. Tests

First, the subjects were divided into three groups, and conversation logs of different forms were displayed on computers. Before viewing the log, each subject was given the instruction that “this conversation is a replay of a log of conversations carried out via a network.”

Three people were involved in the conversation in the log. The contents of the conversation consisted of two people recommending Russian food and Turkish food to one person wondering what to eat for dinner. The conditions of the conversation format for each group are described below.

[Condition 1] The text of the conversation of the three people is scrolled continuously on a screen from top to bottom for about 130 seconds, with the speakers designated [Fig. 1(a)].

[Condition 2] Three avatar-like agents appear in the lower part of the screen and carry out a conversation using synthesized voice, breaking in on one another. The agents speak without moving after their appearance. The entire conversation plays for over 150 seconds [Fig. 1(b)].

[Condition 3] A conversation by three avatar-like agents plays for over 150 seconds, as in Condition 2. However, the utterances are accompanied by animation, and gestures or expressions directed to the other agents are shown in combination with the contents of the utterance [Fig. 1(c)].

3.2. Questionnaire after testing

After viewing the conversation presented in accordance with each condition, the subject is instructed to respond to a questionnaire displayed on the same screen. The questionnaire is divided into a part in testing format and a part in evaluation format. One of three choices is selected for each question in the testing format. A rating on a 7-point scale (with a score of 4 being neutral, 1 being the most negative, and 7 being the most positive) is given to the questions in the evaluation format. The questionnaire contents used in the analysis are as follows:

• Memory tests on conversation contents (four questions)

Fig. 1. Examples of conversation presentations under three different conditions: (a) auto-scrolling text style (left); (b) static avatar-like agents with voice (center); (c) animated avatar-like agents with voice (right).


• Identification tests on conversation speakers (three questions)

• Evaluation of conversation contents and atmosphere (three questions)

• Evaluation of own desire to participate in a conversation (two questions)

3.3. Test results

Eighteen male and female subjects in their 20s participated in the tests: 6 under Condition 1, 5 under Condition 2, and 7 under Condition 3. All subjects had been members of a mailing list, 14 of them had participated in a BBS, and 12 had participated in chat rooms. The testing time was about 10 minutes per subject. Table 1 summarizes the data (means and standard deviations over all subjects) obtained from the tests and the results of their analysis.

The results of testing the degree of memory of the conversation contents show no significant differences between conditions (Table 1-I). However, in the tests identifying the speakers of the conversation contents, the correct response rate tended to be significantly higher (p < 0.10) for the avatar-like agent interface than for textual representation alone, when the textual conversation log (Condition 1) was compared with conversation playback by avatar-like agents (Conditions 2 and 3 combined) (Table 1-II). In particular, a significant difference (p < 0.05) was observed when Conditions 1 and 3 were compared.

On the other hand, no significant differences were observed in the evaluation results with respect to interest in the content of the conversation or the pleasantness of the conversation atmosphere (Table 1-III). But in response to the question on the desire to participate in a conversation, the evaluations tended to be significantly higher (p < 0.10) for the avatar-like agent formats (Table 1-IV).

3.4. Discussion of test results

Because the number of subjects was insufficient, statistically significant differences were not obtained for every question item in the tests performed in this study. In addition, the effects of the presence or absence of voice, and of the positions of the agents, have not been clarified. While recognizing these points, the following inferences may be drawn from the analysis results.

3.4.1. Differentiation of individuals

Separate colors (red, blue, green) were attached to the names of the speakers in the texts. However, many subjects did not recognize the speakers even though they remembered the content of the utterances. On the other hand, the subjects in the avatar-like agent tests recognized the speakers (Table 1-II) even though their recall of the content of the utterances was similar to that of the subjects in the textual format (Table 1-I).

This difference suggests that the conversation interface itself influences the identification of a speaker. According to the model of the human recognition process by Young and colleagues [6], bodily characteristics such as the face and voice, or information such as profession, precede names in the recognition of a person whom one has met previously (Fig. 2).

Fig. 2. Model of the process of recognizing and naming a person (Young and colleagues [6]).

According to this model, recognizing a speaker intuitively is difficult when only names are indicated, as in the case of a textual conversation log. On the other hand, a subject in the avatar-like agent format tests is considered to recognize an agent as a social entity from its social behavior and simulating representations. Thus, individual speakers are considered to be recognized intuitively with comparative ease from the characteristics of the agents together with the utterance contents.

Table 1. Questionnaire results (SD values in parentheses)

                                                     C1         C2         C3
I.   No. of correct answers (among 4 questions),
     conversation contents memory tests              2.8 (0.2)  3.4 (1.4)  3.2 (0.7)
     — no significant differences
II.  No. of correct answers (among 3 questions),
     conversation speaker identification tests       1.3 (0.8)  1.6 (1.4)  2.4 (0.8)
     — significant tendency C1 < [C2 + C3]; significant difference C1 < C3
III. Evaluation of conversation contents and
     atmosphere (7 categories)                       5.4 (0.9)  5.3 (1.1)  5.5 (1.9)
     — no significant differences
IV.  Evaluation of interest in participating in
     conversation (7 categories)                     3.3 (1.4)  5.6 (1.0)  5.1 (2.1)
     — significant tendency C1 < [C2 + C3]

C1: Auto-scrolling text style. C2: Static avatar-like agents with voice. C3: Animated avatar-like agents with voice.


In the case of static avatar-like agents (Condition 2), the standard deviation of the number of correct responses to the three questions regarding the identification of speakers was 1.4, indicating extreme individual differences. This variation may be attributed to whether a subject viewed an agent as a social entity or as a simple icon carrying additional information.

3.4.2. Understanding human relations

When individual avatar-like agents are recognized as individuals with subjectivity, the history of the content uttered through an agent is easily remembered as biographic information about the user (or about the agent itself). In addition, the profile of a speaker and the human relations are easily understood using the contents of the utterances of individual speakers as clues.

The subjects tested under the condition of a textual conversation log showed lower interest in participating in a conversation than the subjects of the agent tests, despite giving the same evaluations of the conversation contents and atmosphere (Table 1-IV). The reason appears to be partly that the interface of avatar-like agents is fresh and captures the subjects' attention, and partly that human relations in a community are difficult to understand from textual conversations, which discourages participation in such a situation.

Young's model suggests that differentiating speakers intuitively while reading conversations on a BBS is difficult even if all participants consistently use the same “handle” names. In reality, in a text-based BBS, differentiating individual speakers is difficult, the intentions of a speaker may be misunderstood until the human relations are understood, and hesitation to write a comment may occur for fear of being misunderstood. The flow of a conversation and the underlying intentions of a speaker become easily understandable once individuals can be differentiated and the human relations understood. Thus, in a system in which human relations are easily understood, not only is misunderstanding reduced compared with a BBS, but participation in a conversation also becomes easier.

3.5. Conclusions

Conversations using avatar-like agents help their readers to differentiate individual speakers and to understand, from the conversation content, the human relations and the personality or point of view of a speaker. In addition, once these are recognized, a reader can easily understand the context and the state of an utterance. Thus, avatar-like agents have the effect of increasing the awareness of a “place or location” in a community.

4. Design and Implementation of TelMeA

TelMeA, an asynchronous web community system using avatar-like agents, has been designed and implemented. TelMeA provides a rich “location or field” for communications over networks, using avatar-like agents and their various body representation capabilities.

4.1. Communications in TelMeA

4.1.1. User registration

User registration is needed to use TelMeA. A user registers an ID and password as well as his own avatar-like agent. To register an avatar-like agent, two methods are available: registering online a character file of an MS Agent created by the user himself, or selecting from among the agents provided by the system. In either method, an agent that is the same as that of another user cannot be designated.

4.1.2. Registration of and participation in a community

Any user registered in TelMeA can construct a new community within TelMeA. A community is constructed by entering the community's name and its home page. The creator is registered as the manager of the community.

Any registered member of TelMeA can freely participate and speak in a registered community. A user speaks through a single avatar-like agent across multiple communities. Thus, a user can participate in multiple communities while retaining subjective consistency through the avatar-like agent.

4.1.3. Reading of conversation

A community page displayed by a web browser on the client side is constructed from a main page consisting of three frames, the avatar-like agents of the participants shown on this page, and web pages shown in the course of an utterance, as shown in Fig. 3. The main page consists of a control panel for controlling a conversation or an avatar-like agent, a topic list displaying the topics of the conversations carried out up to now, and a shared web page displaying the home page of the community.

Fig. 3. Configuration of a community web page.

Once the page of the community is entered, the avatar-like agent of the user first appears on the main page. If conversations have already occurred in the community, the titles of the conversations as well as the lists of speakers participating in them are displayed. If a title is selected from the topic list in accordance with the guidance of the user's agent, the avatar-like agents of all participants in the conversation appear.

First, the agent of the initial speaker makes an utterance accompanied by gestures or expressions. The agent moves to the location of a certain picture on the shared web page and gives an explanation while pointing to this picture with a finger. When the utterance ends, the agents of the other participants show reactions such as nodding or laughing.

If the “Next” icon on the control panel is clicked, the utterance of the next speaker begins. The agent of this speaker moves across the screen to the agent of the previous speaker, speaks in the direction of that agent, and then speaks while pointing to another picture. In addition, a web page separate from the community home page may be displayed, and an utterance made while a picture on this page is being pointed at.

When the “Replay” icon on the control panel is clicked, the utterance can be viewed from the beginning. In addition, when another title is clicked in the topic list, that conversation can be replayed.

A new utterance is created by using an editing page (Fig. 5) displayed by clicking the “Edit” icon on the control panel. Since TelMeA is a system for asynchronous communications, an utterance can be edited while searching for a web page to be referenced, without regard to time, as with contributions to a BBS or electronic mail. When an utterance is completed, it is registered on the server side by pressing the “Submit” button of the editing page, so that other speakers can reference this utterance later.

4.2. Configuration of TelMeA

4.2.1. System configuration

TelMeA has been developed as a web-based application. Specifically, the client-server system configuration shown in Fig. 4 has been designed and adopted, based on communications between the community page of TelMeA shown on the client web browser and the server-side community interface module.

Fig. 4. The architecture of TelMeA.

The server-side system is implemented using Java 2* and JSP† (JavaServer Pages) from Sun Microsystems. The community page of TelMeA displayed on the client side uses JavaScript to control the avatar-like agents, communications with the server, and the web browser. As the avatar-like agents of this system, MS Agents‡ from Microsoft, controllable by JavaScript code in a web browser [7], are used.
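As an illustration of this client-side control (the actual TelMeA code is not reproduced in the paper), the following minimal sketch shows the typical way JavaScript drives an MS Agent character. Here AgentCtl is assumed to refer to the MS Agent ActiveX control embedded in the page, and the character name, character file, and animation name are hypothetical.

// Minimal sketch of controlling an MS Agent from JavaScript.
// "Avatar1" and avatar1.acs are hypothetical; available animation
// names depend on the character file.
AgentCtl.Connected = true;                           // attach to the MS Agent server
AgentCtl.Characters.Load("Avatar1", "avatar1.acs");  // load a character file
var agent = AgentCtl.Characters.Character("Avatar1");
agent.Show();                     // the avatar appears on the screen
agent.MoveTo(200, 300);           // move to a screen position
agent.Play("Greet");              // play a named animation
agent.Speak("Hello, everyone!");  // synthesized voice with a word balloon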

4.2.2. ALAScript and the ALA-Interpreter

Conversations occurring in a community are described and edited in the format of ALAScript, a script language unique to this system. The script of an utterance described in ALAScript is translated into function calls on the MS Agents and the client browser by a module called the ALA-Interpreter, written in JavaScript and located in the community page, and is thereby executed as an utterance. Contribution and reading of utterances are performed by transmission and reception of ALAScript, which is in textual format, between client and server. In this way, a user can carry out multimodal conversations while transmitting far less data than real video communications would require.
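The internal design of the ALA-Interpreter is not given in the paper; the following sketch only illustrates the kind of tag dispatch such an interpreter could perform, mapping the ALAScript tags introduced in Section 4.3 onto MS Agent calls. The parsing scheme and the helper functions (approach, openSharedPage, moveToImage) are assumptions; only the tag vocabulary comes from the paper.

// Hypothetical dispatch step of an ALA-Interpreter-like module.
function executeCommand(agent, tag, arg) {
  switch (tag) {
    case "#speak":    agent.Speak(arg); break;         // voiced utterance with balloon
    case "#think":    agent.Think(arg); break;         // balloon only, no voice
    case "#play":     agent.Play(arg); break;          // named animation (gesture, expression)
    case "#approach": approach(agent, arg); break;     // move toward another participant's agent
    case "#open":     openSharedPage(arg); break;      // display a web page via the relay server
    case "#refer":    moveToImage(agent, arg); break;  // move beside and point at an image by ID
  }
}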

4.2.3. Role of the Web Relay Server

In TelMeA, an utterance can be delivered while displaying an arbitrary web page, moving the agent to the location of a desired image, and pointing to it. A facility to detect the on-screen location of an image in a web page displayed over TelMeA is incorporated for such representations.

*http://java.sun.com/j2se/
†http://java.sun.com/products/jsp/
‡http://www.microsoft.com/msagent/ To use an MS Agent, a user must install the MS Agent components and the MS Agent character files of the participants on the client. In addition, a browser supporting the MS Agent (Internet Explorer 4.x or later on the Windows platform) is needed.


All web pages referred to over TelMeA are displayed to the client via the Web Relay Server of TelMeA. The Web Relay Server analyzes the designated web page, inserts JavaScript code for calculating the locations of images within the page (in a manner invisible to the user), and transmits the web page to the client that requested it.
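The exact code injected by the Web Relay Server is not shown in the paper; the sketch below merely illustrates one plausible way, in browser JavaScript of that era, to compute where an image sits on the page so that an agent can be moved beside it. The function name locateImage is an assumption.

// Hypothetical helper of the kind the Web Relay Server could inject
// into a relayed page: compute an image's position by accumulating
// offsets up the chain of offset parents.
function locateImage(imageId) {
  var img = document.images[imageId];
  var x = 0, y = 0;
  for (var e = img; e != null; e = e.offsetParent) {
    x += e.offsetLeft;
    y += e.offsetTop;
  }
  return { left: x, top: y, width: img.width, height: img.height };
}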

4.2.4. Data structure of conversation

An utterance in ALAScript contributed to a community is registered in units of “utterances” or “reactions,” together with the speaker data. “Reactions” are brief responses to a certain utterance rather than detailed comments, and take the form of modes of participation in a conversation such as nodding the head or laughing.

An “utterance” of a certain participant and the “reactions” of other participants to this utterance form the data structure of a “situation” (scene). A data unit called a conversation is formed by a time-series sequence of these situations.
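As a concrete illustration of this structure, the sketch below writes it out as a JavaScript object literal; the field names are assumptions, and the sample contents echo the test conversation of Section 3.

// Hypothetical shape of one conversation: a time series of situations,
// each pairing an utterance with the reactions it provoked.
var conversation = {
  title: "What shall we eat for dinner?",   // topic shown in the topic list
  situations: [
    {
      utterance: { speaker: "userA", script: "<#speak>How about Russian food?" },
      reactions: [
        { speaker: "userB", script: "<#play>Nod" }   // brief response, not a comment
      ]
    }
    // ...subsequent situations follow in time order
  ]
};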

Replay of a conversation on a client is performed in units of situations. If a reader selects a topic from the topic list (Fig. 3) on a community page, the ALAScript of the utterance and reactions contained in the initial situation of this topic is downloaded and executed by the respective avatar-like agents. If the “Next” icon on the control panel is pressed (or if the next speaker is selected from the topic list), the utterance and reactions of the next situation are downloaded and executed. A conversation is replayed by repeating this process.
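This fetch-and-execute cycle can be sketched as follows; downloadSituation and executeScript are hypothetical helpers standing in for the server request and the ALA-Interpreter, respectively.

// Hypothetical situation-by-situation replay loop.
var current = 0;

function playSituation(topicId, index) {
  var situation = downloadSituation(topicId, index);  // fetch ALAScript from the server
  executeScript(situation.utterance);                 // the speaker's agent performs it
  for (var i = 0; i < situation.reactions.length; i++) {
    executeScript(situation.reactions[i]);            // other agents nod, laugh, etc.
  }
}

// Wired to the "Next" icon on the control panel.
function onNext(topicId) {
  current += 1;
  playSituation(topicId, current);
}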

4.3. Editing and contribution of utterances

As mentioned in the preceding section, an utterance of a participant is represented in a format called ALAScript. This representation format and the method of constructing such representations are discussed below.

4.3.1. Classification of representation modes of avatar-like agents

A community participant writes a script for his own avatar-like agent displayed on the community web page, and carries out a conversation with other participants by contributing and replaying this script. The authors classify the modes of representation of avatar-like agents into the following four categories.

(1) Language representation

A user can perform language representation in a natural way by making an utterance via an avatar-like agent, using voice accompanied by a word balloon.

<Example>

Utterance: Make an utterance using voice accompanied by a word balloon

Thought: Indicate intention by a word balloon only

(2) Body representation

Nonlanguage representations using the body, performed through animations of an avatar-like agent such as gestures, facial expressions, and movements. Emotions and gestures can be represented.

<Example>

Expression: Representation of a facial expression by animation

Gesture: Representation of a gesture by animation

Movement: Changes of position serving as gestures

(3) Representations considering interpersonal space

Representations considering the interpersonal distances between participants. Both psychological distance and physical distance can be represented.

<Example>

Approaching: Behavior of approaching another agent, representing “starting a conversation” or “interest”

Going away: Behavior of moving away from another agent, representing “refusal,” “objective observation,” etc.

(4) Representations based on common attention

Acts for sharing a context by drawing the attention of another party, by clearly showing the referenced object to that party. An utterance can be made while clearly indicating independent primary data as part of its content, using line of sight or finger pointing.



<Example>

Indication: Act of making an utterance while displaying primary information such as a document, figure, or image

Line of sight: Act of clearly indicating an object by the line of sight

Finger-pointing: Act of clearly indicating an object by finger-pointing

A user can produce a representation of his intention by combining these representations. Thus, through the simulating representations of an avatar-like agent, a user can represent in a natural form what cannot be expressed by the text discussed in Section 2.1: representations using and sharing a “place,” nonlanguage representations using the body, and representations of subjectivity.

4.3.2. Editing of ALAScript

Figure 5 shows the page for editing ALAScript. If the Edit icon on the control panel is clicked, this editing page is displayed as a separate window. Editing of ALAScript can be done in three modes, switchable on the screen: the topic-providing mode, the utterance mode, and the reaction mode.

Fig. 5. An ALAScript editing page.

The procedure for creating ALAScript in this window is as follows.

• Writing of language representation

A sentence is composed in the script editing area in the lower part of the screen, as in a usual textual document. When the Preview button is pressed in the topic-providing mode or the utterance mode, the <#speak> tag is attached to the input sentence, and the agent utters it by voice accompanied by a word balloon. When the <#think> tag is attached to the sentence instead, the sentence is presented in a word balloon without voice.

• Addition of body representations

The pull-down menu in the middle of the screen displays a list of the animations that the user's avatar-like agent can perform. A selected animation is played back as the behavior of the avatar-like agent and, if desired, its name is inserted into the ALAScript being edited with the <#play> tag attached.

• Addition of interpersonal space representations

Currently, only the act of “approaching” is implemented. By either selecting the name of the other party from the pull-down menu or directly clicking the avatar-like agent of that party, the user's agent moves close to the other party's agent and turns to face it. The name of the other party, with the <#approach> tag attached, is inserted into the ALAScript being edited.

• Common attention representation for web contents

If a URL is input to the input area of the editing page, the corresponding page is analyzed and displayed through the Web Relay Server (Section 4.2.3). The URL is inserted into the ALAScript being edited with the <#open> tag attached.

In addition, if an arbitrary image on a displayed page is clicked, the avatar-like agent of the user moves to the side of this image and points to it. The ID assigned to this image by TelMeA is inserted into the ALAScript with the <#refer> tag attached.

In addition, an image file kept locally by a user can be uploaded to the server from the editing page and referenced by the user's avatar-like agent in the same way.

A user constructs an ALAScript by combining the above representations. If the Preview button at the bottom of the screen is pressed, the avatar-like agent performs the script being edited, so the user can refine the script by verifying that it produces the intended representations. When editing is completed, the script contents are transmitted to the server and registered by pressing the Submit button on the editing page.
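To make the combination of tags concrete, the following hypothetical fragment strings together the tags introduced above (the actual ALAScript syntax, shown in Fig. 6, is not reproduced in this translation; the URL, image ID, participant name, and animation name here are illustrative).

<#speak>Have you seen this restaurant's page?
<#open>http://www.example.com/restaurant/
<#refer>img03
<#speak>This dish looks like the one we talked about.
<#approach>userB
<#play>Smile

Replayed by the ALA-Interpreter, such a script would make the agent speak, display the referenced page, move beside the designated image and point at it, and then approach userB's agent and smile.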

4.3.3. Utterance design by ALAScript

The following two methods of constructing ALAScript are suggested by the authors' own experience:

• adding nonlanguage representations to text

• adding comments to web contents or previous utterances


The former is a sentence-based method: textual sentences are constructed as before, and nonlanguage representations are added at appropriate places. Editing is easy if suitable expressions or gestures can simply be added. The latter is a content-based method resembling the handling of “citations” of previous utterances on a BBS or in electronic mail; it differs in that the object to which a comment is added is not quoted text but a speaker or an image on a newly displayed web page.

An utterance in TelMeA is considered comparatively easy to design by combining these two methods. Figure 6 shows an example of ALAScript constructed in this way, together with the corresponding behavior of the avatar-like agent.

Fig. 6. An example of ALAScript.

5. Test-Run of TelMeA and Results

The authors made TelMeA available experimentally in a university environment, to be used freely there. Seven communities were constructed by the testers and users during the 9-day testing period, and 18 utterances were contributed to them by a total of seven users.* After the end of the testing period, all seven users were asked to respond to a questionnaire.

*It is coincidental that the number of users and the number of communities are the same. One user can create multiple communities and participate in multiple communities; the number of communities is arbitrary.

5.1. Classification of utterances and results

The contents of the 18 utterances were classified by kind of representation. Language representations, described by the <#speak> or <#think> tags, were counted by the total number of phrases within the sentences, and nonlanguage representations were counted by the number of tags corresponding to each kind of representation. The results are shown in Table 2.

5.2. Questionnaire results

The responses to the questionnaire were given in five categories, 1 representing the most negative response and 5 representing the most positive response. The contents of the questionnaire were as follows:

• Recognition with respect to the avatar-like agent (4 questions)

• Evaluation of the usability of the avatar-like agent (10 questions)

• Evaluation of individual representations in TelMeA (8 questions)

• Relative evaluations of TelMeA with respect to other community systems (7 questions × 3 systems)

The tabulated questionnaire responses show positive evaluations on all questions regarding individual representations in TelMeA and regarding the usability of avatar-like agents compared with other community systems.

5.2.1. Recognition with respect to the avatar-like agent

The questionnaire results show that users' recognition of an avatar-like agent is divided into recognition as an “incarnation” of the user and recognition as a “representative” of the user.



Table 2. Classification of utterances by representation type

Language representations (number of phrases)                     352 (83.0%)
Body representations                                              57 (13.4%)
Interpersonal distance representations                            10 (2.4%)
Common attention representations with respect to web contents      5 (1.2%)
Nonlanguage representations (total)                               72 (17.0%)


However, recognition was almost consistent between the agent of the self and the agents of others. In other words, many of the users who regarded their agents as incarnations of themselves also recognized the agents of other users as incarnations of those others, and many of the users who regarded their agents as representatives of themselves likewise recognized the agents of others as representatives.

Such differences in the recognition of an avatar-like agent, however, did not influence the evaluations of avatar-like agents or of TelMeA. In addition, the presence of the agents of other people did not have a conscious effect during script construction, regardless of differences in the recognition of these agents.

5.2.2. Usability of avatar-like agent

Individual avatar-like agents, and the “place or field” formed by the multiple avatar-like agents displayed, were evaluated from the points of view of naturalness and usefulness. The results showed positive evaluations greater than 3 from all subjects, except that the mean evaluation score for the naturalness of the expressions and body movements of other people's agents was 3.0.

A detailed analysis of the evaluation scores of individual users shows that usefulness was rated higher than naturalness with respect to the movements and expressions of agents. Conversely, naturalness was rated higher than usefulness by five of the seven users with respect to the formation of the “place or field” by agents. Thus, the functions of individual avatar-like agents were evaluated as more useful than natural, and the formation of a “place” by multiple avatar-like agents was evaluated as more natural than useful.

5.2.3. Evaluations with respect to individualexpressions

The individual functions of TelMeA were evaluated highly, the means of all evaluation scores being greater than 4. The usefulness of common attention representations with respect to web content (mean evaluation score 4.76) was rated high even though they were used only a few times. Usefulness was then rated, in descending order, for interpersonal space representations (mean 4.43), language representations (4.35), and body representations (4.29). Thus, the representations using a place or field provided by avatar-like agents, which are not available in an ML or BBS, were evaluated highly.

5.2.4. Comparative evaluations with othercommunity systems

When the proposed system was compared with an ML based on electronic mail, a BBS on the web, and IRC as a synchronous community system, the subjects gave mean evaluation scores higher than 3 on all questions. The contents of the questions and the mean evaluation scores for them are shown in Table 3.

In particular, evaluations were high for “utterance is pleasant” and “seeing an utterance is pleasant,” indicating that the avatar-like agents in TelMeA have the effect of increasing the pleasantness of communications. In addition, compared with the other systems, many responded that “representations are easy,” indicating that exchanging nonlanguage representations through avatar-like agents is easier than representation based on text alone.

5.3. Discussion of results

It can be understood from the questionnaire results that users recognized the usefulness of TelMeA as a community system. In particular, they gave high evaluations to representations using the context of the place or location of a conversation provided by avatar-like agents, such as common attention representations and interpersonal space representations, which are difficult to produce in text. In addition, they evaluated bodily representations as highly as language representations. This indicates that the importance and necessity of nonlanguage representations in communications are widely recognized.

This fact is confirmed by the tabulation of representation modes within the utterances. The utterances in this test run contained 17% nonlanguage representations* (72 of the 424 representation units in Table 2: 72/424 ≈ 17.0%). While this proportion is small compared with that in real communications,† it is considered to indicate that the nonlanguage representation functions using the avatar-like agents of TelMeA are effective, since they allow intentional representations and communication in a more natural form in asynchronous communications.

Table 3. Results of comparative evaluation with other community systems

                                                       ML     BBS    IRC
Easy to make an utterance                             3.43   3.86   3.17
Easy to make a representation                         3.71   3.71   3.83
Easy to understand the utterance of the other party   3.57   3.14   4.17
Rich information received from an utterance           3.29   3.29   3.50
Utterance is pleasant                                 4.29   4.00   3.83
Seeing an utterance is pleasant                       4.57   4.14   3.83
Have a feeling of participating in the conversation   3.57   3.57   3.50

*This value is computed on the assumption that each nonlanguage representation in TelMeA corresponds to one phrase of language representation.
†The ratio of nonlanguage information in face-to-face communications is known to be about 70% [8, 9].



Tasks remaining for TelMeA, suggested by the questionnaire results, include making the movements of individual agents more natural and increasing the functionality of the place or location formed by these agents.

6. Discussion

6.1. Summary and search of past utterances

The functions of summarizing and searching past utterances are not implemented in the current TelMeA. However, these functions are recognized as essential, both for presenting a summary of the conversation contents of a community and for reusing the knowledge and experience contained in past utterances. The formal structure given by the tags of ALAScript may be considered effective for such summarizing and searching.

In TelMeA, an utterance can be made while laughing or while referring to an image. By analyzing an utterance in ALAScript, the system should therefore be able to infer, for example, whether an utterance made while laughing conveys a favorable feeling of its author, or whether an expression such as “this picture” in an utterance refers to an image on the URL shown immediately before. In other words, the TelMeA interface is expected to produce automatically a description format that is analytically structured from representations made naturally by a person.
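No such analyzer exists in the current system; the sketch below only illustrates the kind of heuristics the above paragraph anticipates, assuming an utterance has been parsed into an ordered list of (tag, argument) pairs. The function name, the animation name "Laugh", and both heuristics are illustrative assumptions.

// Hypothetical post-hoc analysis of an utterance in ALAScript.
function analyzeUtterance(commands) {
  var lastOpenedUrl = null;
  var result = { favorable: false, referents: [] };
  for (var i = 0; i < commands.length; i++) {
    var c = commands[i];
    if (c.tag === "#open") lastOpenedUrl = c.arg;          // remember the page shown
    if (c.tag === "#play" && c.arg === "Laugh")
      result.favorable = true;                             // laughing suggests favorable feeling
    if (c.tag === "#speak" && c.arg.indexOf("this picture") >= 0)
      result.referents.push(lastOpenedUrl);                // resolve "this picture" to that page
  }
  return result;
}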

Employing this structure, past utterances can be used in various ways. For example, since citations can be implemented as common attention representations directed at past utterances, a capability of citing past statements can be realized. In addition, autonomous agents could be incorporated that automatically search past utterances, using emotional representations as keys, and introduce past utterances related to the present conversation.

6.2. Limitations of the effectiveness of TelMeA

As discussed in Section 3, the simulated appearance and behavior of an avatar-like agent are recognized as representing the subjectivity of an individual user, which makes differentiating that user easy. Currently, however, this subjectivity is very limited compared with subjectivity in the real world. For example, a user selects an agent, and technically the agent can be changed later. Thus, the uniqueness and consistency of real-world subjectivity are not necessarily maintained. However, since a goal of this study is to realize awareness of subjectivity, real-world subjectivity need not necessarily be reproduced.

To increase awareness of the information about the individuality of a user naturally expressed in an utterance, it is necessary to increase the variety of expressions representable by an avatar-like agent, especially bodily representations. For this purpose, a mechanism allowing a user to easily create and add animations to his own avatar-like agent is considered useful. The variety of representations should be increased only to an extent that does not interfere with content analysis, by continuing to indicate the structure of an utterance clearly with tags.

7. Conclusions

In this paper, an avatar-like agent is proposed as a conversation interface for asynchronous communication, and its effects and a system implementing it are discussed.

Psychological tests showed that an avatar-like agent has the effect of supporting the differentiation of participants in a community. The authors consider that a reader of a conversation can easily understand the human environment within a community through avatar-like agents, and can easily recognize the intention or state of an utterance by using this understanding as background knowledge about individual utterances.

In addition, avatar-like agents make possible communications that use the nonlanguage contexts of a place, such as bodily representations, the calling of common attention, and the meanings of interpersonal distances between agents. In the test run of TelMeA, a system incorporating avatar-like agents, the representations characteristic of avatar-like agents were accepted positively and regarded as useful.

Classifying and systematizing the modes of communication enabled by avatar-like agents remain as future tasks. There are two aspects to this systematization. One is systematizing the modes of real interpersonal communications and applying them to a system to produce a more natural and richer environment for communications. The other is systematizing communication modes within the language structure of the script language that describes the behavior of avatar-like agents. The goal of the former is to increase the number of meaningful utterances, while the goal of the latter is to increase the reusability of the knowledge and experience contained in the utterances. The information editing and circulation function of a web community in the information society is expected to be increased further by systematizing these two modes of communication and implementing and applying the results to a community system.

Acknowledgments. The authors express their gratitude to Y. Takeuchi, technical staff member, and Y. Katagiri, former director of the Fourth Laboratory, ATR Media Integration and Communications Research Laboratories, for very useful suggestions on the psychological tests performed herein. This study was funded in part by a Ministry of Education and Science Basic Research Grant (B)(2), Task Number 11480078.

REFERENCES

1. Nishida T. Interactive media supporting creation of knowledge of a community. 40th Anniversary Special Issue, Inf Process 2000;41:542–546.

2. NIFTY Network Community Research Society (Kaneko I, Matsuoka M, Nakamura Y, Okada T, et al.). Appearance of Denen Kokyo Shugi network communities. Engineering Research Laboratory (editor). NTT Publishers; 1997.

3. Harrison TM, Stephen T. Researching and creating community networks. In: Jones S (editor). Doing Internet research. California; 1999. p 221–241.

4. Damer B. Avatars! Judson J (editor). Peachpit Press; 1998.

5. Takeuchi Y, Katagiri Y. Social character design for animated agents. Proc RO-MAN99, p 53–58.

6. Young AW, Hay DC, Ellis AW. The face that launched a thousand slips: Everyday difficulties and errors in recognizing people. Br J Psychol 1985;76:495–523.

7. Ball G, Ling D, Kurlander D, Miller J, Pugh D, Skelly T, Stankosky A, Thiel D, Van Dantzich M, Wax T. Lifelike computer characters: The Persona project at Microsoft. In: Bradshaw JM (editor). Software agents. AAAI Press; 1997. p 191–222.

8. Birdwhistell RL. Kinesics and context: Essays on body motion. University of Pennsylvania Press; 1970.

9. Katz AM, Katz VT (editors). Foundations of nonverbal communication: Readings, exercises, and commentary. Southern Illinois University Press; 1983.

AUTHORS (from left to right)

Toru Takahashi received his B.S. degree from the Department of Physics, Kanagaku University, and pursued doctoral studies in the Graduate School of Information Science, Nara Institute of Science and Technology, until 1999. He is currently a doctoral candidate. He has worked at ATR Media Integration and Communications Research Laboratories since 1999. His research interests include interface agents, sharing and reuse of knowledge, communityware, and HCI. He is a member of the Artificial Intelligence Society and the Information Processing Society.

Hideaki Takeda (member) received his B.S., M.S., and Ph.D. degrees in precision mechanics from the Faculty of Engineering, University of Tokyo, in 1986, 1988, and 1991. He joined the Japanese Systems Development Laboratory Foundation in 1991 and was a postdoctoral fellow at the Norwegian Institute of Technology in 1992. After appointments as a research associate (1993) and assistant professor (1995) at Nara Institute of Science and Technology, he has been an associate professor at the National Institute of Informatics since 2000. His research areas include sharing and reuse of knowledge, design engineering, and intelligent CAD. He received a 1995 Distinguished Paper Award from the Artificial Intelligence Society. He is a member of the Artificial Intelligence Society and AAAI.
