AutoTutor: A tutor with dialogue in natural languagekvanlehn/ITScourse2009/Readings... · 2009-04-22 · edge-based reasoning when solving particular problems. Why/Atlas (VanLehn,

Copyright 2004 Psychonomic Society Inc 180

Behavior Research Methods Instruments amp Computers2004 36 (2) 180-192

The vision of having a computer communicate withhumans in natural language was entertained shortly afterthe computer was invented Weizenbaumrsquos (1966) ELIZAprogram was the first conversation system that was rea-sonably successful popular and widely used ELIZAsimulated a Rogerian client-centered psychotherapist asillustrated in the conversation below

Patient You are like my father in some waysELIZA How am I like your fatherPatient You are very aggressive but I think you donrsquot wantme to notice thatELIZA What makes you think I am aggressivePatient You always argue with meELIZA Why do you think I argue with you

Like a typical Rogerian therapist ELIZA tried to get thepatient to do the talking by asking the patient questionsabout the patientrsquos verbal contributions to the exchangeConversation partners are expected to answer questions

so a therapistrsquos question readily turns the spotlight ontothe patient ELIZA turned the patientrsquos assertions into atherapistrsquos questions by simple syntactic transforma-tional rules That is ELIZA detected keywords and wordcombinations that triggered rules which in turn gener-ated ELIZArsquos responses The only intelligence in ELIZAwas the stimulusndashresponse knowledge captured in pro-duction rules that operate on keywords and that performsyntactic transformations What was so remarkable aboutELIZA is that 100ndash200 simple production rules couldvery often create an illusion of comprehension eventhough ELIZA had no depth Computer scientists enter-tained themselves by adding a progressively larger set ofrules to handle all sorts of contingencies One couldimagine that an ELIZA with 20000 well-selected rulesmight very well simulate a responsive intelligent com-passionate therapist However no one ever tried

Efforts to build conversational systems continued in the1970s and early 1980s PARRY attempted to simulate aparanoid agent (Colby Weber amp Hilf 1971) SCHOLARtutored students on South American geography by askingand answering questions (Collins Warnock amp Passafiume1975) Moonrocks (Woods 1977) and Elinor (Normanamp Rumelhart 1975) syntactically parsed questions andanswered usersrsquo queries Schank and his colleagues builtcomputer models of natural language understanding andrudimentary dialogue about scripted activities (Lehnertamp Ringle 1982 Schank amp Riesbeck 1982) SHRDLUmanipulated simple objects in a blocks world in responseto a userrsquos command (Winograd 1972)

Unfortunately by the mid-1980s most researchers incognitive science and artificial intelligence were con-vinced that the prospect of building a good conversationsystem was well beyond the horizon The chief chal-

The Tutoring Research Group is an interdisciplinary research teamcomprised of approximately 35 researchers with backgrounds in psychol-ogy computer science physics and education (for more informationvisit httpwwwautotutororg) The research on AutoTutor was sup-ported by National Science Foundation (NSF) Grants SBR 9720314REC 0106965 REC 0126265 and ITR 0325428 and by Grant N00014-00-1-0600 of the Department of Defense Multidisciplinary UniversityResearch Initiative administered by the Office of Naval Research(ONR) Any opinions findings and conclusions or recommendationsexpressed in this article are those of the authors and do not necessarilyreflect the views of the Department of Defense the ONR or the NSFKurt VanLehn Carolyn Roseacute Pam Jordan and others at the Universityof Pittsburgh collaborated with us in preparing AutoTutor materials onconceptual physics Correspondence concerning this article should beaddressed to A Graesser Department of Psychology 202 PsychologyBuilding University of Memphis Memphis TN 38152-3230 (e-maila-graessermemphisedu)

AutoTutor A tutor with dialogue in natural language

ARTHUR C GRAESSER SHULAN LU GEORGE TANNER JACKSON HEATHER HITE MITCHELLMATHEW VENTURA ANDREW OLNEY and MAX M LOUWERSE

University of Memphis Memphis Tennessee

AutoTutor is a learning environment that tutors students by holding a conversation in natural lan-guage AutoTutor has been developed for Newtonian qualitative physics and computer literacy Its de-sign was inspired by explanation-based constructivist theories of learning intelligent tutoring systemsthat adaptively respond to student knowledge and empirical research on dialogue patterns in tutorialdiscourse AutoTutor presents challenging problems (formulated as questions) from a curriculumscript and then engages in mixed initiative dialogue that guides the student in building an answer Itprovides the student with positive neutral or negative feedback on the studentrsquos typed responsespumps the student for more information prompts the student to fill in missing words gives hints fillsin missing information with assertions identifies and corrects erroneous ideas answers the studentrsquosquestions and summarizes answers AutoTutor has produced learning gains of approximately 70 sigmafor deep levels of comprehension

AUTOTUTOR 181

lenges were (1) the inherent complexities of natural lan-guage processing (2) the unconstrained open-ended na-ture of world knowledge and (3) the lack of research onlengthy threads of connected discourse In retrospectthis extreme pessimism about discourse and natural lan-guage technologies was arguably premature A sufficientnumber of technical advances has been made in the lastdecade for researchers to revisit the vision of buildingdialogue systems

The primary technical breakthroughs came from thefields of computational linguistics information retrievalcognitive science artificial intelligence and discourseprocesses For example the field of computational lin-guistics has produced an impressive array of lexiconssyntactic parsers semantic interpretation modules anddialogue analyzers that are capable of rapidly extractinginformation from naturalistic text for information retrievalmachine translation and speech recognition (Allen 1995DARPA 1995 Harabagiu Maiorano amp Pasca 2002Jurafsky amp Martin 2000 Manning amp Schuumltze 1999Voorhees 2001) These advancements in computationallinguistics represent world knowledge symbolically sta-tistically or through a hybrid of these two foundationsFor instance Lenatrsquos (1995) CYC system represents alarge volume of mundane world knowledge in symbolicforms that can be integrated with a diverse set of pro-cessing architectures The world knowledge contained inan encyclopedia can be represented statistically in high-dimensional spaces such as latent semantic analysis(LSA Foltz Gilliam amp Kendall 2000 Landauer Foltzamp Laham 1998) and Hyperspace Analogue to Language(HAL Burgess Livesay amp Lund 1998) An LSA spaceprovides the backbone for statistical metrics on whethertwo text excerpts are conceptually similar the reliabilityof these similarity metrics has been found to be equiva-lent to that of human judgments The representation andprocessing of connected discourse is much less mysteri-ous after two decades of research in discourse process-ing (Graesser Gernsbacher amp Goldman 2003) and rel-evant interdisciplinary research (Louwerse amp van Peer2002) There are now generic computational modules forbuilding dialogue facilities that track and manage the be-liefs knowledge intentions goals and attention statesof agents in two-party dialogues (Graesser VanLehnRoseacute Jordan amp Harter 2001 Gratch et al 2002 Mooreamp Wiemer-Hastings 2003 Rich amp Sidner 1998 RickelLesh Rich Sidner amp Gertner 2002)

WHEN ARE NATURAL LANGUAGEDIALOGUE FACILITIES FEASIBLE

Natural language dialogue (NLD) facilities are ex-pected to do a reasonable job in some conversational con-texts but not in others Success depends on the subjectmatter the knowledge of the learner the expected depthof comprehension and the expected sophistication of thedialogue strategies We do not believe that current NLDfacilities will be impressive when the subject matter re-

quires mathematical or analytical precision when theknowledge level of the learner is high and when the userwould like to converse with a humorous witty or illu-minating partner For example an NLD facility wouldnot be well suited to an eCommerce application that man-ages precise budgets that a user carefully tracks AnNLD facility would not be good for an application inwhich the dialogue system must simulate a good spouseparent comedian or confidant An NLD facility is morefeasible in applications that involve imprecise verbalcontent a low to medium level of user knowledge abouta topic and earnest literal replies

We are convinced that tutoring environments are feasi-ble NLD applications when the subject matter is verbal andqualitative NLD tutors have been attempted for mathe-matics with limited success (Heffernan amp Koedinger1998) whereas those in qualitative domains have shownmore promise (Graesser VanLehn et al 2001 Rickelet al 2002 VanLehn Jordan et al 2002) TutorialNLD is feasible when the shared knowledge (ie com-mon ground) between the tutor and the learner is low ormoderate rather than high If the common ground ishigh then both dialogue participants (ie the computertutor and the learner) will be expecting a more precisedegree of mutual understanding and therefore will runa higher risk of failing to meet the otherrsquos expectations

It is noteworthy that human tutors are not able to mon-itor the knowledge of students at a fine-grained level be-cause much of what students express is vague under-specified ambiguous fragmentary and error ridden (Fox1993 Graesser amp Person 1994 Graesser Person ampMagliano 1995 Shah Evens Michael amp Rovick 2002)There are potential costs if a tutor attempts to do so Forexample it is often more worthwhile for the tutor to helpbuild new correct knowledge than to become boggeddown in dissecting and correcting each of the learnerrsquosknowledge deficits Tutors do have an approximate senseof what a student knows and this appears to be sufficientto provide productive dialogue moves that lead to signif-icant learning gains in the student (Chi Siler Jeong Ya-mauchi amp Hausmann 2001 Cohen Kulik amp Kulik1982 Graesser et al 1995) These considerations moti-vated the design of AutoTutor (Graesser Person Harteramp the Tutoring Research Group (TRG) 2001 GraesserVanLehn et al 2001 Graesser K Wiemer-HastingsP Wiemer-Hastings Kreuz amp the TRG 1999) whichwill be described in the next section The central as-sumption in a nutshell is that dialogue can be usefulwhen it advances the dialogue and learning agenda evenwhen the tutor does not fully understand a student

Tutorial NLD appears to be a more feasible technologyto the extent that the tutoring strategies unlike strategiesthat are highly sophisticated follow what most humantutors do Most human tutors anticipate particular correctanswers (called expectations) and particular misunder-standings (misconceptions) when they ask the learnerquestions and trace the learnerrsquos reasoning As the learnerarticulates the answer or solves the problem this content

182 GRAESSER ET AL

is constantly being compared with the expectations andanticipated misconceptions The tutor responds adap-tively and appropriately when particular expectations ormisconceptions are expressed This tutoring mechanism isexpectation- and misconception-tailored (EMT) dialogue(Graesser Hu amp McNamara in press) The EMT dialoguemoves of most human tutors are not particularly sophis-ticated from the standpoint of ideal tutoring strategiesthat have been proposed in the fields of education andartificial intelligence (Graesser et al 1995) Graesserand colleagues (Graesser amp Person 1994 Graesser et al1995) videotaped over 100 h of naturalistic tutoring tran-scribed the data classified the speech act utterances intodiscourse categories and analyzed the rate of particulardiscourse patterns These analyses revealed that humantutors rarely implement intelligent pedagogical tech-niques such as bona fide Socratic tutoring strategiesmodelingndashscaffoldingndashfading reciprocal teaching fron-tier learning building on prerequisites cascade learningand diagnosisremediation of deep misconceptions(Collins Brown amp Newman 1989 Palincsar amp Brown1984 Sleeman amp Brown 1982) Instead tutors tend tocoach students in constructing explanations according tothe EMT dialogue patterns Fortunately the EMT dia-logue strategy is substantially easier to implement com-putationally than are sophisticated tutoring strategies

Researchers have developed approximately half a dozenintelligent tutoring systems with dialogue in natural lan-guage AutoTutor and WhyAutoTutor (Graesser et al inpress Graesser Person et al 2001 Graesser et al 1999)were developed for introductory computer literacy andNewtonian physics These systems help college studentsgenerate cognitive explanations and patterns of knowl-edge-based reasoning when solving particular problemsWhyAtlas (VanLehn Jordan et al 2002) also has stu-dents learn about conceptual physics with a coach thathelps build explanations of conceptual physics problemsCIRCSIM Tutor (Hume Michael Rovick amp Evens 1996Shah et al 2002) helps medical students learn about thecirculatory system by using strategies of an accomplishedtutor with a medical degree PACO (Rickel et al 2002)assists learners in interacting with mechanical equipmentand completing tasks by interacting in natural language

Two generalizations can be made from the tutorialNLD systems that have been created to date The firstgeneralization addresses dialogue management Finite-state machines for dialogue management (which will bedescribed later) have served as an architecture that canproduce working systems (such as AutoTutor WhyAutoTutor and WhyAtlas) However no full-fledged di-alogue planners have been included in working systemsthat perform well enough to be satisfactorily evaluated(as in CIRCSIM Tutor and PACO) The Mission RehearsalSystem (Gratch et al 2002) comes closest to being a full-fledged dialogue planner but the depth and sophistica-tion of such planning are extremely limited Dialogueplanning is very difficult because it requires the preciserecognition of knowledge states (goals intentions beliefs

knowledge) and a closed system of formal reasoningUnfortunately the dialogue contributions of most learnersare too vague and underspecified to afford precise recog-nition of knowledge states The second generalizationaddresses the representation of world knowledge AnLSA-based statistical representation of world knowledgeallows the researcher to have some world knowledge com-ponent up and running very quickly (measured in hoursor days) whereas a symbolic representation of worldknowledge takes years or decades to develop AutoTutor(and WhyAutoTutor) routinely incorporates LSA in itsknowledge representation so a new subject matter canbe quickly developed

AUTOTUTOR

AutoTutor is an NLD tutor developed by Graesser andcolleagues at the University of Memphis (Graesser Personet al 2001 Graesser VanLehn et al 2001 Graesseret al 1999 Song Hu Olney Graesser amp the TRG inpress) AutoTutor poses questions or problems that requireapproximately a paragraph of information to answer Anexample question in conceptual physics is ldquoSuppose a boyis in a free-falling elevator and he holds his keys motion-less right in front of his face and then lets go What willhappen to the keys Explain whyrdquo Another example ques-tion is ldquoWhen a car without headrests on the seats is struckfrom behind the passengers often suffer neck injuriesWhy do passengers get neck injuries in this situationrdquo Itis possible to accommodate questions with answers thatare longer or shorter the paragraph span is simply thelength of the answers that have been implemented inAutoTutor thus far in an attempt to handle open-endedquestions that invite answers based on qualitative reason-ing Although an ideal answer is approximately three toseven sentences in length the initial answers to these ques-tions by learners are typically only one word to two sen-tences in length This is where tutorial dialogue is partic-ularly helpful AutoTutor engages the learner in a dialoguethat assists the learner in the evolution of an improved an-swer that draws out more of the learnerrsquos knowledge thatis relevant to the answer The dialogue between AutoTutorand the learner typically lasts 30ndash100 turns (ie thelearner expresses something then the tutor does then thelearner and so on) There is a mixed-initiative dialogue tothe extent that each dialogue partner can ask questions andstart new topics of discussion

Figure 1 shows the interface of AutoTutor The majorquestion (involving in this example a boy droppingkeys in a falling elevator) is selected and presented in thetop right window This major question remains at the topof the Web page until it is finished being answered duringa multiturn dialogue between the student and AutoTutorThe student uses the bottom right window to type in his orher contribution for each turn and the content of both tutorand student turns is reflected in the bottom left windowThe animated conversational agent resides in the upperleft area The agent uses either an ATampT SpeechWorks

AUTOTUTOR 183

or Microsoft Agent speech engine (dependent on licensingagreements) to say the content of AutoTutorrsquos turns duringthe process of answering the presented question Figure 2shows a somewhat different interface that is used whentutoring computer literacy This interface has a displayarea for diagrams but no dialogue history window

The design of AutoTutor was inspired by three bodiesof research theoretical empirical and applied These in-clude explanation-based constructivist theories of learn-ing (Aleven amp Koedinger 2002 Chi de Leeuw Chiu ampLaVancher 1994 VanLehn Jones amp Chi 1992) intel-ligent tutoring systems that adaptively respond to studentknowledge (Anderson Corbett Koedinger amp Pelletier1995 VanLehn Lynch et al 2002) and empirical re-search that has documented the collaborative construc-tive activities that routinely occur during human tutoring(Chi et al 2001 Fox 1993 Graesser et al 1995 Moore1995 Shah et al 2002) According to the explanation-based constructivist theories of learning learning ismore effective and deeper when the learner must activelygenerate explanations justifications and functional pro-cedures than when he or she is merely given informationto read Regarding adaptive intelligent tutoring systemsthe tutors give immediate feedback on the learnerrsquos actionsand guide the learner on what to do next in a fashion thatis sensitive to what the system believes the learner knowsRegarding the empirical research on tutorial dialoguethe patterns of discourse uncovered in naturalistic tutoringare imported into the dialogue management facilities ofAutoTutor

Covering Expectations CorrectingMisconceptions and Answering Questions with Dialogue Moves

AutoTutor produces several categories of dialoguemoves that facilitate covering the information that is an-ticipated by AutoTutorrsquos curriculum script The curricu-lum script includes the questions problems expecta-tions misconceptions and most relevant subject mattercontent AutoTutor delivers its dialogue moves by an an-imated conversational agent (synthesized speech facialexpressions gestures) whereas learners enter their an-swers by keyboard AutoTutor provides positive neutraland negative feedback to the learner pumps the learnerfor more information (eg with the question ldquoWhatelserdquo) prompts the learner to fill in missing words giveshints fills in missing information with assertions iden-tifies and corrects bad answers answers learnersrsquo ques-tions and summarizes answers As the learner expressesinformation over many turns the information in the threeto seven sentences of an answer is eventually covered andthe question is answered During the process of supply-ing the ideal answer the learner periodically articulatesmisconceptions and false assertions If these miscon-ceptions have been anticipated and incorporated into theprogram AutoTutor provides the learner with informa-tion to correct the misconceptions Therefore as thelearner expresses information over the turns this infor-mation is compared with expectations and misconcep-tions and AutoTutor formulates its dialogue moves in afashion that is sensitive to the learnerrsquos input That is

Figure 1 A computer screen of AutoTutor for the subject matter of conceptual physics

184 GRAESSER ET AL

AutoTutor implements the EMT dialogue which isknown to be common in human tutoring

The conversation below illustrates some of the fea-tures of AutoTutorrsquos dialogue This was an actual conversa-tion with a student taken from one of the AutoTutor logfiles Discourse tags have been added in capitals whereasother descriptive information is added in italics

AutoTutor When a car without headrests on the seats isstruck from behind the passengers often suffer neck in-juries Why do passengers get neck injuries in this situa-tion (MAIN QUESTION)Student The passenger experiences a backward accelera-tion while the car moves forwardAutoTutor And can you add to that (PUMP)Student This is because the passenger experiences a forceopposite in direction to the force on the carAutoTutor Um yes correct (POSITIVE FEEDBACK)When the neck provides a large force what within theneck would get damaged (HINT)Student The spineAutoTutor Umm (NEUTRAL NEGATIVE FEEDBACK)I think you can get this Damage within the neck will occurin the ligaments and the [spoken with rising intona-tion and accompanied with a prompting hand gesture](PROMPT)Student MusclesAutoTutor Brilliant (POSITIVE FEEDBACK) Ligamentsand the muscles (PROMPT RESPONSE) Letrsquos keep going(DIALOGUE ADVANCER) Additionally when there isno headrest present in the car what accelerates the head ofthe passenger (HINT)Student The impact from behind

AutoTutor I am sure you can get this The head of the pas-senger will be accelerated by the passengerrsquos [spokenwith rising intonation and accompanied with a promptinghand gesture] (PROMPT)Student CarAutoTutor Wrong (NEGATIVE FEEDBACK) By thepassengerrsquos neck (PROMPT RESPONSE)[moving ahead 22 turns in the dialogue]AutoTutor Do you perhaps have any general conceptualphysics questions (QUESTION PROMPT)Student NoAutoTutor How would you answer my first question hav-ing gone over what wersquove just talked about (FINAL AN-SWER PROMPT)Student The passengers get neck injuries because hellip [81additional words]AutoTutor Great (POSITIVE FEEDBACK) To recapwhen a car is struck from behind the force of impact willcause the car to experience a large forward accelerationThe car seat is attached to the car and will also have thesame large acceleration When the car is hit from behindthe seat will exert the same acceleration to the torso of thebody In order for the head to go along with the torso itwill need to be accelerated at the same rate as the torsoSince the head has no acceleration being applied to it by aheadrest the neck gets stretched damaging its musclesand ligaments in an attempt to produce the required largeacceleration (SUMMARY)

In addition to asking questions AutoTutor attempts tohandle questions posed by the learner However some-what surprisingly students rarely ask questions in class-rooms human tutoring sessions and AutoTutor sessions

Packet B

Packet A

Pac

ket A

Packet A

Packet B

Packet B

Figure 2 A computer screen of AutoTutor for the subject matter of introductory computer literacy

AUTOTUTOR 185

(Graesser amp Olde 2003 Graesser amp Person 1994) Therate of learner questions is one question per 6ndash7 h in aclassroom environment and one question per 2 min in tu-toring Although it is pedagogically disappointing thatlearners ask so few questions the good news is that thisaspect of human tutor interaction makes it easier to builda dialogue-based intelligent tutoring system such asAutoTutor It is not computationally feasible to interpretany arbitrary learner input from scratch and to constructa mental space that adequately captures what the learnerhas in mind Instead the best that AutoTutor can do is tocompare learner input with expectations through pattern-matching operations Therefore what human tutors andlearners do is compatible with what currently can behandled computationally within AutoTutor

Latent Semantic AnalysisAutoTutor uses LSA for its conceptual pattern-matching

algorithm when evaluating whether student input matchesthe expectations and anticipated misconceptions LSA is ahigh-dimensional statistical technique that among otherthings measures the conceptual similarity of any twopieces of text such as words sentences paragraphs orlengthier documents (Foltz et al 2000 E Kintsch Stein-hart Stahl amp the LSA Research Group 2000 W Kintsch1998 Landauer amp Dumais 1997 Landauer et al 1998)A cosine is calculated between the LSA vector associatedwith expectation E (or misconception M ) and the vectorassociated with learner input I E (or M ) is scored as cov-ered if the match between E or M and the learnerrsquos textinput I meets some threshold which has varied between40 and 85 in previous instantiations of AutoTutor Asthe threshold parameter increases the learner needs tobe more precise in articulating information and therebycover the expectations

Suppose that there are four key expectations embed-ded within an ideal answer AutoTutor expects all four tobe covered in a complete answer and will direct the dia-logue in a fashion that finesses the students to articulatethese expectations (through prompts and hints) AutoTutorstays on topic by completing the subdialogue that coversE before starting a subdialogue on another expectationFor example suppose an answer requires the expecta-tion The force of impact will cause the car to experiencea large forward acceleration The following family ofprompts is available to encourage the student to articu-late particular content words in the expectation

1 The impact will cause the car to experience a forward_____2 The impact will cause the car to experience a large ac-celeration in what direction _____3 The impact will cause the car to experience a forwardacceleration with a magnitude that is very _____4 The car will experience a large forward accelerationafter the force of ______5 The car will experience a large forward accelerationfrom the impactrsquos ______6 What experiences a large forward acceleration ______

The particular prompts that are selected are those thatf ill in missing information if answered successfullyThus the dialogue management component adaptivelyselects hints and prompts in an attempt to achieve patterncompletion The expectation is covered when enough ofthe ideas underlying the content words in the expectationare expressed by the student so that the LSA threshold ismet or exceeded

AutoTutor considers everything the student expressesduring Conversation Turns 1ndashn to evaluate whether ex-pectation E is covered If the student has failed to articulateone of the six content words (force impact car largeforward acceleration) AutoTutor selects the correspond-ing prompts 5 4 6 3 2 and 1 respectively Thereforeif the student has made assertions X Y and Z at a par-ticular point in the dialogue then all possible combina-tions of X Y and Z would be considered in the matchesX Y Z XY XZ YZ and XYZ The degree of match foreach comparison between E and I is computed as cosine(vector E vector I ) The maximum cosine match scoreamong all seven combinations of sentences is one methodused to assess whether E is covered If the match meetsor exceeds threshold T then E is covered If the match isless than T then AutoTutor selects the prompt (or hint)that has the best chance of improving the match if thelearner provides the correct answer to the prompt Onlyexplicit statements by the learner (not by AutoTutor) areconsidered when determining whether expectations arecovered As such this approach is compatible with con-structivist learning theories that emphasize the importanceof the learnerrsquos generating the answer We are currentlyfine-tuning the LSA-based pattern matches betweenlearner input and AutoTutorrsquos expected input (Hu et al2003 Olde Franceschetti Karnavat Graesser amp theTRG 2002)

LSA does a moderately impressive job of determiningwhether the information in learner essays matches par-ticular expectations associated with an ideal answer Forexample we have instructed experts in physics or com-puter literacy to make judgments concerning whetherparticular expectations were covered within learner es-says on conceptual physics problems Using either strin-gent or lenient criteria these experts computed a cover-age score based on the proportion of expectations thatwere believed to be present in the learner essays LSA wasused to compute the proportion of expectations coveredusing varying thresholds of cosine values on whether in-formation in the learner essay matched each expectationCorrelations between the LSA scores and the judgesrsquocoverage scores have been found to be approximately 50for both conceptual physics (Olde et al 2002) and com-puter literacy (Graesser P Wiemer-Hastings et al 2000)Correlations generally increase as the length of the textincreases reaching as high as 73 in research conductedat other labs (see Foltz et al 2000) LSA metrics havealso done a reasonable job tracking the coverage of ex-pectations and the identification of misconceptions dur-

186 GRAESSER ET AL

ing the dynamic turn-by-turn dialogue of AutoTutor(Graesser et al 2000)

The conversation is finished for the main question orproblem when all expectations are covered In the mean-time if the student articulates information that matchesany misconception the misconception is corrected as asubdialogue and then the conversation returns to finishcoverage of the expectations Again the process of cov-ering all expectations and correcting misconceptionsthat arise normally requires a dialogue of 30ndash100 turns(ie 15ndash50 student turns)

A Sketch of AutoTutorrsquos ComputationalArchitecture

The computational architectures of AutoTutor havebeen discussed extensively in previous publications(Graesser Person et al 2001 Graesser VanLehn et al2001 Graesser K Wiemer-Hastings et al 1999 Songet al in press) so this article will provide only a briefsketch of the components The original AutoTutor waswritten in Java and resided on a Pentium-based serverplatform to be delivered over the Internet The most re-cent version has a more modular architecture that willnot be discussed in this article The software residing onthe server has a set of permanent databases that are notupdated throughout the course of tutoring These per-manent components include the five below

Curriculum script repository Each script containsthe content associated with a question or problem Foreach there is (1) the ideal answer (2) a set of expectations(3) families of potential hints correct hint responsesprompts correct prompt responses and assertions associ-ated with each expectation (4) a set of misconceptions andcorrections for each misconception (5) a set of key wordsand functional synonyms (6) a summary and (7) markuplanguage for the speech generator and gesture generatorfor components in (1) through (6) that require actions bythe animated agents Subject-matter experts can easilycreate the content of the curriculum script with an au-thoring tool called the AutoTutor Script Authoring Tool(ASAT Susarla et al 2003)

Computational linguistics modules There are lexi-cons syntactic parsers and other computational linguis-tics modules that are used to extract and classify infor-mation from learner input It is beyond the scope of thisarticle to describe these modules further

Corpus of documents This is a textbook articles onthe subject matter or other content that the question-answering facility accesses for paragraphs that answerquestions

Glossary There is a glossary of technical terms andtheir definitions Whenever a learner asks a definitionalquestion (ldquoWhat does X meanrdquo) the glossary is con-sulted and the definition is produced for the entry in theglossary

LSA space LSA vectors are stored for words cur-riculum script content and the documents in the corpus

In addition to the five static data modules enumeratedabove AutoTutor has a set of processing modules and dy-namic storage units that maintain qualitative content andquantitative parameters Storage registers are frequentlyupdated as the tutoring process proceeds For exampleAutoTutor keeps track of student ability (as evaluated byLSA from student assertions) student initiative (such asthe incidence of student questions) student verbosity(number of words per student turn) and the incrementalprogress in having a question answered as the dialoguehistory grows turn by turn The dialogue managementmodule of AutoTutor flexibly adapts to the student byvirtue of these parameters so it is extremely unlikely thattwo conversations with AutoTutor are ever the same

Dialogue Management The dialogue managementmodule is an augmented finite state transition network(for details see Allen 1995 Jurafsky amp Martin 2000)The nodes in the network refer to knowledge goal states(eg expectation E is under focus and AutoTutor wantsto get the student to articulate it) or dialogue states (egthe student just expressed an assertion as his or her firstturn in answering the question) The arcs refer to cate-gories of tutor dialogue moves (eg feedback pumpsprompts hints summaries) or discourse markers that linkdialogue moves (eg ldquookayrdquo ldquomoving onrdquo ldquofurthermorerdquoLouwerse amp Mitchell 2003) A particular arc is tra-versed when particular conditions are met For examplea pump arc is traversed when it is the studentrsquos first turnand the studentrsquos assertion has a low LSA match value

Arc traversal is normally contingent on outputs ofcomputational algorithms and procedures that are sensi-tive to the dynamic evolution of the dialogue These al-gorithms and procedures operate on the snapshot of pa-rameters curriculum content knowledge goal statesstudent knowledge dialogue states LSA measures andso on which reflect the current conversation constraintsand achievements For example there are algorithms thatselect dialogue move categories intended to get the stu-dent to fill in missing information in E (the expectationunder focus) There are several alternative algorithms forachieving this goal Consider one of the early algorithmsthat we adopted which relied on fuzzy production rulesIf the student had almost f inished articulating E butlacked a critical noun or verb then a prompt categorywould be selected because the function of prompts is toextract single words from students The particular promptselected from the curriculum script would be tailored toextract the particular missing word through another mod-ule that fills posted dialogue move categories with par-ticular content If the student is classified as having highability and has failed to articulate most of the words inE then a hint category might be selected Fuzzy produc-tion rules made these selections

An alternative algorithm to fleshing out E uses two cy-cles of hintndashpromptndashassertion That is AutoTutorrsquos se-lection of dialogue moves over successive turns followsa particular order first hint then prompt then assert

AUTOTUTOR 187

then hint then prompt then assert AutoTutor exits thetwo cycles as soon as the student articulates E to satis-faction (ie the LSA threshold is met)

Other processing modules in AutoTutor execute variousimportant functions linguistic information extractionspeech act classification question answering evaluationof student assertions selection of the next expectation tobe covered and speech production with the animatedconversational agent It is beyond the scope of this paperto describe all of these modules but two will be describedbriefly

Speech act classifier AutoTutor needs to determinethe intent or conversational function of a learnerrsquos con-tribution in order to respond to the student flexiblyAutoTutor obviously should respond very differentlywhen the learner makes an assertion than when the learnerasks a question A learner who asks ldquoCould you repeatthatrdquo would probably not like to have ldquoyesrdquo or ldquonordquo asa response but would like to have the previous utterancerepeated The classifier system has 20 categories Thesecategories include assertions metacommunicative ex-pressions (ldquoCould you repeat thatrdquo ldquoI canrsquot hear yourdquo)metacognitive expressions (ldquoI donrsquot knowrdquo ldquoIrsquom lostrdquo)short responses (ldquoOhrdquo ldquoYesrdquo) and the 16 question cate-gories identified by Graesser and Person (1994) Theclassifier uses a combination of syntactic templates andkey words Syntactic tagging is provided by the ApplePie parser (Sekine amp Grishman 1995) together with cas-caded finite state transducers (see Jurafsky amp Martin2000) The finite state transducers consist of a trans-ducer of key words (eg ldquodifferencerdquo and ldquocomparisonrdquoin the comparison question category) and syntactic tem-plates Extensive testing of the classifier showed that theaccuracy of the classifier ranged from 65 to 97 de-pending on the corpus and was indistinguishable fromthe reliability of human judges (Louwerse GraesserOlney amp the TRG 2002 Olney et al 2003)

Selecting the expectation to cover next After oneexpectation is finished being covered AutoTutor moveson to cover another expectation that has a subthresholdLSA score There are different pedagogical principlesthat guide this selection of which expectation to postnext on the goal stack One principle is called the zoneof proximal development or frontier learning AutoTutorselects the particular E that is the smallest incrementabove what the student has articulated That is it mod-estly extends what the student has already articulatedAlgorithmically this is simply the expectation with thehighest LSA coverage score among those expectationsthat have not yet been covered A second principle iscalled the coherence principle In an attempt to providea coherent thread of conversation AutoTutor selects the(uncovered) expectation that has the highest LSA simi-larity to the expectation that was most recently coveredA third pedagogical principle is to select a central piv-otal expectation that has the highest likelihood of pullingin the content of the other expectations The most centralexpectation is the one with the highest mean LSA simi-

larity to the remaining uncovered expectations We nor-mally weight these three principles in tests of AutoTutorbut it is possible to alter these weights by simply chang-ing three parameters Changes in these parameters aswell as in others (eg the LSA threshold T ) end up gen-erating very different conversations

EVALUATIONS OF AUTOTUTOR

Different types of performance evaluation can be madein an assessment of the success of AutoTutor One type istechnical and will not be addressed in depth in this articleThis type evaluates whether particular computationalmodules of AutoTutor are producing output that is validand satisfies the intended specifications For examplewe previously reported data on the accuracy of LSA andthe speech act classifier in comparison with the accuracyof human judges A second type of evaluation assessesthe quality of the dialogue moves produced by AutoTutorThat is it addresses the extent to which AutoTutorrsquos di-alogue moves are coherent relevant and smooth A thirdtype of evaluation assesses whether AutoTutor is success-ful in producing learning gains A fourth assesses the ex-tent to which learners like interacting with AutoTutor Inthis section we present what we know so far about thesecond and third types of evaluation

Expert judges have evaluated AutoTutor with respectto conversational smoothness and the pedagogical qualityof its dialogue moves (Person Graesser Kreuz Pomeroyamp the TRG 2001) The expertsrsquo mean ratings were pos-itive (ie smooth rather than awkward conversation goodrather than bad pedagogical quality) but there is roomfor improvement in the naturalness and pedagogical ef-fectiveness of the dialogue In more recent studies a by-stander Turing test has been performed on the natural-ness of AutoTutorrsquos dialogue moves (Person Graesseramp the TRG 2002) In these studies we randomly selected144 tutor moves in the tutorial dialogues between studentsand AutoTutor Six human tutors (from the computer liter-acy tutor pool at the University of Memphis) were askedto fill in what they would say at these 144 points Thusat each of these 144 tutor turns the corpus containedwhat the human tutors generated and what AutoTutorgenerated A group of computer literacy students wasasked to discriminate between dialogue moves generatedby a human versus those generated by a computer halfin fact were by humans and half were by computer Itwas found that the bystander students were unable to dis-criminate whether particular dialogue moves had beengenerated by a computer or by a human the dprime discrim-ination scores were near zero

The results of the bystander Turing test presented aboveare a rather impressive outcome that is compatible withthe claim that AutoTutor is a good simulation of humantutors AutoTutor manages to have productive and rea-sonably smooth conversations without achieving a com-plete and deep understanding of what the student ex-presses There is an alternative interpretation however

188 GRAESSER ET AL

which is just as interesting Perhaps tutorial dialogue isnot highly constrained so the tutor has great latitude inwhat can be said without disrupting the conversation Inessence there is a large landscape of options regardingwhat the tutor can say at most points in the dialogue Ifthis is the case then it truly is feasible to develop tutor-ing technologies around NLD These conversations areflexible and resilient not fragile

AutoTutor has been evaluated on learning gains inseveral experiments on the topics of computer literacy(Graesser Moreno et al 2003 Person Graesser BautistaMathews amp the TRG 2001) and conceptual physics(Graesser Jackson et al 2003 VanLehn amp Graesser2002) In most of the studies a pretest is administeredfollowed by a tutoring treatment followed by a posttestIn some experiments there is a posttest-only design andno pretest In the tutoring treatments AutoTutorrsquos scoresare compared with scores of different types of compari-son conditions The comparison conditions vary fromexperiment to experiment because colleagues have haddifferent views on what a suitable control condition wouldbe AutoTutor posttest scores have been compared with(1) pretest scores ( pretest) (2) read-nothing scores (readnothing) (3) scores after relevant chapters from the coursetextbook are read (textbook) (4) same as (3) except thatcontent is included only if it is directly relevant to thecontent during training by AutoTutor (textbook reduced)and (5) scores after the student reads text prepared by theexperimenters that succinctly describes the content cov-ered in the curriculum script of AutoTutor (script content)The dependent measures were different for computer lit-

eracy and physics so the two sets of studies will be dis-cussed separately

Table 1 presents data on three experiments on computerliteracy The data in Table 1 constitute a reanalysis ofstudies reported in two published conference proceed-ings (Graesser Moreno et al 2003 Person GraesserBautista et al 2001) The numbers of students run inExperiments 1 2 and 3 were 36 24 and 81 respectivelyThe students learned about hardware operating systemsand the Internet by collaboratively answering 12 questionswith AutoTutor or being assigned to the read-nothing orthe textbook condition Person Graesser Bautista et alused a repeated measures design counterbalancing train-ing condition (AutoTutor textbook and read nothing)against the three topic areas (hardware operating sys-tems and Internet) The students were given three typesof tests The shallow test consisted of multiple-choicequestions that were randomly selected from a test bankassociated with the textbook chapters in the computer lit-eracy course All of these questions were classified asshallow questions according to Bloomrsquos (1956) taxon-omy Experts on computer literacy constructed the deepquestions associated with the textbook chapters thesewere multiple-choice questions that tapped causal men-tal models and deep reasoning It should be noted thatthese shallow and deep multiple-choice questions wereprepared by individuals who were not aware of the con-tent of AutoTutor They were written to cover content inthe textbook on computer literacy In contrast the clozetask was tailored to AutoTutor training The ideal an-swers to the questions during training were presented on

Table 1Results of AutoTutor Experiments on Computer Literacy

Test

Shallow Deep Cloze

Pretest Posttest Pretest Posttest Posttest

Experiments M SD M SD M SD M SD M SD

Experiment 1a

AutoTutor ndash 597 22 ndash 547 27 383 15Textbook ndash 606 21 ndash 515 22 331 15Read nothing ndash 565 26 ndash 476 25 295 14Effect sizet 004 015 035Effect sizen 012 028 063

Experiment 2a


Experiment 3b

AutoTutor 541 26 520 26 383 15 496 16 322 19Textbook reduced 561 23 539 24 388 16 443 17 254 15Read nothing 523 21 515 25 379 16 360 14 241 16Effect sizet 008 031 045Effect sizen 002 097 051Effect sizep 008 075 ndash

NotemdashEffect sizet effect size using the textbook comparison group effect sizen effect size using the read-nothing com-parison group effect sizep effect size using the pretest aReanalysis of data reported in Person Graesser Bautista Math-ews and the Tutoring Research Group (2001) bReanalysis of data reported in Graesser Moreno et al (2003)

AUTOTUTOR 189

the cloze test with four content words deleted for eachanswer The studentrsquos task was to fill in the missing con-tent words The proportion of questions answered cor-rectly served as the metric for the shallow deep andcloze tasks The Graesser Moreno et al study had thesame design except that a pretest was administered therewas an expanded set of deep multiple-choice questionsand a textbook-reduced comparison condition was usedinstead of the textbook condition

In Table 1 the means standard deviations and effectsizes in standard deviation units are reported The effectsizes revealed that AutoTutor did not facilitate learningon the shallow multiple-choice test questions that hadbeen prepared by the writers of the test bank for the text-book All of these effect sizes were low or negative (meaneffect size was 05 for the seven values in Table 1) Shal-low knowledge is not the sort of knowledge that AutoTutorwas designed to deal with so this result is not surprisingAutoTutor was built to facilitate deep reasoning and thiswas apparent in the effect sizes for the deep multiple-choice questions All seven of these effect sizes in Table 1were positive with a mean of 49 Similarly all six of theeffect sizes for the cloze tests were positive with a meanof 50 These effect sizes were generally larger when thecomparison condition was read nothing (M 43 fornine effect sizes in Table 1) than when the comparisoncondition was the textbook or textbook-reduced condition(M 22) These means are in the same ball park as humantutoring which has shown an effect size of 42 in com-parison with classroom controls in the meta-analysis ofCohen et al (1982) The control conditions in Cohenet alrsquos meta-analyses are analogous to the read-nothingcondition in the present experiments since the partici-pants in the present study all had classroom experiencewith computer literacy

Table 2 shows results of an experiment on conceptualphysics that was reported in a published conference pro-ceeding (Graesser Jackson et al 2003) The partici-pants were given a pretest completed training and weregiven a posttest The conditions were AutoTutor text-book and read nothing The two tests tapped deepercomprehension and consisted of either multiple-choicequestions or conceptual physics problems that required

essay answers (which were graded by PhDs in physics)All six of the effect sizes in Table 2 were positive (M 71) Additional experiments are described by VanLehnand Graesser (2002) who administered the same testswith a variety of control conditions When all of the con-ceptual physics studies to date as well as the multiple-choice and essay tests are taken into account the meaneffect sizes of AutoTutor have varied when contrastedwith particular comparison conditions read nothing(67) pretest (82) textbook (82) script content (07)and human tutoring in computer-mediated conversation(08)

There are a number of noteworthy outcomes of theanalyses presented above First AutoTutor is effective inpromoting learning gains at deep levels of comprehen-sion in comparison with the typical ecologically validsituation in which students (all too often) read nothingstart out at pretest or read the textbook for an amount oftime equivalent to that involved in using AutoTutor Weestimate the effect size as 70 when considering thesecomparison conditions the deeper tests of comprehen-sion and both computer literacy and physics Second itis surprising that reading the textbook is not much dif-ferent than reading nothing It appears that a tutor isneeded to encourage the learner to focus on deeper lev-els of comprehension Third AutoTutor is as effective asa human tutor who communicates with the student overterminals in computer-mediated conversation The humantutors had doctoral degrees in physics and extensive ex-perience within a learning setting yet they did not out-perform AutoTutor Such a result clearly must be repli-cated before we can consider it to be a well-establishedfinding However the early news is quite provocativeFourth the impact of AutoTutor on learning gains is con-siderably reduced when a comparison is made with read-ing of text that is carefully tailored to exactly match thecontent covered by AutoTutor That is textbook-reducedand script-content controls yielded an AutoTutor effectsize of only 22 when the subject matters of computer lit-eracy and physics were combined Of course in the realworld texts are rarely crafted to copy tutoring content sothe status of this control is uncertain in the arena of prac-tice But it does suggest that the impact of AutoTutor is

Table 2Results of AutoTutor Experimenta on Conceptual Physics

Multiple Choice Physics Essays

Pretest Posttest Pretest Posttest

M SD M SD M SD M SD

AutoTutor 597 17 725 15 423 30 575 26Textbook 566 13 586 11 478 25 483 25Read nothing 633 17 632 15 445 28 396 25Effect sizet 126 37Effect sizen 62 72Effect sizep 75 51

NotemdashEffect sizet effect size using the textbook comparison group effect sizen effect size using theread-nothing comparison group effect sizep effect size using the pretest aReanalysis of data reported inGraesser Jackson et al (2003)

190 GRAESSER ET AL

dramatically reduced or disappears when there are com-parison conditions that present content with informationequivalent to that of AutoTutor

What it is about AutoTutor that facilitates learning re-mains an open question Is it the dialogue content or theanimated agent that accounts for the learning gainsWhat role do motivation and emotions play over andabove the cognitive components We suspect that the an-imated conversational agent will fascinate some studentsand possibly enhance motivation Learning environmentshave only recently had animated conversational agentswith facial features synchronized with speech and insome cases appropriate gestures (Cassell amp Thorisson1999 Johnson Rickel amp Lester 2000 Massaro amp Cohen1995) Many students will be fascinated with an agentthat controls the eyes eyebrows mouth lips teeth tonguecheekbones and other parts of the face in a fashion thatis meshed appropriately with the language and emotionsof the speaker (Picard 1997) The agents provide an an-thropomorphic humanndashcomputer interface that simu-lates a conversation with a human This will be excitingto some frightening to a few annoying to others and soon There is some evidence that these agents tend to havea positive impact on learning or on the learnerrsquos percep-tions of the learning experience in comparison with speechalone or text controls (Atkinson 2002 Moreno MayerSpires amp Lester 2001 Whittaker 2003) However addi-tional research is needed to determine the precise condi-tions agent features and levels of representation that areassociated with learning gains According to GraesserMoreno et al (2003) it is the dialogue content and notthe speech or animated facial display that influenceslearning whereas the animated agent can have an influ-ential role (positive neutral or negative) in motivationOne rather provocative result is that there is a near-zerocorrelation between learning gains and how much thestudents like the conversational agents (Moreno KlettkeNibbaragandla Graesser amp the TRG 2002) Thereforeit is important to distinguish liking from learning in thisarea of research Although the jury may still be out onexactly what it is about AutoTutor that leads to learninggains the fact is that students learn from the intelligenttutoring system and some enjoy having conversationswith AutoTutor in natural language

REFERENCES

Aleven V amp Koedinger K R (2002) An effective metacognitivestrategy Learning by doing and explaining with a computer-basedcognitive tutor Cognitive Science 26 147-179

Allen J (1995) Natural language understanding Redwood City CABenjaminCummings

Anderson J R Corbett A T Koedinger K R amp Pelletier R(1995) Cognitive tutors Lessons learned Journal of the LearningSciences 4 167-207

Atkinson R K (2002) Optimizing learning from examples using an-imated pedagogical agents Journal of Educational Psychology 94416-427

Bloom B S (Ed) (1956) Taxonomy of educational objectives Theclassification of educational goals Handbook I Cognitive domainNew York Longmans Green

Burgess C Livesay K amp Lund K (1998) Explorations in context

space Words sentences and discourse Discourse Processes 25211-257

Cassell J amp Thorisson K (1999) The power of a nod and a glanceEnvelope vs emotional feedback in animated conversational agentsApplied Artificial Intelligence 13 519-538

Chi M T H de Leeuw N Chiu M amp LaVancher C (1994) Elic-iting self-explanations improves understanding Cognitive Science18 439-477

Chi M T H Siler S A Jeong H Yamauchi T amp HausmannR G (2001) Learning from human tutoring Cognitive Science 25471-533

Cohen P A Kulik J A amp Kulik C C (1982) Educational out-comes of tutoring A meta-analysis of findings American Educa-tional Research Journal 19 237-248

Colby K M Weber S amp Hilf F D (1971) Artificial paranoia Ar-tificial Intelligence 2 1-25

Collins A Brown J S amp Newman S E (1989) Cognitive appren-ticeship Teaching the craft of reading writing and mathematics InL B Resnick (Ed) Knowing learning and instruction Essays inhonor of Robert Glaser (pp 453-494) Hillsdale NJ Erlbaum

Collins A Warnock E H amp Passafiume J J (1975) Analysis andsynthesis of tutorial dialogues In G H Bower (Ed) The psychologyof learning and motivation Advances in research and theory (Vol 8pp 49-87) New York Academic Press

DARPA (1995) Proceedings of the Sixth Message Understanding Con-ference (MUC-6) San Francisco Morgan Kaufman

Foltz P W Gilliam S amp Kendall S (2000) Supporting content-based feedback in on-line writing evaluation with LSA InteractiveLearning Environments 8 111-128

Fox B (1993) The human tutorial dialogue project Hillsdale NJErlbaum

Graesser A C Gernsbacher M A amp Goldman S (Eds) (2003)Handbook of discourse processes Mahwah NJ Erlbaum

Graesser A C Hu X amp McNamara D S (in press) Computerizedlearning environments that incorporate research in discourse psy-chology cognitive science and computational linguistics In A FHealy (Ed) Experimental cognitive psychology and its applicationsFestschrift in honor of Lyle Bourne Walter Kintsch and Thomas Lan-dauer Washington DC American Psychological Association

Graesser A C Jackson G T Mathews E C Mitchell H HOlney A Ventura M amp Chipman P (2003) WhyAutoTutor Atest of learning gains from a physics tutor with natural language dia-log In R Alterman amp D Hirsh (Eds) Proceedings of the Twenty-Fifth Annual Conference of the Cognitive Science Society (pp 1-6)Mahwah NJ Erlbaum

Graesser A C Moreno K Marineau J Adcock A Olney APerson N amp the Tutoring Research Group (2003) AutoTutorimproves deep learning of computer literacy Is it the dialogue or thetalking head In U Hoppe F Verdejo amp J Kay (Eds) Proceedingsof artificial intelligence in education (pp 47-54) Amsterdam IOSPress

Graesser A C amp Olde B A (2003) How does one know whethera person understands a device The quality of the questions the per-son asks when the device breaks down Journal of Educational Psy-chology 95 524-536

Graesser A C amp Person N K (1994) Question asking during tutor-ing American Educational Research Journal 31 104-137

Graesser A C Person N K Harter D amp the Tutoring Re-search Group (2001) Teaching tactics and dialogue in AutoTutorInternational Journal of Artificial Intelligence in Education 12 257-279

Graesser A C Person N K amp Magliano J P (1995) Collabora-tive dialogue patterns in naturalistic one-on-one tutoring AppliedCognitive Psychology 9 359-387

Graesser A C VanLehn K Roseacute C P Jordan P W amp Harter D(2001) Intelligent tutoring systems with conversational dialogue AIMagazine 22(4) 39-52

Graesser A C Wiemer-Hastings K Wiemer-Hastings PKreuz R amp the Tutoring Research Group (1999) AutoTutorA simulation of a human tutor Journal of Cognitive Systems Re-search 1 35-51

Graesser A C Wiemer-Hastings P Wiemer-Hastings K Har-

AUTOTUTOR 191

ter D Person N K amp the Tutoring Research Group (2000)Using latent semantic analysis to evaluate the contributions of stu-dents in AutoTutor Interactive Learning Environments 8 129-148

Gratch J Rickel J Andreacute E Cassell J Petajan E amp Badler N(2002) Creating interactive virtual humans Some assembly re-quired IEEE Intelligent Systems 17 54-63

Harabagiu S M Maiorano S J amp Pasca M A (2002) Open-domain question answering techniques Natural Language Engi-neering 1 1-38

Heffernan N T amp Koedinger K R (1998) A developmentalmodel for algebra symbolization The results of a difficulty factor as-sessment In M A Gernsbacher amp S J Derry (Eds) Proceedingsof the Twentieth Annual Conference of the Cognitive Science Society(pp 484-489) Mahwah NJ Erlbaum

Hu X Cai Z Graesser A C Louwerse M M Penumatsa POlney A amp the Tutoring Research Group (2003) An improvedLSA algorithm to evaluate student contributions in tutoring dialogueIn G Gottlob amp T Walsh (Eds) Proceedings of the Eighteenth In-ternational Joint Conference on Artificial Intelligence (pp 1489-1491) San Francisco Morgan Kaufmann

Hume G D Michael J A Rovick A amp Evens M W (1996)Hinting as a tactic in one-on-one tutoring Journal of the LearningSciences 5 23-47

Johnson W L Rickel J W amp Lester J C (2000) Animated ped-agogical agents Face-to-face interaction in interactive learning envi-ronments International Journal of Artificial Intelligence in Educa-tion 11 47-78

Jurafsky D amp Martin J H (2000) Speech and language process-ing An introduction to natural language processing computationallinguistics and speech recognition Upper Saddle River NJ Pren-tice Hall

Kintsch E Steinhart D Stahl G amp the LSA Research Group(2000) Developing summarization skills through the use of LSA-based feedback Interactive Learning Environments 8 87-109

Kintsch W (1998) Comprehension A paradigm for cognition NewYork Cambridge University Press

Landauer T K amp Dumais S T (1997) An answer to Platorsquos problemThe latent semantic analysis theory of acquisition induction andrepresentation of knowledge Psychological Review 104 211-240

Landauer T K Foltz P W amp Laham D (1998) An introductionto latent semantic analysis Discourse Processes 25 259-284

Lehnert W G amp Ringle M H (Eds) (1982) Strategies for naturallanguage processing Hillsdale NJ Erlbaum

Lenat D B (1995) CYC A large-scale investment in knowledge in-frastructure Communications of the ACM 38 33-38

Louwerse M M Graesser A C Olney A amp the Tutoring Re-search Group (2002) Good computational manners Mixed-initiativedialogue in conversational agents In C Miller (Ed) Etiquette forhumanndashcomputer work Papers from the 2002 fall symposium Techni-cal Report FS-02-02 (pp 71-76) North Falmouth MA AAAI Press

Louwerse M M amp Mitchell H H (2003) Towards a taxonomy ofa set of discourse markers in dialog A theoretical and computationallinguistic account Discourse Processes 35 199-239

Louwerse M M amp van Peer W (Eds) (2002) Thematics Inter-disciplinary studies Philadelphia John Benjamins

Manning C D amp Schuumltze H (1999) Foundations of statistical nat-ural language processing Cambridge MA MIT Press

Massaro D W amp Cohen M M (1995) Perceiving talking facesCurrent Directions in Psychological Science 4 104-109

Moore J D (1995) Participating in explanatory dialogues Cam-bridge MA MIT Press

Moore J D amp Wiemer-Hastings P (2003) Discourse in computa-tional linguistics and artificial intelligence In A C Graesser M AGernsbacher amp S R Goldman (Eds) Handbook of discourse pro-cesses (pp 439-486) Mahwah NJ Erlbaum

Moreno K N Klettke B Nibbaragandla K Graesser A Camp the Tutoring Research Group (2002) Perceived characteristicsand pedagogical efficacy of animated conversational agents In S ACerri G Gouarderes amp F Paraguacu (Eds) Proceedings of the SixthInternational Conference on Intelligent Tutoring Systems (pp 963-971) Berlin Springer-Verlag

Moreno R Mayer R E Spires H A amp Lester J C (2001) Thecase for social agency in computer-based teaching Do students learnmore deeply when they interact with animated pedagogical agentsCognition amp Instruction 19 177-213

Norman D A amp Rumelhart D E (1975) Explorations in cogni-tion San Francisco Freeman

Olde B A Franceschetti D R Karnavat A Graesser A Camp the Tutoring Research Group (2002) The right stuff Do youneed to sanitize your corpus when using latent semantic analysis InW D Gray amp C D Schunn (Eds) Proceedings of the Twenty-FourthAnnual Meeting of the Cognitive Science Society (pp 708-713) Mah-wah NJ Erlbaum

Olney A Louwerse M M Mathews E C Marineau JMitchell H H amp Graesser A C (2003) Utterance classificationin AutoTutor In J Burstein amp C Leacock (Eds) Building educationalapplications using natural language processing Proceedings of theHuman Language Technology North American chapter of the Asso-ciation for Computational Linguistics Conference 2003 Workshop(pp 1-8) Philadelphia Association for Computational Linguistics

Palincsar A S amp Brown A (1984) Reciprocal teaching ofcomprehension-fostering and comprehension-monitoring activitiesCognition amp Instruction 1 117-175

Person N K Graesser A C Bautista L Mathews E C amp theTutoring Research Group (2001) Evaluating student learninggains in two versions of AutoTutor In J D Moore C L Redfield ampW L Johnson (Eds) Artificial intelligence in education AI-ED inthe wired and wireless future (pp 286-293) Amsterdam IOS Press

Person N K Graesser A C Kreuz R J Pomeroy V amp the Tu-toring Research Group (2001) Simulating human tutor dialoguemoves in AutoTutor International Journal of Artificial Intelligencein Education 12 23-39

Person N K Graesser A C amp the Tutoring Research Group(2002) Human or computer AutoTutor in a bystander Turing test InS A Cerri G Gouarderes amp F Paraguacu (Eds) Proceedings of theSixth International Conference on Intelligent Tutoring Systems(pp 821-830) Berlin Springer-Verlag

Picard R W (1997) Affective computing Cambridge MA MIT PressRich C amp Sidner C L (1998) COLLAGEN A collaborative man-

ager for software interface agents User Modeling amp User-AdaptedInteraction 8 315-350

Rickel J Lesh N Rich C Sidner C L amp Gertner A S (2002)Collaborative discourse theory as a foundation for tutorial dialogueIn S A Cerri G Gouarderes amp F Paraguacu (Eds) Proceedings ofthe Sixth International Conference on Intelligent Tutoring Systems(pp 542-551) Berlin Springer-Verlag

Schank R C amp Riesbeck C K (Eds) (1982) Inside computer under-standing Five Programs Plus Miniatures Hillsdale NJ Erlbaum

Sekine S amp Grishman R (1995) A corpus-based probabilisticgrammar with only two non-terminals In H Bunt (Ed) Fourth in-ternational workshop on parsing technology (pp 216-223) PragueAssociation for Computational Linguistics

Shah F Evens M W Michael J amp Rovick A (2002) Classifyingstudent initiatives and tutor responses in human keyboard-to-keyboardtutoring sessions Discourse Processes 33 23-52

Sleeman D amp Brown J S (Eds) (1982) Intelligent tutoring sys-tems New York Academic Press

Song K Hu X Olney A Graesser A C amp the Tutoring Re-search Group (in press) A framework of synthesizing tutoring con-versation capability with Web based distance education coursewareComputers amp Education

Susarla S Adcock A Van Eck R Moreno K Graesser A Camp the Tutoring Research Group (2003) Development and evalua-tion of a lesson authoring tool for AutoTutor In V Aleven U HoppeJ Kay R Mizoguchi H Pain F Verdejo amp K Yacef (Eds) AIED2003supplemental proceedings (pp 378-387) Sydney University of Syd-ney School of Information Technologies

VanLehn K amp Graesser A C (2002) Why2 report Evaluation ofWhyAtlas WhyAutoTutor and accomplished human tutors on learn-ing gains for qualitative physics problems and explanations Unpub-lished report prepared by the University of Pittsburgh CIRCLE groupand the University of Memphis Tutoring Research Group

192 GRAESSER ET AL

VanLehn K Jones R M amp Chi M T H (1992) A model of theself-explanation effect Journal of the Learning Sciences 2 1-60

VanLehn K Jordan P Roseacute C P Bhembe D Bottner MGaydos A Makatchev M Pappuswamy U Ringenberg MRoque A Siler S amp Srivastava R (2002) The architecture ofWhy2-Atlas A coach for qualitative physics essay writing In S ACerri G Gouarderes amp F Paraguacu (Eds) Proceedings of the SixthInternational Conference on Intelligent Tutoring Systems (pp 158-167) Berlin Springer-Verlag

VanLehn K Lynch C Taylor L Weinstein A Shelby RSchulze K Treacy D amp Wintersgill M (2002) Minimallyinvasive tutoring of complex physics problem solving In S A CerriG Gouarderes amp F Paraguacu (Eds) Proceedings of the Sixth In-ternational Conference on Intelligent Tutoring Systems (pp 367-376) Berlin Springer-Verlag

Voorhees E M (2001) The TREC question answering track NaturalLanguage Engineering 7 361-378

Weizenbaum J (1966) ELIZAmdashA computer program for the study ofnatural language communication between man and machine Com-munications of the ACM 9 36-45

Whittaker S (2003) Theories and methods in mediated communi-cation In A C Graesser M A Gernsbacher amp S R Goldman(Eds) Handbook of discourse processes (pp 243-286) MahwahNJ Erlbaum

Winograd T (1972) Understanding natural language New YorkAcademic Press

Woods W A (1977) Lunar rocks in natural English Explorations innatural language question answering In A Zampoli (Ed) Linguis-tic structures processing (pp 201-222) New York Elsevier

(Manuscript received January 25 2004accepted for publication March 14 2004)

AUTOTUTOR 181

lenges were (1) the inherent complexities of natural lan-guage processing (2) the unconstrained open-ended na-ture of world knowledge and (3) the lack of research onlengthy threads of connected discourse In retrospectthis extreme pessimism about discourse and natural lan-guage technologies was arguably premature A sufficientnumber of technical advances has been made in the lastdecade for researchers to revisit the vision of buildingdialogue systems

The primary technical breakthroughs came from thefields of computational linguistics information retrievalcognitive science artificial intelligence and discourseprocesses For example the field of computational lin-guistics has produced an impressive array of lexiconssyntactic parsers semantic interpretation modules anddialogue analyzers that are capable of rapidly extractinginformation from naturalistic text for information retrievalmachine translation and speech recognition (Allen 1995DARPA 1995 Harabagiu Maiorano amp Pasca 2002Jurafsky amp Martin 2000 Manning amp Schuumltze 1999Voorhees 2001) These advancements in computationallinguistics represent world knowledge symbolically sta-tistically or through a hybrid of these two foundationsFor instance Lenatrsquos (1995) CYC system represents alarge volume of mundane world knowledge in symbolicforms that can be integrated with a diverse set of pro-cessing architectures The world knowledge contained inan encyclopedia can be represented statistically in high-dimensional spaces such as latent semantic analysis(LSA Foltz Gilliam amp Kendall 2000 Landauer Foltzamp Laham 1998) and Hyperspace Analogue to Language(HAL Burgess Livesay amp Lund 1998) An LSA spaceprovides the backbone for statistical metrics on whethertwo text excerpts are conceptually similar the reliabilityof these similarity metrics has been found to be equiva-lent to that of human judgments The representation andprocessing of connected discourse is much less mysteri-ous after two decades of research in discourse process-ing (Graesser Gernsbacher amp Goldman 2003) and rel-evant interdisciplinary research (Louwerse amp van Peer2002) There are now generic computational modules forbuilding dialogue facilities that track and manage the be-liefs knowledge intentions goals and attention statesof agents in two-party dialogues (Graesser VanLehnRoseacute Jordan amp Harter 2001 Gratch et al 2002 Mooreamp Wiemer-Hastings 2003 Rich amp Sidner 1998 RickelLesh Rich Sidner amp Gertner 2002)

WHEN ARE NATURAL LANGUAGEDIALOGUE FACILITIES FEASIBLE

Natural language dialogue (NLD) facilities are ex-pected to do a reasonable job in some conversational con-texts but not in others Success depends on the subjectmatter the knowledge of the learner the expected depthof comprehension and the expected sophistication of thedialogue strategies We do not believe that current NLDfacilities will be impressive when the subject matter re-

quires mathematical or analytical precision when theknowledge level of the learner is high and when the userwould like to converse with a humorous witty or illu-minating partner For example an NLD facility wouldnot be well suited to an eCommerce application that man-ages precise budgets that a user carefully tracks AnNLD facility would not be good for an application inwhich the dialogue system must simulate a good spouseparent comedian or confidant An NLD facility is morefeasible in applications that involve imprecise verbalcontent a low to medium level of user knowledge abouta topic and earnest literal replies

We are convinced that tutoring environments are feasi-ble NLD applications when the subject matter is verbal andqualitative NLD tutors have been attempted for mathe-matics with limited success (Heffernan amp Koedinger1998) whereas those in qualitative domains have shownmore promise (Graesser VanLehn et al 2001 Rickelet al 2002 VanLehn Jordan et al 2002) TutorialNLD is feasible when the shared knowledge (ie com-mon ground) between the tutor and the learner is low ormoderate rather than high If the common ground ishigh then both dialogue participants (ie the computertutor and the learner) will be expecting a more precisedegree of mutual understanding and therefore will runa higher risk of failing to meet the otherrsquos expectations

It is noteworthy that human tutors are not able to mon-itor the knowledge of students at a fine-grained level be-cause much of what students express is vague under-specified ambiguous fragmentary and error ridden (Fox1993 Graesser amp Person 1994 Graesser Person ampMagliano 1995 Shah Evens Michael amp Rovick 2002)There are potential costs if a tutor attempts to do so Forexample it is often more worthwhile for the tutor to helpbuild new correct knowledge than to become boggeddown in dissecting and correcting each of the learnerrsquosknowledge deficits Tutors do have an approximate senseof what a student knows and this appears to be sufficientto provide productive dialogue moves that lead to signif-icant learning gains in the student (Chi Siler Jeong Ya-mauchi amp Hausmann 2001 Cohen Kulik amp Kulik1982 Graesser et al 1995) These considerations moti-vated the design of AutoTutor (Graesser Person Harteramp the Tutoring Research Group (TRG) 2001 GraesserVanLehn et al 2001 Graesser K Wiemer-HastingsP Wiemer-Hastings Kreuz amp the TRG 1999) whichwill be described in the next section The central as-sumption in a nutshell is that dialogue can be usefulwhen it advances the dialogue and learning agenda evenwhen the tutor does not fully understand a student

Tutorial NLD appears to be a more feasible technologyto the extent that the tutoring strategies unlike strategiesthat are highly sophisticated follow what most humantutors do Most human tutors anticipate particular correctanswers (called expectations) and particular misunder-standings (misconceptions) when they ask the learnerquestions and trace the learnerrsquos reasoning As the learnerarticulates the answer or solves the problem this content

182 GRAESSER ET AL





AUTOTUTOR



AUTOTUTOR 183






184 GRAESSER ET AL






Packet B

Packet A

Pac

ket A

Packet A

Packet B

Packet B


AUTOTUTOR 185









186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










182 GRAESSER ET AL





AUTOTUTOR



AUTOTUTOR 183






184 GRAESSER ET AL






Packet B

Packet A

Pac

ket A

Packet A

Packet B

Packet B


AUTOTUTOR 185









186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










AUTOTUTOR 183






184 GRAESSER ET AL






Packet B

Packet A

Pac

ket A

Packet A

Packet B

Packet B


AUTOTUTOR 185









186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










184 GRAESSER ET AL






Packet B

Packet A

Pac

ket A

Packet A

Packet B

Packet B


AUTOTUTOR 185









186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










AUTOTUTOR 185









186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










186 GRAESSER ET AL














AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










AUTOTUTOR 187










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










188 GRAESSER ET AL






Test

Shallow Deep Cloze



Experiment 1a


Experiment 2a


Experiment 3b



AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










AUTOTUTOR 189









M SD M SD M SD M SD



190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










190 GRAESSER ET AL



REFERENCES





























AUTOTUTOR 191









































192 GRAESSER ET AL










AUTOTUTOR 191









































192 GRAESSER ET AL










192 GRAESSER ET AL










Documents

AutoTutor: A tutor with dialogue in natural languagekvanlehn/ITScourse2009/Readings... · 2009-04-22 · edge-based reasoning when solving particular problems. Why/Atlas (VanLehn,