POSTECH Dialog-Based Computer Assisted Language Learning System
Intelligent Software Lab. POSTECH Prof. Gary Geunbae Lee
Slide 2
Contents
- Introduction
- Methods
  - DB-CALL System: Example-based Dialog Modeling, Feedback Generation, Translation Assistance, Comprehension Assistance
  - Language Learner Simulation: User Simulation, Grammar Error Simulation
- Discussion
Slide 3
RESEARCH BACKGROUND
Globalization makes English ever more important as a world language, but native-speaker tutors are extremely costly, and most language learning software is dedicated to pronunciation practice. Dialog-based Computer-Assisted Language Learning (DB-CALL) can be an excellent solution.
ISSUES
A DB-CALL system should be able to understand students' poor and non-native expressions. It should have high domain scalability to support various practical scenarios, and it should provide educational functionality that helps students improve their linguistic ability.
Slide 4
PREVIOUS WORK ON DB-CALL
Let's Go (CMU, 2002-2004): provides bus schedule information for CMU's non-native students. Adapts the acoustic model and language model to non-native speakers. Edit-distance-based corrective feedback.
Slide 5
PREVIOUS WORK ON DB-CALL
SPELL (Edinburgh, 2005): restaurant domain. Scenario-based virtual space. Incorporates mal-rules into the ASR grammar.
Slide 6
PREVIOUS WORK ON DB-CALL
DEAL (KTH, 2007): trade domain. Finite-state-network-based, limited dialog management. When learners get stuck, the system provides hints.
Slide 7
POSTECH DB-CALL System
[Architecture figure: a web crawler feeds a Description Extractor and a Parallel Sentence Extractor, which build expression-description pairs and Korean-English expression pairs used for "Try this expression" suggestions.]
Slide 8
DB-CALL System
Slide 9
1. Example-based Dialog Modeling
Slide 10
INTRODUCTION Spoken Dialog System Applications Human-Robot
Interface, Telematics, Tutoring,...
Slide 11
PROBLEM & GOAL
PROBLEM: How to determine the next system action.
- Knowledge-based approaches: plan recipes / ISU rules / agendas
- Data-driven approaches:
  - Statistical approaches: supervised learning based on state approximation; reinforcement learning based on MDP/POMDP
  - Example-based approach
GOAL: To develop a simple and practical approach to dialog modeling for multi-domain dialog systems.
Slide 12
IDEA
Dialog examples are indexed by semantic and discourse features, and the example with the most similar dialog state is retrieved.
Dialog state space (Turn #1, Domain = Building_Guidance): Dialog Act = WH-QUESTION, Main Goal = SEARCH-LOC, ROOM-TYPE = 1 (filled), ROOM-NAME = 0 (unfilled), LOC-FLOOR = 0, PER-NAME = 0, PER-TITLE = 0, Previous Dialog Act and Previous Main Goal empty (first turn), Discourse History Vector = [1,0,0,0,0], Lexico-semantic Pattern = ROOM_TYPE, System Action = inform(Floor).
Dialog corpus: USER utterance annotated with [Dialog Act = WH-QUESTION, Main Goal = SEARCH-LOC, ROOM-TYPE filled]; SYSTEM response annotated with [System Action = inform(Floor)].
Lee et al. (2006), A Situation-based Dialogue Management using Dialogue Examples, IEEE ICASSP.
Slide 13
ALGORITHM
Query Generation: build an SQL query from the discourse history and the SLU results.
Example Search: retrieve semantically close dialog examples from the example DB given the current dialog state.
Example Selection: select the best example by maximizing an utterance similarity measure based on lexical and discourse information; a relaxation strategy loosens the query when nothing matches.
[Flow: noisy input (from ASR/SLU) -> Query Generation -> Example Search over the Example DB, Content DB, and Discourse History -> Example Selection -> NLG with system templates.]
A minimal sketch of this pipeline follows.
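Below is a minimal Python sketch of this query-search-select loop. All names (DialogExample, search_examples, the word-overlap similarity, the relaxation rule) are illustrative assumptions, not the actual EBDM implementation or its SQL schema.

```python
# Minimal sketch of example-based dialog modeling (illustrative structures only).
from dataclasses import dataclass

@dataclass
class DialogExample:
    domain: str
    dialog_act: str
    main_goal: str
    filled_slots: frozenset      # e.g. {"ROOM_TYPE"}
    user_utterance: str
    system_action: str           # e.g. "inform(Floor)"

def generate_query(slu, discourse):
    """Build a query key from SLU results and the discourse history (an SQL query in the real system)."""
    return {"domain": discourse["domain"], "dialog_act": slu["dialog_act"],
            "main_goal": slu["main_goal"], "filled_slots": frozenset(slu["slots"])}

def search_examples(example_db, query):
    """Retrieve examples whose semantic & discourse features match the query."""
    return [e for e in example_db
            if (e.domain, e.dialog_act, e.main_goal, e.filled_slots)
            == (query["domain"], query["dialog_act"], query["main_goal"], query["filled_slots"])]

def relax(query):
    """Relaxation strategy: drop the most specific constraint when nothing matches."""
    return {**query, "filled_slots": frozenset()}

def select_example(candidates, user_utterance):
    """Pick the example maximizing a simple lexical overlap with the user utterance."""
    words = set(user_utterance.lower().split())
    return max(candidates, key=lambda e: len(words & set(e.user_utterance.lower().split())))

def next_system_action(example_db, slu, discourse, user_utterance):
    query = generate_query(slu, discourse)
    candidates = search_examples(example_db, query) or search_examples(example_db, relax(query))
    return select_example(candidates, user_utterance).system_action if candidates else "ask_repeat()"

# Toy usage
db = [DialogExample("Building_Guidance", "WH-QUESTION", "SEARCH-LOC",
                    frozenset({"ROOM_TYPE"}), "where is the seminar room", "inform(Floor)")]
slu = {"dialog_act": "WH-QUESTION", "main_goal": "SEARCH-LOC", "slots": ["ROOM_TYPE"]}
print(next_system_action(db, slu, {"domain": "Building_Guidance"}, "where is the seminar room"))
```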
Slide 14
EXPERIMENTAL RESULTS
Real user evaluation with 10 undergraduates.
Evaluation metrics: STR (Success Turn Rate) = # of successful turns / # of total turns; TCR (Task Completion Rate) = # of successful dialogs / # of total dialogs; AvgUserTurn = average number of user turns per dialog.

System              | #Dialogs | AvgUserTurn | STR (%) | TCR (%)
Car Navigation      | 50       | 4.54        | 86.25   | 92.00
Weather Information | 50       | 4.46        | 89.01   | 94.00
EPG                 | 50       | 4.50        | 83.99   | 90.00
Chatbot             | 50       | 5.60        | 64.31   | -
Multi-domain        | 15       | 6.08        | 78.77   | 86.67

Lee et al. (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM.
Slide 15
EXPERIMENTAL RESULTS
Example match rate (%) of each dialog system:

System              | Exact match | Partial match | No example
Car Navigation      | 50.22       | 44.49         | 5.29
Weather Information | 69.49       | 25.00         | 5.51
EPG                 | 58.33       | 37.22         | 4.45
Chatbot             | 50.71       | 14.29         | 35.00
Multi-domain        | 69.23       | 24.62         | 6.15

Lee et al. (2009), Example-based Dialog Modeling for Practical Multi-domain Dialog Systems, SPECOM.
Slide 16
ROBUST DIALOG MANAGEMENT
PROBLEM: How to overcome errors in the real world.
Error handling: recovering from ASR/SLU errors by interacting with the user at the conversational level.
N-best support: estimating the current state under uncertainty.
[Pipeline figure: ASR (noise reduction, adaptation, n-best / lattice / confusion network) -> SLU (robust parsing, data-driven approaches) -> DM (error handling, n-best support), with errors propagating at each stage.]
Lee et al. (2008), Robust Dialog Management with N-best Hypotheses Using Dialog Examples and Agenda, ACL.
Slide 17
GOAL & IDEA
To increase the robustness of EBDM with prior knowledge.
1) Error handling: if the system knows what the user will do next, it can generate dynamic help.
[Agenda graph figure: the focus node with next subtasks LOCATION, OFFICE PHONE NUMBER, ROOM ROLE, GUIDE.]
AgendaHelp - S: "Next, you can do the subtask 1) asking the room's role, or 2) asking the office phone number, or 3) selecting the desired room for navigation."
UtterHelp - S: "Next, you can say 1) 'What is it?', or 2) 'What's the phone number of [ROOM_NAME]?', or 3) 'Let's go there.'"
Slide 18
GOAL & IDEA
To increase the robustness of EBDM with prior knowledge.
2) N-best support: if the system knows which subtask is more probable next, it can rescore the n-best hypotheses (h_1 ... h_n).
Current subtask: LOCATION. System utterance: "The director's room is Room No. 201." System action: Inform(RoomNumber). Candidate next subtasks: OFFICE PHONE NUMBER, FLOOR, ROOM NAME, LOCATION.

N-best   | User utterance                          | Subtask             | P(h_i | S)
U1 (h_1) | What are office rooms in this building? | ROOM NAME           | 0.2
U2 (h_2) | What is the floor?                      | FLOOR               | 0.4
U3 (h_3) | Where is it?                            | LOCATION            | 0.3
U4 (h_4) | What is the phone number?               | OFFICE PHONE NUMBER | 0.5 (more probable)
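A small illustrative sketch of rescoring n-best hypotheses with such an agenda-based prior; the interpolation weight, field names, and scores below are assumptions, not the system's actual scoring function.

```python
# Rescore n-best SLU hypotheses with an agenda-based prior P(subtask | current focus node).
def rescore_nbest(hypotheses, subtask_prior, alpha=0.5):
    """hypotheses: dicts like {"utterance", "subtask", "conf"}; returns them sorted by interpolated score."""
    def score(h):
        return alpha * h["conf"] + (1 - alpha) * subtask_prior.get(h["subtask"], 0.0)
    return sorted(hypotheses, key=score, reverse=True)

nbest = [
    {"utterance": "What are office rooms in this building?", "subtask": "ROOM NAME", "conf": 0.30},
    {"utterance": "What is the floor?", "subtask": "FLOOR", "conf": 0.35},
    {"utterance": "Where is it?", "subtask": "LOCATION", "conf": 0.20},
    {"utterance": "What is the phone number?", "subtask": "OFFICE PHONE NUMBER", "conf": 0.15},
]
prior = {"ROOM NAME": 0.2, "FLOOR": 0.4, "LOCATION": 0.3, "OFFICE PHONE NUMBER": 0.5}
best = rescore_nbest(nbest, prior)[0]   # the phone-number hypothesis wins under this prior
print(best["utterance"])
```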
EXPERIMENT SET-UP
Simulated user evaluation. Test set: 1000 simulated dialogs.
Slide 21
EXPERIMENTAL RESULTS
The average score of the different methods.
Legend:
- P-E: using only examples
- P-ER: examples + recovery
- P-EA: examples + agenda graph
- P-EAR: examples + agenda graph + recovery
Lee et al. (2009), Hybrid Approach to Robust Dialog Management using Agenda and Dialog Examples, CSL (submitted).
Slide 22
EXPERIMENTAL RESULTS Lee et al., (2009), Hybrid Approach to
Robust Dialog Management using Agenda and Dialog Examples, CSL,
(Submitted) The average score of the P-EAR system according to
n-best size
Slide 23
DEMO VIDEO PC demo
Slide 24
DEMO VIDEO Robot demo
Slide 25
2. Feedback Generation
Slide 26
INTRODUCTION
Recast feedback in the tutoring process:
Tutor: What is the purpose of your trip?
User: My purpose business
Tutor: Sorry, I don't understand. What did you say? (clarification request)
System: Try this expression: "I am here on business" (recast feedback)
User: I am here on business (learner uptake)
Slide 27
INTRODUCTION
Expression suggestion in the tutoring process:
Tutor: What is the purpose of your trip?
(TIMEOUT: the learner does not respond)
Tutor: Sorry, I can't hear you.
System: Try this expression: "I am here on business" (expression suggestion)
User: I am here on business (learner uptake)
Slide 28
PROBLEMS
How to recognize user intentions despite numerous errors in their utterances: the mal-rule-based technique used in previous studies does not work for low-level learners because of multiple errors, and some utterances even seem to have a meaning that differs from what the learner intended to say (intended meaning: "When does the bus leave?"; learner's utterance: "Which time I have to leave?").
How to choose appropriate user intentions to suggest when a timeout expires: the system should take the dialog context into consideration, as human tutors do.
Approach: perform intention-based soft pattern matching to generate correct feedback.
Slide 29
METHODS
Context-aware and level-specific intention recognition, followed by intention-based pattern matching.
[Architecture: the learner's utterance is scored by level-specific utterance models (Level 1 ... Level N, each trained on level-specific data) combined with a dialog-state-based model; the intention recognizer passes the learner's intention and the dialog state to the dialog manager, which updates the dialog state, searches the example expression DB, and pattern-matches the retrieved example expressions to produce feedback.]
A hedged sketch of this matching step follows.
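The sketch below only illustrates the idea of combining a level-specific utterance model with a dialog-state model and then soft-matching an example expression; the score dictionaries, the toy DB, and the use of SequenceMatcher are assumptions, not the actual models.

```python
# Hedged sketch of intention recognition + intention-based soft pattern matching.
from difflib import SequenceMatcher

def recognize_intention(utt_scores, ctx_scores):
    """utt_scores: P(intention | utterance) from the level-specific utterance model;
    ctx_scores: P(intention | dialog state) from the dialog-state-based model."""
    return max(utt_scores, key=lambda i: utt_scores[i] * ctx_scores.get(i, 1e-3))

def recast_feedback(utterance, intention, example_db):
    """Return the example expression for this intention that is closest to the learner's utterance."""
    candidates = example_db.get(intention, [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: SequenceMatcher(None, utterance.lower(), c.lower()).ratio())
    return f"Try this expression: {best}"

# Toy usage
utt_scores = {"state_purpose": 0.6, "greet": 0.1}
ctx_scores = {"state_purpose": 0.8, "greet": 0.3}
db = {"state_purpose": ["I am here on business", "I am here on vacation"]}
intention = recognize_intention(utt_scores, ctx_scores)
print(recast_feedback("My purpose business", intention, db))
```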
Slide 30
EXPERIMENT SET-UP
Primitive data set: immigration domain; 192 dialogs, 3,517 utterances (18.32 utterances/dialog).
Annotation: each utterance was manually annotated with the speaker's intention and component slot-values, and automatically annotated with discourse information.
Demo: POSTECH DB-CALL initial version (2008)
Slide 35
3. Translation Assistance
Slide 36
Architecture
[Figure: the Web is crawled for parallel sentence examples, which are stored in a fixed example format; the ESL dialog system and other applications query the expression search engine through an interface (function call) and receive the analysis results.]
Slide 37
Building Bilingual Examples
Word alignment is widely used in statistical machine translation (IBM Models 1-5, symmetrization heuristics). A word alignment represents the correspondence between the words/phrases of a bilingual example pair.
Example word alignment (GIZA++).
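For illustration only, here is a compact EM sketch in the spirit of IBM Model 1; the actual system uses GIZA++ with higher IBM models and symmetrization, and the tiny parallel pairs below are made-up examples.

```python
# Compact EM sketch of IBM Model 1 word alignment (illustration, not GIZA++).
from collections import defaultdict

def ibm_model1(parallel_pairs, iterations=10):
    """parallel_pairs: list of (source_tokens, target_tokens). Returns lexical table t(target | source)."""
    t = defaultdict(lambda: 1.0)                      # rough uniform initialization
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        for src, tgt in parallel_pairs:
            for f in tgt:
                z = sum(t[(f, e)] for e in src)       # normalize over source words
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [(f, max(src, key=lambda e: t[(f, e)])) for f in tgt]

pairs = [("나는 출장 왔어요".split(), "I am here on business".split()),
         ("나는 학생 이에요".split(), "I am a student".split())]
t = ibm_model1(pairs)
print(align(*pairs[0], t))
```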
Slide 38
4. Comprehension Assistance
Slide 39
INTRODUCTION
English Expression-Description Example Suggestion System: combines an ESL podcast website, an expression-description DB, and the dialog system. When the user asks about an unfamiliar English expression, the system detects the expression and presents its description to aid comprehension (expression detection, recommended sentence description).
Slide 40
INTRODUCTION
Expression-Description Pair Extraction System: to present an expression example and its description, the system extracts expression-description pairs from the ESL podcast site.

Phrase       | Description
routine test | "we mean it's a normal, regular test that the doctor runs many, many different times with different patients, not a special test"
Treatment    | "Treatment is another word for what the doctor gives you or does to you to help you."
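As a hedged sketch only: one simple way such pairs could be pulled from a podcast-style transcript is a pattern over the "X, we mean ..." wording seen above. The regex and the script format are assumptions, not the system's actual extraction rules.

```python
# Toy extraction of (phrase, description) pairs from an ESL-podcast-style transcript.
import re

PATTERN = re.compile(
    r'"(?P<phrase>[^"]+)"\s*,?\s*(?:we mean|means)\s+(?P<desc>[^.]+\.)',
    re.IGNORECASE)

def extract_pairs(script_text):
    return [(m.group("phrase"), m.group("desc").strip())
            for m in PATTERN.finditer(script_text)]

script = ('When we say "routine test", we mean it\'s a normal, regular test '
          'that the doctor runs many, many different times with different patients, '
          'not a special test.')
print(extract_pairs(script))
```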
Slide 41
EXAMPLE [script] [description]
Slide 42
EXAMPLE [script] [description]
Slide 43
Language Learner Simulation
Slide 44
1. User Simulation
Slide 45
INTRODUCTION
User simulation for spoken dialog systems: developing a "simulated user" who can replace real users.
Applications: automated evaluation of spoken dialog systems, detecting potential flaws, predicting overall system behavior, and learning dialog strategies in a reinforcement learning framework.
Slide 46
PROBLEM & GOAL
PROBLEM: How to model real users: user intention simulation, user surface (utterance) simulation, and ASR channel simulation.
GOAL: Natural, diverse, and controllable simulation.
Slide 47
Slide 48
IDEA
User intention simulation: discourse factors + knowledge + events. Dialog is a sequence of behaviors, user intentions in particular, so user intention simulation should take various discourse information into account.
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
Slide 49
User Intention Simulation - Linear-Chain Conditional Random Field Model
Assumption: a user utterance has only one intention per turn.
UI: user intention state, State = [dialog_act, main_goal, named_entities].
DI: previous discourse information (system response + discourse history).
[Figure: linear-chain CRF with the DI observation connected to the sequence of UI states.]
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
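To make the data shape concrete, here is a small sketch of training a linear-chain CRF over intention states with discourse features; sklearn-crfsuite is used only as a stand-in toolkit, and the feature names and toy labels are assumptions, not the paper's actual feature set.

```python
# Illustrative linear-chain CRF over user-intention states with discourse features.
import sklearn_crfsuite

def turn_features(prev_system_action, discourse_history, turn_index):
    return {
        "prev_system_action": prev_system_action,
        "filled_slots": ",".join(sorted(discourse_history)),
        "turn_index": str(turn_index),
    }

# One training dialog: a sequence of per-turn feature dicts with intention labels
# (label = dialog_act + main_goal + named entities in the real model).
X_train = [[
    turn_features("greet()", [], 0),
    turn_features("inform(Floor)", ["ROOM_TYPE"], 1),
]]
y_train = [["wh_question+search_loc", "thank+none"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```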
Slide 50
ALGORITHM Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 51
User Surface Simulation
PROBLEM: How to generate a user surface utterance that expresses a given user intention.
APPROACH: two-phase user utterance generation. Phase 1: candidate generation with a user utterance model; Phase 2: rescoring and selecting the best utterance.
Slide 52
Phase 1 - Generation
Given the intention [Dialog_Act _X_ Main_Goal], generate a structure-tag sequence S1 ... S5 using transition probabilities, and emit a word W1 ... W5 from each tag using emission probabilities.
Structure tags: component slot names + part-of-speech tags. S: a member of the structure-tag space; W: a member of the vocabulary space.
A toy sketch of this generation step follows.
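The sketch below samples a structure-tag sequence and emits a word per tag; the tag inventory and all probabilities are toy values under one assumed intention, not the trained model's parameters.

```python
# Toy phase-1 candidate generation: structure-tag transitions + per-tag word emissions.
import random

TRANSITIONS = {   # P(next_tag | prev_tag) for an assumed intention "wh_question_X_search_loc"
    "<s>":       {"WHERE_WP": 1.0},
    "WHERE_WP":  {"VBZ": 1.0},
    "VBZ":       {"ROOM_NAME": 1.0},
    "ROOM_NAME": {"</s>": 1.0},
}
EMISSIONS = {     # P(word | tag)
    "WHERE_WP": {"where": 1.0},
    "VBZ": {"is": 0.8, "was": 0.2},
    "ROOM_NAME": {"the seminar room": 0.5, "room 201": 0.5},
}

def sample(dist):
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r <= acc:
            return item
    return item

def generate_candidate():
    tag, words = "<s>", []
    while True:
        tag = sample(TRANSITIONS[tag])
        if tag == "</s>":
            return " ".join(words)
        words.append(sample(EMISSIONS[tag]))

print(generate_candidate())   # e.g. "where is room 201"
```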
Slide 53
Phase 2 - Rescoring
PROBLEM: rescoring and selecting the good utterances. Criteria: human-like utterances and natural word transitions.
APPROACH: the structure- and word-interpolated BLEU score (SWB score). Note that evaluating system-generated utterances in utterance simulation and in machine translation is essentially the same task.
SWB = beta * Structure_Sequence_BLEU + (1 - beta) * Word_Sequence_BLEU, where 0 <= beta <= 1.
We set beta to 0.2: Korean is an agglutinative language and relatively free with respect to structural grammar, so the structure sequence is weighted less.
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
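A small sketch of computing the SWB score, using NLTK's sentence_bleu as a stand-in BLEU implementation; the bigram weights, smoothing choice, and toy references are assumptions, not the paper's exact configuration.

```python
# Structure- and word-interpolated BLEU (SWB) rescoring score, sketched with NLTK BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def swb_score(struct_hyp, struct_refs, word_hyp, word_refs, beta=0.2):
    smooth = SmoothingFunction().method1
    struct_bleu = sentence_bleu(struct_refs, struct_hyp, weights=(0.5, 0.5),
                                smoothing_function=smooth)
    word_bleu = sentence_bleu(word_refs, word_hyp, weights=(0.5, 0.5),
                              smoothing_function=smooth)
    return beta * struct_bleu + (1 - beta) * word_bleu

word_hyp = "where is the seminar room".split()
word_refs = ["where is the seminar room".split(), "where is room 201".split()]
struct_hyp = ["WHERE_WP", "VBZ", "DT", "ROOM_NAME"]
struct_refs = [["WHERE_WP", "VBZ", "DT", "ROOM_NAME"]]
print(swb_score(struct_hyp, struct_refs, word_hyp, word_refs))
```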
Slide 54
ALGORITHM Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 55
ASR Channel Simulation
PROBLEM: How to simulate the ASR channel. A purely statistical approach is hard because it is difficult to collect speech data for the target domain, and the simulation should be WER-controllable.
APPROACH: linguistic-knowledge-based simulation.
Step 1: Determine the error positions.
Step 2: Generate error types for the error-marked words.
Step 3: Generate the ASR errors (substitution, deletion, insertion).
Step 4: Rescore and select an erroneous utterance.
A toy sketch of steps 1-3 follows.
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
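The sketch below marks error positions at a target WER, draws an error type per marked word, and realizes the errors; the type distribution, vocabulary, and uniform substitution (instead of phonetic confusion) are toy assumptions, and the rescoring step is omitted.

```python
# Toy ASR channel simulation: error positions -> error types -> realized errors.
import random

ERROR_TYPES = {"substitution": 0.55, "deletion": 0.30, "insertion": 0.15}

def choose(dist):
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r <= acc:
            return k
    return k

def simulate_asr(words, wer=0.2, vocab=("bus", "seminar", "floor", "room")):
    out = []
    for w in words:
        if random.random() >= wer:            # step 1: mark error positions at the target WER
            out.append(w)
            continue
        err = choose(ERROR_TYPES)             # step 2: draw an error type
        if err == "deletion":                 # step 3: realize the error
            continue
        if err == "insertion":
            out.extend([random.choice(vocab), w])
        else:                                 # substitution (phonetic confusion in the real system)
            out.append(random.choice(vocab))
    return " ".join(out)

print(simulate_asr("where is the seminar room".split(), wer=0.3))
```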
Slide 56
Error Type Distribution
Error types are determined based on the error distribution reported for English speech recognition (Greenberg et al., 2000); we assume that Korean speech recognition generally has a similar error distribution.
Slide 57
Error Generation
Insertion error: insert a random word before the insertion error mark.
Deletion error: simply delete the word.
Substitution error: based on a sequence alignment algorithm. Syllable- and phoneme-based alignment selects candidate words from a dictionary using a dynamic-programming alignment algorithm (Needleman and Wunsch, 1970) and a vowel confusion matrix (example shown), and combines the two alignment scores:
Similarity = alpha * Syllable_Alignment_Score + (1 - alpha) * Phoneme_Alignment_Score, where 0 <= alpha <= 1.
A small sketch of this similarity computation follows.
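The sketch below interpolates two Needleman-Wunsch alignment scores; the scoring values and the character-level stand-ins for Korean syllable/phoneme decomposition are assumptions, not the system's actual dictionaries or confusion matrices.

```python
# Interpolated substitution-candidate similarity from syllable and phoneme alignments.
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score (Needleman & Wunsch, 1970) between two symbol sequences."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap
    for j in range(1, n + 1):
        dp[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[m][n]

def similarity(word, candidate, to_syllables, to_phonemes, alpha=0.5):
    return (alpha * nw_score(to_syllables(word), to_syllables(candidate))
            + (1 - alpha) * nw_score(to_phonemes(word), to_phonemes(candidate)))

# Toy decompositions standing in for Korean syllable/phoneme analysis.
syl = list
pho = lambda w: list(w.lower())
print(similarity("seminar", "similar", syl, pho))
```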
Slide 58
Slide 59
EXPERIMENT SET-UP
Korean car navigation dialog system. SLU: Jeong and Lee (2006); DM: Lee et al. (2009). Word error rate: 0.0-0.4, with 5,000 simulated dialogs at each WER setting.
Slide 60
Intention Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 61
Intention: D-BLEU (Discourse BLEU) is a metric for measuring the naturalness of simulated dialogs in terms of n-gram precision, based on the BLEU metric.
Jung et al. (2009), Data-driven user simulation for automated evaluation of spoken dialog systems, Computer Speech and Language.
Slide 62
Utterance Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 63
ASR channel Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 64
ASR channel Jung et al., 2009, Data-driven user simulation for
automated evaluation of spoken dialog systems, Computer Speech and
Language.
Slide 65
Overall prediction Jung et al., 2009, Data-driven user
simulation for automated evaluation of spoken dialog systems,
Computer Speech and Language.
Slide 66
2. Grammar Error Simulation
Slide 67
INTRODUCTION
Language learner simulation requires grammar error simulation on top of general user simulation.
[Figure: the language learner simulator (user intention simulator -> user utterance simulator -> grammar errors simulator -> ASR errors simulator) interacts with the dialog system (non-native ASR -> SLU -> dialog manager -> system utterance generator -> TTS).]
Slide 68
REALISTIC ERRORS
"He wants to go to a movie theater":
"He wants to to a movie theater" vs. "He want go to movie theater" - only the latter reflects the kinds of errors real learners actually make.
Slide 69
PROBLEMS
How to incorporate expert knowledge about the error characteristics of Korean language learners into the statistical model:
- Subject-verb agreement errors
- Omission of the preposition of prepositional verbs
- Omission of articles
- Etc.
Slide 70
MARKOV LOGIC NETWORK
Sungjin Lee and Gary Geunbae Lee (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL 2009.
Slide 71
METHOD
The generation procedure involves three steps:
1. Inference: generate a probability distribution over error types for each word through MLN inference.
2. Sampling: determine an error type for each word by sampling from the generated distribution.
3. Realization: create an ill-formed output sentence by realizing the chosen error types.
Example: for "He wants to go to a movie theater", the MLN assigns per-word probabilities to error types such as v_agr_sub (subject-verb agreement), prp_lex_del (preposition deletion), at_del (article deletion), and none; sampling and realization yield "He want go to movie theater".
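The sketch below illustrates only steps 2 and 3 (sampling error types from per-word posteriors and realizing them); the posterior values and the realization rules are simplified toy assumptions, not the MLN or the corpus-derived error grammar.

```python
# Toy sampling + realization of grammar error types (steps 2-3 of the procedure above).
import random

def sample_error_types(posteriors):
    """posteriors: list of (word, {error_type: prob}) pairs from MLN inference (step 1)."""
    chosen = []
    for word, dist in posteriors:
        r, acc = random.random(), 0.0
        for etype, p in dist.items():
            acc += p
            if r <= acc:
                chosen.append((word, etype))
                break
        else:
            chosen.append((word, "none"))
    return chosen

def realize(tagged):
    """Realize the chosen error types into an ill-formed sentence."""
    out = []
    for word, etype in tagged:
        if etype == "v_agr_sub":                    # agreement error: drop 3rd-person-singular -s
            out.append(word[:-1] if word.endswith("s") else word)
        elif etype in ("prp_lex_del", "at_del"):    # drop preposition / article
            continue
        else:
            out.append(word)
    return " ".join(out)

posteriors = [("He", {"none": 1.0}), ("wants", {"v_agr_sub": 0.921, "none": 0.079}),
              ("to", {"prp_lex_del": 0.6, "none": 0.4}), ("go", {"none": 1.0}),
              ("to", {"none": 1.0}), ("a", {"at_del": 0.781, "none": 0.219}),
              ("movie", {"none": 1.0}), ("theater", {"none": 1.0})]
print(realize(sample_error_types(posteriors)))   # e.g. "He want go to movie theater"
```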
Slide 72
EXPERIMENT SET-UP
Data set: NICT JLE Corpus. The 167 error-annotated files were divided into 3 proficiency groups: Beginner (levels 1-4): 2,905; Intermediate (levels 5-6): 3,296; Advanced (levels 7-9): 2,752.
Evaluation: 10-fold cross-validation was performed for each group, and the validation results were added together across the rounds.
Slide 73
EXPERIMENTAL RESULTS
Advanced: D_KL(Real || Proposed) = 0.068 vs. D_KL(Real || Baseline) = 0.122
Slide 74
EXPERIMENTAL RESULTS
Intermediate: D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.142
Slide 75
EXPERIMENTAL RESULTS
Beginner: D_KL(Real || Proposed) = 0.075 vs. D_KL(Real || Baseline) = 0.092
Slide 76
EXPERIMENTAL RESULTS
Human judgment: evaluated 100 randomly chosen sentences, 50 each from the real and simulated data. The test sentences were shuffled so that the human judges did not know whether the source of each sentence was real or simulated. Two-level scale (0: unrealistic, 1: realistic).
Sungjin Lee and Gary Geunbae Lee (2009), Realistic Grammar Error Simulation Using Markov Logic, ACL 2009.