A “k-hypotheses + other” belief updating model

Dan Bohus, Alex Rudnicky

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213

acknowledgements

Tim Paek Eric Horvitz

Microsoft Research

2/25

motivation

spoken language interfaces are still very brittle

[Parade, Sunday, March 26]

3/25

S: What city are you leaving from?

U: Chicago [CHICAGO] / 0.72

S: traveling from Chicago. Where would you like to go?

U: Huntsville [SEOUL] / 0.65

S: traveling to Seoul… What day did you need to travel?

U: no no I’m traveling to Birmingham [THE TRAVELING to berlin P_M] / 0.35

S: traveling in the afternoon… okay, what day would you be departing Chicago?

U: the tenth of August [AT THE TENTH OF AUGUST] / 0.58

S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?

U: my destination is Birmingham [flight destination mr WEEKEND] / 0.28

S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at …

confidence score

misunderstandings

arrival = {Seoul / 0.65}

4/25

[figure: the dialog from the previous slide, with the confidence score of each user turn and the misunderstandings marked; after every turn an update function f maintains the beliefs departure = { … } and arrival = { … }, leaving the question arrival = ?]

5/25

belief updating: problem statement




given

an initial belief Binitial(C) over concept C

a system action SA(C)

a user response R

construct an updated belief Bupdated(C) ← f(Binitial(C), SA(C), R)

6/25

outline

introduction

current solutions

approach

experimental results

effects on global performance

conclusion and future work


7/25

current solutions


U: [SEOUL] / 0.65

S: traveling to Seoul… what day did you need to travel?

confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]

detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]

track single values, use simple heuristic belief updating rules

explicit confirmations: yes / no

implicit confirmations: new values overwrite old values
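The heuristic rules listed on this slide can be sketched in a few lines. This is a minimal illustration, assuming a single tracked value per concept; the names `SystemAction` and `heuristic_update`, and the choice to discard the value after a "no", are assumptions made for the example and are not taken from the slides.

```python
from enum import Enum
from typing import Optional


class SystemAction(Enum):
    EXPLICIT_CONFIRM = "explicit_confirm"
    IMPLICIT_CONFIRM = "implicit_confirm"
    REQUEST = "request"


def heuristic_update(current: Optional[str],
                     action: SystemAction,
                     heard: Optional[str]) -> Optional[str]:
    """Track a single value per concept with hand-written rules.

    `heard` is the recognized answer: "yes"/"no" for an explicit
    confirmation, otherwise a new concept value (or None if nothing
    relevant was recognized).
    """
    if action is SystemAction.EXPLICIT_CONFIRM:
        if heard == "yes":
            return current      # value confirmed, keep it
        if heard == "no":
            return None         # assumption: a "no" discards the value
        return current          # anything else leaves the value alone
    # Implicit confirmation and request: a new value overwrites the old one.
    return heard if heard is not None else current
```

Because the rule trusts whatever was recognized last, a misrecognized turn directly replaces a correct value, which is the brittleness the motivating dialog illustrates.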





10/25

belief representation

Bupdated(C) ← f(Binitial(C), SA(C), R)

probability distribution over the set of possible values

departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … ; YUMA, AZ

however, the system “hears” only a small number of conflicting values for a concept throughout a session

max = 3 conflicting values heard


11/25

belief representation

Bupdated(C) ← f(Binitial(C), SA(C), R)

compressed belief representation

k hypotheses + other

dynamically add and drop hypotheses: remember m hypotheses, add n new ones (m+n=k)

B…(C) is a multinomial variable of degree k+1

departure_city [k=3, m=2, n=1]

initial hypotheses: Austin, Boston, Houston, other

S: Did you say you were flying from Austin? U: [NO ASPEN]

remembered: Boston, Austin, other; new: Aspen

S: flying from Aspen… what is your destination? U: [NO NO I DIDN’T THAT THAT]

remembered: Boston, Aspen, other; new: Ø
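The bookkeeping behind "remember m hypotheses, add n new ones" can be sketched as below. This only selects which values the updated belief ranges over; the probabilities themselves would come from the learned update function f. The function name `next_hypotheses`, the ranking of old hypotheses by their current probability, and the example probabilities are assumptions for illustration.

```python
from typing import Dict, List

OTHER = "other"


def next_hypotheses(belief: Dict[str, float], heard: List[str],
                    m: int, n: int) -> List[str]:
    """Pick the k = m + n hypotheses (plus `other`) for the updated belief."""
    # Rank the old hypotheses by probability and remember the top m.
    ranked = sorted((v for v in belief if v != OTHER),
                    key=lambda v: belief[v], reverse=True)
    kept = ranked[:m]
    # Add up to n values heard in the response that are not already kept.
    new = [v for v in dict.fromkeys(heard) if v not in kept][:n]
    return kept + new + [OTHER]


# Hypothetical probabilities for the departure_city example (k=3, m=2, n=1).
belief = {"Austin": 0.4, "Boston": 0.3, "Houston": 0.1, OTHER: 0.2}
print(next_hypotheses(belief, ["Aspen"], m=2, n=1))
# → ['Austin', 'Boston', 'Aspen', 'other']
```

Whatever is dropped (Houston here) is no longer tracked individually; its probability mass is absorbed by `other`, which keeps the belief a multinomial of degree k+1 regardless of how many values the concept can take.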


12/25

system action

Bupdated(C) ← f(Binitial(C), SA(C), R)

request

S: When would you like to take this flight?

U: Friday [FRIDAY] / 0.65

explicit confirmation

S: Did you say you wanted to fly this Friday?

U: Yes [GUEST] / 0.30

implicit confirmation

S: A flight for Friday … at what time?

U: At ten a.m. [AT TEN A_M] / 0.86

no action / unexpected update

S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.

U: guest user [THIS TUESDAY] / 0.55


13/25

user response

Bupdated(C) ← f(Binitial(C), SA(C), R)

acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause

lexical: number of words, presence of words highly correlated with corrections or acknowledgements

grammatical: number of slots (new and repeated), goodness-of-parse scores

dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity

priors: priors for concept values

confusability: how confusable concept values are


14/25

approach

Bupdated(C) ← f(Binitial(C), SA(C), R)

multinomial regression problem

multinomial generalized linear model: sample efficient

stepwise approach: feature selection

one separate model for each system action

Bupdated(C) ← fSA(C)(Binitial(C), R)
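One way to picture "one separate model for each system action" is a table of multinomial logistic regression models indexed by the action. The sketch below assumes a logit link with the initial hypothesis as the reference class; the weights, the feature vector and the names `MODELS` and `update_belief` are hypothetical and serve only to show the shape of the computation.

```python
import math
from typing import Dict, List

# Hypothetical weights: one weight vector per outcome class, per system action.
# The first entry of each vector is the intercept. All values are made up.
MODELS: Dict[str, Dict[str, List[float]]] = {
    "explicit_confirm": {"new_1": [-2.0, 1.5], "other": [-1.0, 0.5]},
    "request":          {"new_1": [0.5, 2.0],  "other": [-0.5, 0.2]},
}
REFERENCE = "initial"  # assumed reference class, its score is fixed at 0


def update_belief(action: str, features: List[float]) -> Dict[str, float]:
    """Multinomial logistic regression: softmax over per-class linear scores."""
    x = [1.0] + features                      # prepend the intercept term
    scores = {REFERENCE: 0.0}
    for cls, w in MODELS[action].items():     # model chosen by system action
        scores[cls] = sum(wi * xi for wi, xi in zip(w, x))
    top = max(scores.values())                # subtract max for stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


belief = update_belief("explicit_confirm", [0.0])
print({c: round(p, 3) for c, p in belief.items()})
```

Splitting by system action keeps each model small, since a feature such as "the user answered yes" only matters after an explicit confirmation.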



16/25

data

RoomLine: conference room reservations, explicit and implicit confirmations

user study: 46 participants, 10 scenario-based interactions each

corpus: 449 sessions, 8848 user turns

transcribed & annotated: misunderstandings, corrections, correct concept values


17/25

model performance

Model (M) [k=2, all features]

initial baseline (i) [error before update]

heuristic baseline (h) [error after heuristic update]

correction baseline (c) [error if we had perfect correction detection]

[bar charts: error rate per system action]

explicit confirm: i 30.8%, h 16.1%, M 5.0%, c 6.2%

implicit confirm: i 30.3%, h 26.0%, M 15.0%, c 21.5%

request: i 98.2%, h 9.5%, M 5.7%

no action: i 79.7%, h 44.8%, M 14.8%



19/25

a new user study …

implemented models in the system

2nd, between-subjects experiment

control: using heuristic update rules

treatment: using belief updating models

40 participants, non-native users

improvements more likely at high word-error-rates


20/25

effect on task success

logistic ANOVA on task success

logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition

p=0.009

[plot: probability of task success (0% to 100%) against word error rate (0% to 100%), one curve for treatment and one for control; at 30% word error rate: treatment 78%, control 64%; control reaches 78% at 16% word error rate]
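The fitted model on this slide can be evaluated directly to reproduce the percentages quoted for the plot. The sketch assumes that WER is expressed in percentage points and that Condition is 1 for treatment and 0 for control; both are assumptions, chosen because they reproduce the 78% and 64% figures. The function name `task_success_probability` is illustrative.

```python
import math


def task_success_probability(wer_percent: float, treatment: bool) -> float:
    """Evaluate logit(TaskSuccess) = 2.09 - 0.05*WER + 0.69*Condition."""
    condition = 1.0 if treatment else 0.0     # assumed coding of Condition
    logit = 2.09 - 0.05 * wer_percent + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))     # inverse of the logit link


for wer, treatment in [(30, True), (30, False), (16, False)]:
    p = task_success_probability(wer, treatment)
    print(f"WER={wer}% treatment={treatment}: {p:.0%}")
# WER=30% treatment=True: 78%
# WER=30% treatment=False: 64%
# WER=16% treatment=False: 78%
```

Read this way, using the belief updating models at 30% word error rate gives the same predicted task success as the control condition would reach at 16% word error rate.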


21/25

how about efficiency?

ANOVA on task duration for successful tasks

Duration ← -0.21 + 0.013∙WER - 0.106∙Condition

significant improvement, equivalent to a 7.9% absolute reduction in word-error-rate

p=0.0003
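The word-error equivalence quoted on this slide corresponds to the ratio of the Condition and WER coefficients in the duration model. A quick check with the rounded coefficients printed above gives about 8 points, in line with the quoted 7.9%; the small gap is presumably due to rounding of the coefficients, which is an assumption.

```python
# Coefficients as printed on the slide (rounded).
WER_COEF = 0.013
CONDITION_COEF = -0.106

# WER change that shifts the predicted duration as much as the treatment does.
equivalent_wer_reduction = -CONDITION_COEF / WER_COEF
print(f"{equivalent_wer_reduction:.1f} points of WER")
# → 8.2 points of WER
```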



23/25

summary

[figure: the motivating dialog with per-turn confidence scores and the tracked beliefs arrival = { … } and departure = { … }, updated through f]

approach for constructing accurate beliefs

integrate information across multiple turns

significant gains in task success and efficiency


24/25

other advantages

learns from data: tuned to the domain in which it operates

sample efficient / scalable: local one-turn optimization, concepts are independent

RoomLine operates with 29 concepts, cardinality: 2 to several hundreds

portable: decoupled from dialog task specification

no assumptions about dialog management


25/25

future work

integrate information from n-best list

integrate other high-level knowledge

domain-specific constraints

inter-concept dependencies

investigate technique in other domains


26/25

thank you! questions …

27/25

improvements at different WER

[plot: absolute improvement in task success (0.02 to 0.18) against word-error-rate (0 to 100)]

28/25

user study

10 scenarios, fixed order

presented graphically (explained during briefing)

participants compensated per task success

29/25

informative features

priors and confusability

initial confidence scores

concept identity

barge-in

expectation match

repeated grammar slots

30/25

Models (k=2, runtime features)

# The model for the explicit confirm action
LR_MODEL(EC)
                               new_1    other
k                          =  -15.96     3.61
answer_type[YES]           =  -12.67    -5.90
answer_type[NO]            =    4.55     3.15
answer_type[OTHER]         =    1.20    -0.75
concept_id(equip)          =    6.96     4.42
i_th_confusability         =   -3.67    -4.80
ih_diff_lexical_one_word   =  -15.99    -1.17
lexw1[SMALL]               =   17.63    20.26
response_new_hyps_in_selh  =   18.85     0.41
END

31/25


# The model for the implicit confirm action
LR_MODEL(IC)
                               new_1    other
mark_confirm               =    0.31    -1.74
mark_disconfirm            =    3.39     1.57
i_th_conf                  =    0.39    -3.63
i_th_confusability         =   -4.17    -4.54
k                          =  -16.83     3.75
lex[THREE]                 =   -2.25    -2.68
response_new_hyps_in_selh  =   20.88     1.70
turn_number                =    0.01     0.03
END

32/25


# The model for the request action
LR_MODEL(REQ)
                                      new_1    other
k                                 =   -0.78     3.56
barge_in                          =   -2.07    -1.40
concept_id(date)                  =   11.29     9.80
concept_id(user_name)             =    1.93   -13.91
dialog_state[RequestSpecificTimes] =  13.29    14.26
ih_diff_lexical                   =   -1.54     0.17
initial_num_hyps_>_0              =  -21.70    -2.71
total_num_parses                  =   -1.06    -0.40
ur_selh_new_1_conf                =    4.09     1.76
ur_selh_new_1_confusability       =    5.81     1.70
ur_selh_new_1_prior               =    0.67     0.98
ur_selh_new_1_prior_>_1           =   -1.00    -6.38
END
