a “k-hypotheses + other” belief updating model
Dan Bohus, Alex Rudnicky
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213
acknowledgements: Tim Paek, Eric Horvitz (Microsoft Research)
2/25
motivation
spoken language interfaces are still very brittle
[Parade, Sunday, March 26]
3/25
misunderstandings: each user turn shows what was said → [what was recognized] / confidence score
S: What city are you leaving from?
U: Chicago → [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: Huntsville → [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: no no I’m traveling to Birmingham → [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon… okay, what day would you be departing Chicago?
U: the tenth of August → [AT THE TENTH OF AUGUST] / 0.58
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham → [flight destination mr WEEKEND] / 0.28
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ……
arrival = {Seoul / 0.65}
4/25
the same dialog and confidence scores as on the previous slide, now with a belief to maintain for each concept across the user turns
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }
departure = { … }
5/25
belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M]
arrival = {Seoul / 0.65} → f → arrival = ?
given:
  an initial belief Binitial(C) over concept C
  a system action SA(C)
  a user response R
construct an updated belief:
  Bupdated(C) ← f(Binitial(C), SA(C), R)
6/25
outline
introduction
current solutions
approach
experimental results
effects on global performance
conclusion and future work
intro : current solutions : approach : experimental results : global performance : conclusion
7/25
current solutions
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… what day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
confidence scores / detecting misunderstandings [Cox, Chase, Bansal, Hazen, Ravishankar, Walker, San-Segundo, Bohus]
detecting corrections [Litman, Swerts, Hirschberg, Krahmer, Levow]
arrival = {Seoul / 0.65} → f → arrival = ?
track single values; use simple heuristic belief updating rules
  explicit confirmations: yes / no
  implicit confirmations: new values overwrite old values
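The heuristic rules on this slide can be sketched roughly as follows. This is only an illustration of single-value tracking, not the code of any particular system: the function name `heuristic_update`, the action labels, and the choice to set confidence to 1.0 after a "yes" are all assumptions made for the example.

```python
from typing import Optional, Tuple

# A single tracked value with its confidence, or None when nothing is believed.
Belief = Optional[Tuple[str, float]]


def heuristic_update(belief: Belief, action: str,
                     answer: Optional[str] = None,
                     new_value: Belief = None) -> Belief:
    """Single-value heuristic update (illustrative rules only)."""
    if action == "explicit_confirm":
        if answer == "yes" and belief is not None:
            return (belief[0], 1.0)  # assumed: a "yes" fully confirms the value
        if answer == "no":
            return None              # assumed: a "no" discards the value
        return belief                # anything else leaves the belief unchanged
    if action == "implicit_confirm":
        # New values overwrite old ones; otherwise the old value stays.
        return new_value if new_value is not None else belief
    raise ValueError(f"unknown action: {action}")


arrival = ("Seoul", 0.65)
# Assumed here: no new arrival value is extracted from the misrecognized
# correction, so the wrong value survives the implicit confirmation.
print(heuristic_update(arrival, "implicit_confirm"))
print(heuristic_update(arrival, "explicit_confirm", answer="no"))
```

Under these rules a misrecognized correction leaves the wrong value untouched, which is the failure mode the dialog on this slide illustrates.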
9/25
belief updating: problem statement
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
arrival = {Seoul / 0.65} → f → arrival = ?
given:
  an initial belief Binitial(C) over concept C
  a system action SA(C)
  a user response R
construct an updated belief:
  Bupdated(C) ← f(Binitial(C), SA(C), R)
10/25
belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)
probability distribution over the set of possible values
departure: ABERDEEN, TX; ABILENE, TX; ALBANY, NY; ALBUQUERQUE, NM; ALLENTOWN, PA; ALEXANDRIA, LA; ALLAKAKET, AK; ALLIANCE, NE; ALPENA, MI; ALPINE, TX; … YUMA, AZ
however, the system “hears” only a small number of conflicting values for a concept throughout a session
max = 3 conflicting values heard
11/25
belief representation
Bupdated(C) ← f(Binitial(C), SA(C), R)
compressed belief representation: k hypotheses + other
dynamically add and drop hypotheses: remember m hypotheses, add n new ones (m+n=k)
example: departure_city [k=3, m=2, n=1]
  belief: Austin, Boston, Houston, other
  S: Did you say you were flying from Austin?
  U: [NO ASPEN]
  belief: Boston, Austin, Aspen, other
  S: flying from Aspen… what is your destination?
  U: [NO NO I DIDN’T THAT THAT]
  belief: Ø, Boston, Aspen, other
B…(C) is a multinomial variable of degree k+1
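A minimal sketch of the bookkeeping behind "remember m hypotheses, add n new ones" might look like the following. It only selects which values occupy the k slots; in the talk the probabilities themselves come from the learned update model. The function name `compress`, the zero starting mass for new hypotheses, and the probabilities in the example are assumptions for illustration.

```python
from typing import Dict, Iterable

OTHER = "other"


def compress(belief: Dict[str, float], new_values: Iterable[str],
             m: int, n: int) -> Dict[str, float]:
    """Keep the m most probable old hypotheses and open slots for up to n new ones.

    The mass of dropped hypotheses is folded into the "other" bucket.
    """
    old = {value: p for value, p in belief.items() if value != OTHER}
    kept = sorted(old, key=old.get, reverse=True)[:m]
    added = [value for value in new_values if value not in kept][:n]

    slots = {value: old[value] for value in kept}
    slots.update({value: 0.0 for value in added})  # assumed: new slots start empty
    slots[OTHER] = 1.0 - sum(slots.values())
    return slots


# Hypothetical probabilities for the departure_city example, k=3 (m=2, n=1).
before = {"Austin": 0.5, "Boston": 0.3, "Houston": 0.1, OTHER: 0.1}
after = compress(before, ["Aspen"], m=2, n=1)
print(list(after))  # the k hypotheses now in play, plus "other"
```

The result always has at most k named hypotheses plus "other", so the belief stays a multinomial of degree k+1 no matter how many values the concept can take.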
12/25
system action
Bupdated(C) ← f(Binitial(C), SA(C), R)
request
  S: When would you like to take this flight?
  U: Friday → [FRIDAY] / 0.65
explicit confirmation
  S: Did you say you wanted to fly this Friday?
  U: Yes → [GUEST] / 0.30
implicit confirmation
  S: A flight for Friday … at what time?
  U: At ten a.m. → [AT TEN A_M] / 0.86
no action / unexpected update
  S: okay. I will complete the reservation. Please tell me your name or say ‘guest user’ if you are not a registered user.
  U: guest user → [THIS TUESDAY] / 0.55
13/25
user response
Bupdated(C) ← f(Binitial(C), SA(C), R)
acoustic / prosodic: acoustic and language scores, duration, pitch information, voiced-to-unvoiced ratio, speech rate, initial pause
lexical: number of words, presence of words highly correlated with corrections or acknowledgements
grammatical: number of slots (new and repeated), goodness-of-parse scores
dialog: dialog state, turn number, expectation match, timeout, barge-in, concept identity
priors: priors for concept values
confusability: how confusable concept values are
14/25
approach
multinomial regression problem
  multinomial generalized linear model: sample efficient
  stepwise approach: feature selection
one separate model for each system action
  Bupdated(C) ← f(Binitial(C), SA(C), R)
  becomes Bupdated(C) ← f_SA(C)(Binitial(C), R)
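As a rough sketch of what a multinomial logistic update with one model per system action could look like: each outcome gets a linear score over the features, and a softmax turns the scores into the updated belief. The feature names, weights, and outcome labels below are hypothetical and chosen only to make the example run; the stepwise feature selection is not shown.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_belief(features: Dict[str, float],
                  weights: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Multinomial logistic update: one linear score per outcome, then softmax."""
    outcomes = list(weights)
    scores = [sum(w * features.get(name, 0.0) for name, w in weights[o].items())
              for o in outcomes]
    return dict(zip(outcomes, softmax(scores)))


# Hypothetical weights; the outcome with no weights acts as the reference class.
MODELS = {
    "explicit_confirm": {
        "initial": {},
        "new_1": {"bias": -1.0, "answer_no": 2.0, "new_conf": 1.5},
        "other": {"bias": -0.5, "answer_no": 1.0},
    },
}

# One model per system action: pick the model first, then apply it.
features = {"bias": 1.0, "answer_no": 1.0, "new_conf": 0.8}
belief = update_belief(features, MODELS["explicit_confirm"])
print(belief)
```

Keeping a separate weight table per system action mirrors the f_SA(C) formulation: the action selects the model, and only the initial belief and the response features enter the regression.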
16/25
data
RoomLine: conference room reservations; explicit and implicit confirmations
user study: 46 participants, 10 scenario-based interactions each
corpus: 449 sessions, 8848 user turns
transcribed & annotated: misunderstandings, corrections, correct concept values
17/25
model performance
error rates by system action (bar charts on the slide):
  i: initial baseline (error before update)
  h: heuristic baseline (error after heuristic update)
  M: model (k=2, all features)
  c: correction baseline (error if we had perfect correction detection)

system action       i        h        M        c
explicit confirm    30.8%    16.1%    5.0%     6.2%
implicit confirm    30.3%    26.0%    15.0%    21.5%
request             98.2%    9.5%     5.7%     -
no action           79.7%    44.8%    14.8%    -
19/25
a new user study …
implemented models in the system
2nd, between-subjects experiment
  control: using heuristic update rules
  treatment: using belief updating models
40 participants, non-native users
  improvements more likely at high word-error-rates
20/25
effect on task success
logistic ANOVA on task success
logit(TaskSuccess) ← 2.09 - 0.05∙WER + 0.69∙Condition (p=0.009)
[figure: probability of task success (0% to 100%) vs. word error rate (0% to 100%), one curve each for treatment and control]
at 30% word error rate: treatment 78%, control 64%
at 16% word error rate: control 78%
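The percentages on this slide can be checked directly against the fitted equation. The sketch below assumes WER is expressed in percentage points (0 to 100) and that Condition is coded 1 for treatment and 0 for control; the function name `p_task_success` is made up for the example.

```python
import math


def p_task_success(wer: float, condition: int) -> float:
    """Probability of task success from the logit model on the slide.

    Assumed coding: wer in percentage points, condition 1 = treatment, 0 = control.
    """
    logit = 2.09 - 0.05 * wer + 0.69 * condition
    return 1.0 / (1.0 + math.exp(-logit))


for wer in (16, 30):
    control = p_task_success(wer, 0)
    treatment = p_task_success(wer, 1)
    print(f"WER {wer}%: control {control:.2f}, treatment {treatment:.2f}")
```

With these assumptions the treatment condition at 30% word error rate reaches about the same success probability as the control condition at 16%, which matches the two 78% readings on the plot.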
21/25
how about efficiency?
ANOVA on task duration for successful tasks
Duration ← -0.21 + 0.013∙WER - 0.106∙Condition (p=0.0003)
significant improvement, equivalent to a 7.9% absolute reduction in word error rate
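The "equivalent reduction in word error rate" can be read off the regression by asking how much WER would have to drop to shorten tasks as much as switching to the treatment condition does. A short derivation using the rounded coefficients from the slide (assuming WER is in percentage points):

```latex
\begin{aligned}
\text{Duration} &= -0.21 + 0.013\,\text{WER} - 0.106\,\text{Condition} \\
0.013\,\Delta\text{WER} &= 0.106
\quad\Longrightarrow\quad
\Delta\text{WER} = \frac{0.106}{0.013} \approx 8.2
\end{aligned}
```

The rounded coefficients give roughly 8.2 points; the 7.9% on the slide presumably comes from the unrounded estimates, which is an assumption here.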
23/25
summary
U: [CHICAGO] / 0.72
S: traveling from Chicago. Where would you like to go?
U: [SEOUL] / 0.65
S: traveling to Seoul… What day did you need to travel?
U: [THE TRAVELING to berlin P_M] / 0.35
S: traveling in the afternoon… okay, what day would you be departing Chicago?
arrival = {Seoul / 0.65} → f → arrival = ?
arrival = { … }, departure = { … }
approach for constructing accurate beliefs: integrate information across multiple turns
significant gains in task success and efficiency
24/25
other advantages
learns from data: tuned to the domain in which it operates
sample efficient / scalable: local one-turn optimization, concepts are independent
  RoomLine operates with 29 concepts; cardinality: from 2 to several hundred
portable: decoupled from dialog task specification
  no assumptions about dialog management
25/25
future work
integrate information from n-best list
integrate other high-level knowledge
domain-specific constraints
inter-concept dependencies
investigate technique in other domains
26/25
thank you! questions …
27/25
improvements at different WER
[figure: absolute improvement in task success (y axis, 0.02 to 0.18) vs. word-error-rate (x axis, 0 to 100)]
28/25
user study
10 scenarios, fixed order
scenarios presented graphically (explained during briefing)
participants compensated per task success
29/25
informative features
priors and confusability
initial confidence scores
concept identity
barge-in
expectation match
repeated grammar slots
30/25
Models (k=2, runtime features)
# The model for the explicit confirm action
LR_MODEL(EC)
                                new_1     other
k                          =   -15.96      3.61
answer_type[YES]           =   -12.67     -5.90
answer_type[NO]            =     4.55      3.15
answer_type[OTHER]         =     1.20     -0.75
concept_id(equip)          =     6.96      4.42
i_th_confusability         =    -3.67     -4.80
ih_diff_lexical_one_word   =   -15.99     -1.17
lexw1[SMALL]               =    17.63     20.26
response_new_hyps_in_selh  =    18.85      0.41
END
31/25
Models (k=2, runtime features)
# The model for the implicit confirm action
LR_MODEL(IC)
                                new_1     other
mark_confirm               =     0.31     -1.74
mark_disconfirm            =     3.39      1.57
i_th_conf                  =     0.39     -3.63
i_th_confusability         =    -4.17     -4.54
k                          =   -16.83      3.75
lex[THREE]                 =    -2.25     -2.68
response_new_hyps_in_selh  =    20.88      1.70
turn_number                =     0.01      0.03
END
32/25
Models (k=2, runtime features)
# The model for the request action
LR_MODEL(REQ)
                                          new_1     other
k                                    =    -0.78      3.56
barge_in                             =    -2.07     -1.40
concept_id(date)                     =    11.29      9.80
concept_id(user_name)                =     1.93    -13.91
dialog_state[RequestSpecificTimes]   =    13.29     14.26
ih_diff_lexical                      =    -1.54      0.17
initial_num_hyps_>_0                 =   -21.70     -2.71
total_num_parses                     =    -1.06     -0.40
ur_selh_new_1_conf                   =     4.09      1.76
ur_selh_new_1_confusability          =     5.81      1.70
ur_selh_new_1_prior                  =     0.67      0.98
ur_selh_new_1_prior_>_1              =    -1.00     -6.38
END