
Page 1: online supervised learning of  non-understanding recovery policies

online supervised learning of non-understanding recovery policies

Dan Bohuswww.cs.cmu.edu/[email protected]

Computer Science DepartmentCarnegie Mellon UniversityPittsburgh, PA 15213

with thanks to:

Alex RudnickyBrian LangnerAntoine Raux

Alan BlackMaxine Eskenazi

Page 2: online supervised learning of  non-understanding recovery policies

2

S:
• Sorry, I didn’t catch that …
• Can you repeat that?
• Can you rephrase that?
• Where are you flying from?
• Please tell me the name of the city you are leaving from …
• Could you please go to a quieter place?
• Sorry, I didn’t catch that … tell me the state first …

understanding-errors in spoken dialog

S: Where are you flying from?
U: Birmingham [BERLIN PM]

System constructs an incorrect semantic representation of the user’s turn

MIS-understanding

S: Where are you flying from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

System fails to construct a semantic representation of the user’s turn

NON-understanding

S:
• Did you say Berlin?
• from Berlin … where to?

???

Page 3: online supervised learning of  non-understanding recovery policies

3

recovery strategies

large set of strategies (“strategy” = 1-step action)

tradeoffs not well understood
some strategies are more appropriate at certain times
• OOV -> ask repeat is not a good idea
• door slam -> ask repeat might work well

S:
• Sorry, I didn’t catch that …
• Can you repeat that?
• Can you rephrase that?
• Where are you flying from?
• Please tell me the name of the city you are leaving from …
• Could you please go to a quieter place?
• Sorry, I didn’t catch that … tell me the state first …

Page 4: online supervised learning of  non-understanding recovery policies

4

recovery policy

“policy” = method for choosing between strategies

difficult to handcraft, especially over a large set of recovery strategies

common approach: heuristic “three strikes and you’re out” [Balentine]
• 1st non-understanding: ask user to repeat
• 2nd non-understanding: provide more help, including examples
• 3rd non-understanding: transfer to an operator
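The three-strikes heuristic can be sketched as a trivial policy function (an illustrative sketch; the action names are placeholders, not the system’s):

```python
# "Three strikes and you're out" heuristic policy [Balentine], sketched.
# Action names are illustrative placeholders.
def three_strikes_policy(nonu_in_a_row):
    """Map the count of consecutive non-understandings to a recovery action."""
    if nonu_in_a_row == 1:
        return "ask_repeat"            # 1st: ask the user to repeat
    if nonu_in_a_row == 2:
        return "help_with_examples"    # 2nd: provide more help, with examples
    return "transfer_to_operator"      # 3rd and beyond: hand off to a human
```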

Page 5: online supervised learning of  non-understanding recovery policies

5

this talk …

… an online, supervised method for learning a non-understanding recovery policy from data

Page 6: online supervised learning of  non-understanding recovery policies

6

overview

introduction

approach

experimental setup

results

discussion

Page 7: online supervised learning of  non-understanding recovery policies

7

overview

introduction

approach

experimental setup

results

discussion

Page 8: online supervised learning of  non-understanding recovery policies

8

intuition …

… if we knew the probability of success for each strategy in the current situation, we could easily construct a policy

S: Where are you flying from?
U: [OKAY IN THAT SAME PAY] Urbana Champaign

S:
• Sorry, I didn’t catch that … 32%
• Can you repeat that? 15%
• Can you rephrase that? 20%
• Where are you flying from? 30%
• Please tell me the name of the city you are leaving from … 45%
• Could you please go to a quieter place? 25%
• Sorry, I didn’t catch that … tell me the state first … 43%

Page 9: online supervised learning of  non-understanding recovery policies

9

two step approach

step 1: learn to estimate probability of success for each strategy, in a given situation

step 2: use these estimates to choose between

strategies (and hence build a policy)

Page 10: online supervised learning of  non-understanding recovery policies

10

learning predictors for strategy success

supervised learning: logistic regression

target: did the strategy recover successfully or not
• “success” = next turn is correctly understood
• labeled semi-automatically

features: describe the current situation, extracted from different knowledge sources
• recognition features
• language understanding features
• dialog-level features [state, history]

Page 11: online supervised learning of  non-understanding recovery policies

11

logistic regression

well-calibrated class-posterior probabilities
• predictions reflect the empirical probability of success
• x% of cases where P(S|F)=x are indeed successful

sample-efficient
• one model per strategy, so data will be sparse

stepwise construction
• automatic feature selection

provides confidence bounds
• very useful for online learning
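Step 1 can be sketched as follows, using a tiny hand-rolled logistic regression on synthetic data (an illustrative sketch only: the actual system used stepwise logistic regression over recognition, understanding, and dialog features, with confidence bounds on its estimates):

```python
# Sketch: fit one success predictor per strategy via logistic regression.
import math
import random

def sigmoid(z):
    # clamp to avoid overflow in exp for extreme scores
    if z < -60:
        return 0.0
    if z > 60:
        return 1.0
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights (last entry is the bias) by plain stochastic gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            xb = xi + [1.0]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xb)))
            for j, xj in enumerate(xb):
                w[j] += lr * (yi - p) * xj
    return w

def predict_success(w, x):
    """Estimated P(strategy succeeds | features of the current situation)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x + [1.0])))

# Synthetic data for one strategy: feature = recognition confidence of the
# non-understood turn; label = whether the next turn was correctly understood.
random.seed(0)
X = [[random.random()] for _ in range(200)]
y = [1 if xi[0] > 0.5 else 0 for xi in X]
w = fit_logistic(X, y)
```

Higher recognition confidence should then map to a higher estimated probability that the strategy recovers, e.g. `predict_success(w, [0.9]) > predict_success(w, [0.1])`.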

Page 12: online supervised learning of  non-understanding recovery policies

12

two step approach

step 1: learn to estimate probability of success for each strategy, in a given situation

step 2: use these estimates to choose between

strategies (and hence build a policy)

Page 13: online supervised learning of  non-understanding recovery policies

13

policy learning

choose the strategy most likely to succeed

BUT: we want to learn online, so we have to deal with the exploration / exploitation tradeoff

[chart: success probability estimates with confidence intervals for strategies S1–S4]

Page 14: online supervised learning of  non-understanding recovery policies

14

highest-upper-bound learning

choose the strategy with the highest upper bound
• proposed by [Kaelbling 93]
• empirically shown to do well in various problems

intuition:

[charts: success probability estimates with confidence intervals for S1–S4, illustrating the exploitation and exploration cases]
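A minimal sketch of highest-upper-bound selection, assuming a simple normal-approximation confidence bound on each strategy’s observed success rate (the system derived its bounds from the logistic-regression predictors instead):

```python
# Highest-upper-bound action selection [Kaelbling 93], sketched with a
# normal-approximation bound on a Bernoulli success rate (an assumption;
# the actual system used bounds from its logistic-regression models).
import math

def upper_bound(successes, trials, z=1.96):
    """Approximate 95% upper confidence bound on a success probability."""
    if trials == 0:
        return 1.0  # untried strategies look maximally promising -> explored
    p = successes / trials
    return p + z * math.sqrt(p * (1 - p) / trials)

def choose_strategy(stats):
    """stats: {name: (successes, trials)} -> name with the highest upper bound."""
    return max(stats, key=lambda s: upper_bound(*stats[s]))

# A never-tried strategy (S4) is picked first; as counts grow, the bounds
# tighten and selection shifts toward strategies that actually work well.
stats = {"S1": (30, 100), "S2": (8, 20), "S3": (2, 4), "S4": (0, 0)}
assert choose_strategy(stats) == "S4"
```

This is what makes the approach balance exploitation (high estimate, tight bound) against exploration (wide bound from few samples).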


Page 19: online supervised learning of  non-understanding recovery policies

19

overview

introduction

approach

experimental setup

results

discussion

Page 20: online supervised learning of  non-understanding recovery policies

20

system

Let’s Go! Public bus information system

connected to PAT customer service line during non-business hours

~30-50 calls / night

Page 21: online supervised learning of  non-understanding recovery policies

21

strategies

Name   Example
HLP    For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’
HLP_R  For instance, you can say ‘FORBES AND MURRAY’, or ‘DOWNTOWN’, or say ‘START OVER’ to restart
RP     Where are you leaving from? [repeats previous system prompt]
AREP   Can you repeat what you just said?
ARPH   Could you rephrase that?
MOVE   Tell me first your departure neighborhood … [ignore the current non-understanding and back off to an alternative dialog plan]
ASA    Please use shorter answers because I have trouble understanding long sentences …
SLL    Sorry, I understand people best when they speak softer …
IT     [give general interaction tips to the user]
ASO    I’m sorry but I’m still having trouble understanding you and I might do better if we restarted. Would you like to start over?
GUP    I’m sorry, but it doesn’t seem like I’m able to help you. Please call back during regular business hours …

Page 22: online supervised learning of  non-understanding recovery policies

22

constraints

• don’t AREP more than twice in a row
• don’t ARPH if #words <= 3
• don’t ASA unless #words > 5
• don’t ASO unless (4 nonu in a row) and (ratio.nonu > 50%)
• don’t GUP unless (dialog > 30 turns) and (ratio.nonu > 80%)

capture expert knowledge; ensure the system doesn’t use an unreasonable policy

4.2/11 strategies available on average (min=1, max=9)
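The constraints can be sketched as a filter over the strategy set (the context field names here are illustrative, not the system’s):

```python
# Constraint filter: given the dialog context, keep only the recovery
# strategies that the handcrafted constraints allow. Thresholds mirror the
# slide; context field names are illustrative placeholders.
def available_strategies(ctx):
    allowed = {"HLP", "HLP_R", "RP", "AREP", "ARPH", "MOVE",
               "ASA", "SLL", "IT", "ASO", "GUP"}
    if ctx["arep_in_a_row"] >= 2:          # don't AREP more than twice in a row
        allowed.discard("AREP")
    if ctx["num_words"] <= 3:              # don't ARPH if #words <= 3
        allowed.discard("ARPH")
    if ctx["num_words"] <= 5:              # don't ASA unless #words > 5
        allowed.discard("ASA")
    if not (ctx["nonu_in_a_row"] >= 4 and ctx["nonu_ratio"] > 0.5):
        allowed.discard("ASO")             # don't ASO unless things are going badly
    if not (ctx["num_turns"] > 30 and ctx["nonu_ratio"] > 0.8):
        allowed.discard("GUP")             # don't GUP unless the dialog is failing
    return allowed

# Early in a dialog with a short user turn, the harsher strategies drop out:
ctx = {"arep_in_a_row": 0, "num_words": 2, "nonu_in_a_row": 1,
       "nonu_ratio": 0.2, "num_turns": 5}
assert available_strategies(ctx) == {"HLP", "HLP_R", "RP", "AREP", "MOVE", "SLL", "IT"}
```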

Page 23: online supervised learning of  non-understanding recovery policies

23

features

current non-understanding recognition, lexical, grammar, timing info

current non-understanding segment length, which strategies already taken

current dialog state and history encoded dialog states

“how good things have been going”

Page 24: online supervised learning of  non-understanding recovery policies

24

learning

baseline period [2 weeks, 3/11 -> 3/25, 2006]
• system randomly chose a strategy, while obeying constraints
• in effect, a heuristic / stochastic policy

learning period [5 weeks, 3/26 -> 5/5, 2006]
• each morning: labeled data from the previous night
• retrained likelihood-of-success predictors
• installed in the system for the next night
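The daily cycle of the learning period can be sketched as follows (the label/retrain/deploy callables are hypothetical placeholders, not the system’s actual code):

```python
# Sketch of the nightly online-learning loop: each morning, label the
# previous night's data, retrain the per-strategy success predictors,
# and install them for the next night.
def nightly_update(strategies, logs, label_fn, retrain_fn, deploy_fn):
    # semi-automatic "success" labels for each recovery attempt
    labeled = [(turn, label_fn(turn)) for turn in logs]
    predictors = {}
    for s in strategies:
        data = [(t, y) for t, y in labeled if t["strategy"] == s]
        predictors[s] = retrain_fn(s, data)   # refit P(success | features)
    deploy_fn(predictors)                     # installed for tonight
    return predictors
```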

Page 25: online supervised learning of  non-understanding recovery policies

25

2 strategies eliminated

[strategies table repeated from page 21]

Page 26: online supervised learning of  non-understanding recovery policies

26

overview

introduction

approach

experimental setup

results

discussion

Page 27: online supervised learning of  non-understanding recovery policies

27

results

average non-understanding recovery rate (ANRR) improvement: 33.6% -> 37.8% (p=0.03) (12.5% relative)

fitted learning curve:

[plot: ANRR by date, 3/11 through 5/6, y-axis 0%–60%, with fitted-curve parameters A = 0.3385, B = 0.0470, C = 0.5566, D = -11.44]

Page 28: online supervised learning of  non-understanding recovery policies

28

policy evolution

• MOVE, HLP, ASA engaged more often
• AREP, ARPH engaged less often

[plot: fraction of non-understandings handled by each strategy (MOVE, ASA, IT, SLL, ARPH, AREP, HLP, RP, HLP_R) by date, 3/11 through 5/6, y-axis 0%–100%]

Page 29: online supervised learning of  non-understanding recovery policies

29

overview

introduction

approach

experimental setup

results

discussion

Page 30: online supervised learning of  non-understanding recovery policies

30

are the predictors learning anything?

AREP(653), IT(273), SLL(300)
• no informative features

ARPH(674), MOVE(1514)
• 1 informative feature (#prev.nonu, #words)

ASA(637), RP(2532), HLP(3698), HLP_R(989)
• 4 or more informative features in the model
• dialog state (especially explicit confirm states)
• dialog history

Page 31: online supervised learning of  non-understanding recovery policies

31

more features, more (specific) strategies

more features would be useful
• day-of-week
• clustered dialog states
• ? (any ideas?)

more strategies / variants
• approach might be able to filter out bad versions
• more specific strategies, features
• ask short answers worked well … speak less loud didn’t … (why?)

Page 32: online supervised learning of  non-understanding recovery policies

32

“noise” in the experiment

~15-20% of responses following non-understandings are non-user-responses
• transient noises
• secondary speech
• primary speech not directed to the system

this might affect training; in a future experiment we want to eliminate it

Page 33: online supervised learning of  non-understanding recovery policies

33

unsupervised learning

supervised version
• “success” = next turn is correctly understood [i.e. no misunderstanding, no non-understanding]

unsupervised versions
• “success” = next turn is not a non-understanding
• “success” = confidence score of next turn

training labels automatically available
performance improvements might still be possible
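The supervised and unsupervised notions of “success” can be contrasted as label functions over the turn following a recovery attempt (field names are illustrative placeholders):

```python
# Sketch: alternative "success" labels for training the predictors.
def success_supervised(next_turn):
    # needs (semi-)manual labeling: no misunderstanding AND no non-understanding
    return next_turn["correctly_understood"]

def success_unsupervised_nonu(next_turn):
    # automatic: success = the next turn is not another non-understanding
    return not next_turn["is_nonunderstanding"]

def success_unsupervised_conf(next_turn):
    # automatic, graded: use the recognizer's confidence score as a soft label
    return next_turn["confidence"]
```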

Page 34: online supervised learning of  non-understanding recovery policies

34

thank you!