A Question of Questions: Prosodic Cues to Question Form and Function

Julia Hirschberg

(Joint work with Jennifer Venditti and Jackson Liscombe)

Questioning in Dialogue

• A fundamental activity in conversation
  • Elicit information
  • Elicit action

• But
  • How to define a question?
    • Bolinger ’57: “fundamentally an attitude…an utterance that ‘craves’ a verbal or other semiotic … response”
    • Ginzburg & Sag ’00: “the semantic object associated with the attitude of wondering and the speech act of questioning”
  • How to identify a question as such?
  • How to represent its semantics? The intention of the questioner?

Distinguishing Question Form and Function

• Questions may take many syntactic forms
  • Is it a question? What is a question? It’s a question, isn’t it? Is it a question or an answer? Right? It’s a question?

• Questions may serve many pragmatic functions
  • Clarification-seeking? Information-seeking? Confirmation-seeking?

• Possible indicators
  • Syntactic cues
  • Context
  • Intonation

Questions in Spoken Dialogue Systems

• Goals
  • Examine question form and function
    • How are they related?
    • What features characterize them?
  • Identify form and function automatically in an Intelligent Tutoring domain

Previous Studies

• Integrating a prosodic tree model with a word-based language model yields the best accuracy in detecting questions and question form (Shriberg et al. ’98: English)

• Some corpus-based (MapTask) studies have examined tune/accent types with respect to question function (Kowtko ’96: Glaswegian English; Grice et al. ’95: German, Italian, Bulgarian)

• Studies of different types (functions) of clarification questions (Rodríguez & Schlangen ’04: German; Edlund et al. ’05: Swedish)

• Our goal: a comprehensive quantitative analysis of question form and function in English which will permit question form/function identification

Domain: Intelligent Tutoring Systems

• ITSs must be able to recognize both the form and function of student questions
  • Students ask human tutors many questions
  • More questions → better learning

• Different question FORMs seek different information
  • e.g. polar questions seek a yes-no answer
  • wh-questions seek different information

• Different question FUNCTIONs also often require different types of answers

• Wh-questions, e.g.
  • Information-seeking:
    (S has just submitted an essay to the tutor)
    S: Ok, what do you think about that?
    T: Uh, well that uh you have uh there are too many parameters here which uh need definition ...
  • Clarification-seeking:
    T: So if there is if the only force on an object in earth’s gravity then what is its motion called?
    S: What was the motion called?
    T: Yes, what’s the name for this motion?

• Yes-no questions, e.g.
  • Information-seeking → tutor provides additional information
  • Clarification → clarification subdialogue

• Successful ITSs must be able to recognize the presence of a question in a student turn and its form and function

Question Corpus

• Human-human tutoring dialogs collected by Litman et al. ’04 for development of ITSpoke, a speech-enabled ITS designed to teach physics
  • Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U. Memphis))

• Corpus includes 1030 student questions
  • ‘Question’ defined à la Bolinger ’57 as “an utterance that craves a response”
  • 25.2 Qs/hour
  • 13.3% of total student speaking time

• This study: a subset of 643 tokens

[pr01_sess00_prob58]

Question Detection

what symbol are you talking about

do i have to rewrite this again

am i ok with that

so it’d be one meter per second squared

Coding question type

• Form coding based on surface syntax
  • Declarative question (dQ): It’s a vector? A vector?
  • Yes-no question (ynQ): Is it a vector?
  • Wh-question (whQ): What is a vector?
  • Tag question (ynTAG): It’s a vector, isn’t it?
  • Alternative question (altQ): Is it a vector or a scalar?
  • Particle (part): Huh?

• Function coding derived from Stenström ’84
  • Confirmation-seeking check question (chk)
  • Clarification-seeking question (clar)
  • Information-seeking question (info)
  • Other (oth)

Form/Function Distribution

Form     chk      clar     info     oth      N       (%)
dQ       257      81       2        4        344     (53.5)
ynQ      53       80       27       5        165     (25.7)
whQ      -        47       21       -        68      (10.6)
ynTAG    41       5        -        -        46      (7.2)
altQ     6        5        1        -        12      (1.9)
part     -        8        -        -        8       (1.2)
N        357      226      51       9        643
(%)      (55.5)   (35.1)   (7.9)    (1.4)    (100)

Falling (L-L%) F0 contours

Form     chk      clar     info     oth      N       (%)
dQ       3        4        -        -        7       (2.0)
ynQ      -        4        5        2        11      (6.7)
whQ      -        12       17       -        29      (42.6)
ynTAG    1        1        -        -        2       (4.3)
altQ     2        5        1        -        8       (66.7)
part     -        -        -        -        -
N        6        26       23       2        57
(%)      (1.7)    (11.5)   (45.1)   (22.2)   (100)
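The percentages in the falling-contour table appear to be taken relative to the totals in the Form/Function Distribution table (for example, 29 of the 68 wh-questions), not relative to the 57 falling tokens. A minimal Python sketch that reproduces them from the two sets of counts; the variable and function names are illustrative and not part of the original study:

```python
# Counts copied from the two tables; names are illustrative only.
total_by_form = {"dQ": 344, "ynQ": 165, "whQ": 68, "ynTAG": 46, "altQ": 12, "part": 8}
falling_by_form = {"dQ": 7, "ynQ": 11, "whQ": 29, "ynTAG": 2, "altQ": 8, "part": 0}
total_by_function = {"chk": 357, "clar": 226, "info": 51, "oth": 9}
falling_by_function = {"chk": 6, "clar": 26, "info": 23, "oth": 2}


def falling_share(falling, total):
    """Percentage of each category's tokens that end in a falling contour."""
    return {key: round(100 * falling[key] / total[key], 1) for key in total}


form_pct = falling_share(falling_by_form, total_by_form)
function_pct = falling_share(falling_by_function, total_by_function)

# Share of all coded tokens that fall, using the same totals.
overall_pct = round(100 * sum(falling_by_form.values()) / sum(total_by_form.values()), 1)

for form, pct in form_pct.items():
    print(f"{form:6s} {pct:5.1f}%")
for function, pct in function_pct.items():
    print(f"{function:6s} {pct:5.1f}%")
print(f"all    {overall_pct:5.1f}%")
```

Under this reading, falling contours are concentrated in wh- and alternative questions and in information-seeking questions, while 57 of the 643 tokens fall overall.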

F0 measures of non-falling questions

• Quantitative analysis of F0 height in the 573 non-falling tokens with sufficient data for analysis

• Examined question nucleus (nucF0) and tail (btF0) only

• Speaker-normalized (z-score) F0 of:
  1. nuclear accent (nucF0)
  2. rightmost edge of question (btF0)
  3. difference between 1 & 2 (riserange)
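The three measures can be sketched in Python as follows. This is only an illustration under stated assumptions: the slide does not say which F0 samples the speaker mean and standard deviation were computed over, nor the sign convention of riserange, so the sketch normalizes against a pool of that speaker's F0 samples and takes riserange as btF0 minus nucF0. The function names and the numbers are made up.

```python
from statistics import mean, stdev


def speaker_stats(f0_samples_hz):
    """Mean and sample standard deviation of one speaker's F0 samples (Hz)."""
    return mean(f0_samples_hz), stdev(f0_samples_hz)


def zscore(value_hz, stats):
    """Express an F0 value in standard deviations from the speaker's mean."""
    mu, sigma = stats
    return (value_hz - mu) / sigma


def question_measures(nuclear_hz, edge_hz, stats):
    """Return the three normalized measures for one question token."""
    nuc = zscore(nuclear_hz, stats)
    bt = zscore(edge_hz, stats)
    # Assumption: riserange is the edge value minus the nuclear-accent value.
    return {"nucF0": nuc, "btF0": bt, "riserange": bt - nuc}


# Invented values for one hypothetical speaker.
stats = speaker_stats([180, 200, 220])        # mean 200 Hz, SD 20 Hz
measures = question_measures(210, 240, stats)
print(measures)
```

Normalizing per speaker lets tokens from speakers with different pitch ranges be pooled in a single analysis.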

Question Form and F0

• DeclQs and YNQs both thought to rise (H*H-H% vs. L*H-H%?): Are there F0 height differences between them?

• 2-way ANOVA on form x function, FORM:
    nucF0: F(5)=19.34, p=0
    btF0: F(5)=10.71, p=0
    riserange: F(5)=3.6, p<.01
  • Planned comparisons (Tukey, alpha=.01) show no difference between declarative Qs and yes-no Qs
  • Main effect of form caused by yes-no tags (low F0) and particles (high F0)

Normalized means at nucF0 and btF0

[Figure: speaker-normalized mean F0 at nucF0 and btF0 for chk, clar and info questions]

Question Function and F0

• Question dialog acts thought to correlate with F0: Does question FUNCTION affect F0?

• 2-way ANOVA on form x function, FUNCTION:
    nucF0: F(3)=16.6, p=0
    btF0: F(3)=8.56, p<.001
    riserange: F(3)=3.94, p<.01

• Main effect; planned comparisons show:
  • clarQ > chkQ (nucF0 & btF0)
  • infoQ > clarQ/chkQ (nucF0)

• No interactions for any measure

Clarification types and F0

1 Channel: Problem hearing if the tutor actually said something or not (Huh?, Hm?)

2 Perception: Problem hearing what the tutor said (‘G’ as in God?, Did you say a word or a letter?), including reprise/echo questions (A what?)

3 Understanding: Problem with reference resolution (This up here?, What did I imply or what does the statement imply?), or with general understanding (Is that the same thing or is that different?, What do you mean?)

4 Intention: Problem determining what the tutor intended by his utterance (You want an exact number?, Uh are you asking me another characteristic of freefall?)

+ Non-interlocutor-related (NIR): Problem understanding the task (Am I supposed to speak this or type it?), or clarification of the examination question (Should I assume both vehicles are going at the same speed?)

Clark ‘96 levels of coordination: sources of communication problems

Effects of Clarification Type

• One-way ANOVA combining levels 1 & 2 into a single acoustic/perceptual category:
    nucF0: F(3)=5.41, p=.001
    btF0: F(3)=6.6, p<.001
    riserange: F(3)=2.59, p=.05

• Main effect for clarification type
  • Ranking for each measure, from higher F0 to lower F0:
    acoust/percept > understanding > NIR > intention
  • Planned comparisons (Tukey, alpha=.01) show the only significant comparison was acoust/percept > intention
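For readers unfamiliar with the F statistic reported here, the following Python sketch computes a one-way ANOVA F value by hand. With four clarification categories the between-groups degrees of freedom are 4 - 1 = 3, which matches the F(3) notation on the slide. The data are invented z-scores for illustration and are not the study's measurements; the p-value and the Tukey comparisons are not computed.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented normalized F0 values, ordered as in the ranking above.
acoust_percept = [1.0, 1.2, 0.8]
understanding = [0.6, 0.8, 0.4]
nir = [0.3, 0.5, 0.1]
intention = [0.0, 0.2, -0.2]

f_stat, df_between, df_within = one_way_anova_f(
    [acoust_percept, understanding, nir, intention]
)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the category means differ by more than the within-category scatter would suggest; which pairs differ is a separate question, answered on the slide by the planned comparisons.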

Can Prosody Distinguish Question Form? Question Function?

• Only a few question forms prosodically distinct in our study – lexico/syntactic information can help

• Question function more successfully differentiated prosodically – where there is less reliable lexico/syntactic information

• Can we use prosodic information with lexico-syntactic information to help identify question form and function automatically?

Detecting Student Questions

• Syntax
  • Wh-words, subject/auxiliary inversion

• Prosody
  • Phrase-final rising intonation (Pierrehumbert & Hirschberg ’90)
  • Duration and pausing (Shriberg et al. ’98)

• Lexico-pragmatics
  • Personal pronouns, utterance-initial pronouns (Geluykens 1987; Beun 1990)
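To make the syntactic cues concrete, here is a deliberately crude Python heuristic that checks for an utterance-initial wh-word or an auxiliary followed by a subject-like word. The word lists and the function name are assumptions made for this sketch; it is not the feature set used in the study. It is applied to the four example utterances from the Question Detection slide.

```python
# Hypothetical word lists chosen for illustration; far from exhaustive.
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}
AUXILIARIES = {"do", "does", "did", "is", "are", "am", "was", "were",
               "can", "could", "will", "would", "should", "have", "has", "had"}
SUBJECTS = {"i", "you", "he", "she", "it", "we", "they", "that", "this"}


def syntactic_cue(utterance):
    """Return 'wh-word', 'inversion', or 'none' based on the first two tokens."""
    tokens = utterance.lower().rstrip("?").split()
    if not tokens:
        return "none"
    if tokens[0] in WH_WORDS:
        return "wh-word"
    if len(tokens) > 1 and tokens[0] in AUXILIARIES and tokens[1] in SUBJECTS:
        return "inversion"
    return "none"


examples = [
    "what symbol are you talking about",
    "do i have to rewrite this again",
    "am i ok with that",
    "so it’d be one meter per second squared",
]
cues = [syntactic_cue(example) for example in examples]
for example, cue in zip(examples, cues):
    print(f"{cue:10s} {example}")
```

The last utterance carries no surface-syntactic cue even though it was coded as a question, which is the case where prosodic and lexico-pragmatic cues have to do the work.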

Corpus

• 141 ITSpoke dialogues
• 5 hours of student speech
• Student turns average 2.5 seconds
• 1,030 questions
• 25 questions per hour
• 70% of turns consist entirely of the question
• 89% of questions are turn-final

Question Form Distribution in ITSpoke

Form          Example                          Distr.
yes/no        Is that right?                   24%
wh-           What do you mean?                10%
yes/no tag    It will stay the same, right?    7%
alternative   Force or something?              3%
particle      Huh?                             2%
declarative   The weight?                      54%

Question-Bearing Turns

• Contain one or more questions

• N = 918

Features Extracted

• Prosodic
  • pitch
  • loudness
  • pausing
  • speaking rate
  • calculated over entire turn and last 200 ms

• Syntactic
  • unigram and bigram part-of-speech tags
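One of the prosodic features, pitch slope over the entire turn and over the last 200 ms, can be sketched as a least-squares line fit to the F0 track. The frame rate, the use of raw Hz, the handling of unvoiced frames and all names below are assumptions made for this example; the slides do not describe the actual extraction procedure.

```python
def voiced_frames(times_ms, f0_hz, window_ms=None):
    """Drop unvoiced frames (F0 is None); optionally keep only the final window_ms."""
    start = times_ms[0] if window_ms is None else times_ms[-1] - window_ms
    return [(t, f) for t, f in zip(times_ms, f0_hz) if f is not None and t >= start]


def pitch_slope(frames):
    """Least-squares slope of F0 over time, in Hz per second."""
    if len(frames) < 2:
        raise ValueError("need at least two voiced frames")
    t_mean = sum(t for t, _ in frames) / len(frames)
    f_mean = sum(f for _, f in frames) / len(frames)
    numerator = sum((t - t_mean) * (f - f_mean) for t, f in frames)
    denominator = sum((t - t_mean) ** 2 for t, _ in frames)
    return 1000 * numerator / denominator  # times are in ms, so scale to seconds


# Invented one-second turn sampled every 100 ms: flat, then a final rise.
times_ms = list(range(0, 1001, 100))
f0_hz = [200, 200, None, 200, 200, 200, 200, 200, 200, 220, 240]

turn_slope = pitch_slope(voiced_frames(times_ms, f0_hz))
final_slope = pitch_slope(voiced_frames(times_ms, f0_hz, window_ms=200))
print(f"whole turn: {turn_slope:.1f} Hz/s, last 200 ms: {final_slope:.1f} Hz/s")
```

In this toy turn the final-window slope is much steeper than the whole-turn slope, which illustrates why a phrase-final rise can be more visible in a short final window than over the full turn.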

Feature Extraction

• Lexical
  • unigrams and bigrams from hand-labeled transcriptions

• Student and task dependent
  • pre-test score
  • gender
  • correctness
  • previous tutor dialogue act
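Unigram and bigram features of the kind listed for the lexical and syntactic feature sets can be counted as below. This is a minimal sketch: whitespace tokenization and the function names are assumptions, and the same counting would apply to part-of-speech tags if a tagger supplied one tag per token.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def lexical_features(transcription):
    """Unigram and bigram counts for one transcribed turn."""
    tokens = transcription.lower().split()
    features = Counter()
    features.update(ngram_counts(tokens, 1))
    features.update(ngram_counts(tokens, 2))
    return features


# Example utterance taken from the Question Detection slide.
features = lexical_features("do i have to rewrite this again")
for ngram, count in features.items():
    print(" ".join(ngram), count)
```

Each distinct n-gram then becomes one feature column, with the count (or simple presence) as its value for that turn.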

Machine Learning Experiments

• Question-bearing vs. non-question-bearing
  • Down-sampled to 50/50 distribution
  • Experimented by feature type
  • AdaBoosted C4.5 decision trees
  • 5-fold cross validation

• Best results with all features
  • Accuracy = 79.7%
  • Precision = Recall = F-measure = 0.8
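The experimental protocol (balance the classes, split into five folds, score with precision, recall and F-measure) can be sketched with the Python standard library. The classifier itself is omitted, and the random down-sampling, the interleaved fold assignment, the class sizes and the counts passed to the metric function are all assumptions for illustration rather than details taken from the study.

```python
import random


def downsample_to_balance(items, labels, seed=0):
    """Randomly drop majority-class items until both classes are equally frequent."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label]
    negatives = [i for i, label in enumerate(labels) if not label]
    size = min(len(positives), len(negatives))
    keep = rng.sample(positives, size) + rng.sample(negatives, size)
    rng.shuffle(keep)
    return [items[i] for i in keep], [labels[i] for i in keep]


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; every item is tested exactly once."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out, test in enumerate(folds):
        train = [i for m, fold in enumerate(folds) if m != held_out for i in fold]
        yield train, test


def precision_recall_f(true_pos, false_pos, false_neg):
    """Precision, recall and their harmonic mean (F-measure)."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Invented toy data: 30 question-bearing turns out of 100.
turns = list(range(100))
is_question = [i < 30 for i in turns]
balanced_turns, balanced_labels = downsample_to_balance(turns, is_question)
splits = list(k_fold_indices(len(balanced_turns), k=5))

# Hypothetical confusion counts chosen so that precision equals recall.
precision, recall, f_measure = precision_recall_f(80, 20, 20)
print(len(balanced_turns), len(splits), round(f_measure, 2))
```

When precision and recall are equal, the F-measure equals both, which is consistent with the single value of 0.8 reported for all three.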

Accuracy by Feature Type

Feature type                          Accuracy
prosody: pausing and speaking rate    52.6%
student and task dependent            56.1%
prosody: loudness                     61.8%
syntactic                             65.3%
lexical                               67.2%
prosody: last 200 ms                  70.3%
prosody: pitch                        72.6%
prosody: all                          74.5%

Feature Type Discussion

• Which features are most informative?
  • pitch slope of last 200 ms and entire turn
  • maximum and mean pitch of turn

• Which features are most often used in learning?
  • pre-test score
  • slope of last 200 ms
  • maximum pitch of entire turn
  • cumulative pause duration

Other Observations

• Syntactic features were informative
  • personal pronoun + verb, wh-pronoun, interjection

• Lexical features were informative
  • yes, right, what, I, you

Conclusions

• Most questions in our tutoring corpus are declarative in form
  • More than syntax is needed to identify these as questions
  • Prosodic features are very important

• Detecting question-bearing turns is possible
  • Detecting question function is needed

Question Forms in ITSpoke

Form          Distr.   Example
declarative   54%      The weight?
yes/no        24%      Is that right?
wh-           10%      What do you mean?
yes/no tag    7%       It will stay the same, right?
alternative   3%       Force or something?
particle      2%       Huh?
