A Question of Questions: Prosodic Cues to Question Form and Function
Julia Hirschberg
(Joint work with Jennifer Venditti and Jackson Liscombe)
Questioning in Dialogue
• A fundamental activity in conversation
  • Elicit information
  • Elicit action
• But
  • How to define a question?
    • Bolinger ’57: “fundamentally an attitude…an utterance that ‘craves’ a verbal or other semiotic … response”
    • Ginzburg & Sag ’00: “the semantic object associated with the attitude of wondering and the speech act of questioning”
  • How to identify a question as such?
  • How to represent its semantics? The intention of the questioner?
Distinguishing Question Form and Function
• Questions may take many syntactic forms
  • Is it a question? What is a question? It’s a question, isn’t it? Is it a question or an answer? Right? It’s a question?
• Questions may serve many pragmatic functions
  • Clarification-seeking? Information-seeking? Confirmation-seeking?
• Possible indicators
  • Syntactic cues
  • Context
  • Intonation
Questions in Spoken Dialogue Systems
• Goals
  • Examine question form and function
    • How are they related?
    • What features characterize them?
  • Identify form and function automatically in an Intelligent Tutoring domain
Previous Studies
• Integrating a prosodic tree model with a word-based language model yields the best accuracy in detecting questions/question form (Shriberg et al.’98: English)
• Some corpus-based (MapTask) studies have examined tune/accent types with respect to question function (Kowtko’96: Glaswegian English; Grice et al.’95: German, Italian, Bulgarian)
• Studies of different types (functions) of clarification questions (Rodríguez & Schlangen’04: German; Edlund et al.’05: Swedish)
• Our goal: a comprehensive quantitative analysis of question form and function in English which will permit question form/function identification
Domain: Intelligent Tutoring Systems
• ITSs must be able to recognize both the form and function of student questions
  • Students ask human tutors many questions
  • More questions → better learning
• Different question FORMs seek different information
  • e.g. polar questions seek a yes-no answer
  • wh-questions seek different information
• Different question FUNCTIONs also often require different types of answers
• Wh-questions, e.g.
  • Information-seeking:
    (S has just submitted an essay to the tutor)
    S: Ok, what do you think about that?
    T: Uh, well that uh you have uh there are too many parameters here which uh need definition ...
  • Clarification-seeking:
    T: So if there is if the only force on an object in earth’s gravity then what is its motion called?
    S: What was the motion called?
    T: Yes, what’s the name for this motion?
• Yes-no questions, e.g.
  • Information-seeking → tutor provides additional information
  • Clarification → clarification subdialogue
• Successful ITSs must be able to recognize the presence of a question in a student turn and its form and function
Question Corpus
• Human-human tutoring dialogs collected by Litman et al.’04 for development of ITSpoke, a speech-enabled ITS designed to teach physics
  • Why2-Atlas (Kurt VanLehn (U. Pitt), Art Graesser (U. Memphis))
• Corpus includes 1030 student questions
  • ‘Question’ defined à la Bolinger ’57 as “an utterance that craves a response”
  • 25.2 Qs/hour
  • 13.3% of total student speaking time
• This study: a subset of 643 tokens
[pr01_sess00_prob58]
Question Detection
what symbol are you talking about
do i have to rewrite this again
am i ok with that
so it’d be one meter per second squared
Coding question type
• Form coding based on surface syntax
  • Declarative question (dQ): It’s a vector? A vector?
  • Yes-no question (ynQ): Is it a vector?
  • Wh-question (whQ): What is a vector?
  • Tag question (ynTAG): It’s a vector, isn’t it?
  • Alternative question (altQ): Is it a vector or a scalar?
  • Particle (part): Huh?
• Function coding derived from Stenström ’84
  • Confirmation-seeking check question (chk)
  • Clarification-seeking question (clar)
  • Information-seeking question (info)
  • Other (oth)
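The form codes above rest on surface syntax, so a rough first pass can be sketched with a few string heuristics. The following Python sketch is only an illustration, not the coding procedure used in the study: the word lists, the tag pattern, the rule order, and the function name `code_form` are all assumptions, and the original coding was done by hand.

```python
import re

# Hypothetical word lists; the study's coding was manual.
WH_WORDS = {"what", "who", "whom", "whose", "which", "when", "where", "why", "how"}
AUXILIARIES = {"is", "are", "am", "was", "were", "do", "does", "did", "can",
               "could", "will", "would", "should", "shall", "may", "might",
               "must", "have", "has", "had"}
PARTICLES = {"huh", "hm", "hmm", "eh"}
# Sentence-final tags such as ", right" or ", isn't it" (assumed pattern).
TAG_PATTERN = re.compile(
    r",\s*(right|okay|ok|(is|are|was|were|do|does|did|can|will|would|has|have)n't \w+)$"
)


def code_form(utterance: str) -> str:
    """Guess a surface-form code (dQ, ynQ, whQ, ynTAG, altQ, part)."""
    text = utterance.lower().replace("’", "'").strip().rstrip("?.! ")
    words = re.findall(r"[a-z']+", text)
    if not words:
        return "dQ"
    if len(words) == 1 and words[0] in PARTICLES:
        return "part"
    if TAG_PATTERN.search(text):
        return "ynTAG"
    if " or " in f" {text} ":
        return "altQ"
    if words[0] in WH_WORDS:
        return "whQ"
    if words[0] in AUXILIARIES:
        return "ynQ"
    # Anything left has declarative (or fragment) syntax.
    return "dQ"


for example in ["It's a vector?", "Is it a vector?", "What is a vector?",
                "It's a vector, isn't it?", "Is it a vector or a scalar?", "Huh?"]:
    print(example, "->", code_form(example))
```

Because more than half of the questions in this corpus are declarative in form, a heuristic like this can label the form but cannot tell whether a declarative utterance is a question at all.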
Form/Function Distribution
Form     chk     clar    info    oth     N      (%)
dQ       257     81      2       4       344    (53.5)
ynQ      53      80      27      5       165    (25.7)
whQ      -       47      21      -       68     (10.6)
ynTAG    41      5       -       -       46     (7.2)
altQ     6       5       1       -       12     (1.9)
part     -       8       -       -       8      (1.2)
N        357     226     51      9       643
(%)      (55.5)  (35.1)  (7.9)   (1.4)   (100)
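The marginal totals and percentages in this table can be recomputed from the cell counts. A minimal Python sketch, assuming that the dashes denote zero counts; the names `COUNTS`, `marginals` and `percent` are illustrative only.

```python
FUNCTIONS = ["chk", "clar", "info", "oth"]

# Cell counts from the form/function table; "-" is read as 0 (an assumption).
COUNTS = {
    "dQ":    [257, 81, 2, 4],
    "ynQ":   [53, 80, 27, 5],
    "whQ":   [0, 47, 21, 0],
    "ynTAG": [41, 5, 0, 0],
    "altQ":  [6, 5, 1, 0],
    "part":  [0, 8, 0, 0],
}


def marginals(counts):
    """Return the grand total, per-form totals and per-function totals."""
    total = sum(sum(row) for row in counts.values())
    form_totals = {form: sum(row) for form, row in counts.items()}
    function_totals = {
        fn: sum(row[i] for row in counts.values())
        for i, fn in enumerate(FUNCTIONS)
    }
    return total, form_totals, function_totals


def percent(part, whole):
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)


total, form_totals, function_totals = marginals(COUNTS)
for form, n in form_totals.items():
    print(f"{form:6s} {n:4d} ({percent(n, total)})")
for fn, n in function_totals.items():
    print(f"{fn:6s} {n:4d} ({percent(n, total)})")
```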
Falling (L-L%) F0 contours
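Form     chk     clar    info    oth     N      (%)
dQ       3       4       -       -       7      (2.0)
ynQ      -       4       5       2       11     (6.7)
whQ      -       12      17      -       29     (42.6)
ynTAG    1       1       -       -       2      (4.3)
altQ     2       5       1       -       8      (66.7)
part     -       -       -       -       -
N        6       26      23      2       57
(%)      (1.7)   (11.5)  (45.1)  (22.2)  (8.9)
(Percentages are relative to the corresponding totals in the Form/Function Distribution table, e.g. 7 of 344 dQs = 2.0%; overall, 57 of 643 = 8.9%.)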
F0 measures of non-falling questions
• Quantitative analysis of F0 height in the 573 non-falling tokens with sufficient data for analysis
• Examined question nucleus (nucF0) and tail (btF0) only
• Speaker-normalized (z-score) F0 of:
  1. nuclear accent (nucF0)
  2. rightmost edge of question (btF0)
  3. difference between 1 & 2 (riserange)
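The three measures can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' procedure: the slide does not say which F0 samples define each speaker's mean and standard deviation, whether the sample or population standard deviation was used, or the sign convention for riserange. Here the sample standard deviation is used and riserange is taken as btF0 minus nucF0, so a positive value means a rise. All function names are hypothetical.

```python
from statistics import mean, stdev


def speaker_stats(f0_samples_hz):
    """Mean and sample standard deviation of one speaker's F0 values (Hz)."""
    return mean(f0_samples_hz), stdev(f0_samples_hz)


def z_score(value_hz, stats):
    """Express an F0 value in standard deviations from the speaker's mean."""
    mu, sigma = stats
    return (value_hz - mu) / sigma


def question_measures(nuc_hz, bt_hz, stats):
    """Return normalized nucF0, btF0 and riserange for one question."""
    nuc_f0 = z_score(nuc_hz, stats)
    bt_f0 = z_score(bt_hz, stats)
    # Assumed sign convention: positive riserange = F0 rises after the nucleus.
    return {"nucF0": nuc_f0, "btF0": bt_f0, "riserange": bt_f0 - nuc_f0}


# Toy values, not data from the corpus.
stats = speaker_stats([100.0, 200.0, 300.0])
print(question_measures(nuc_hz=200.0, bt_hz=300.0, stats=stats))
```

Normalizing per speaker keeps a naturally high-pitched speaker from dominating comparisons across question types.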
Question Form and F0
• DeclQs and YNQs both thought to rise (H*H-H% vs. L*H-H%?): Are there F0 height differences between them?
• 2-way ANOVA on form x function, FORM:
  nucF0: F(5)=19.34, p<.001
  btF0: F(5)=10.71, p<.001
  riserange: F(5)=3.6, p<.01
• Planned comparisons (Tukey, alpha=.01) show no difference between declarative Qs and yes-no Qs
• Main effect of form caused by yes-no tags (low F0) and particles (high F0)
Normalized means at nucF0 and btF0
[Figure: normalized mean F0 for chk, clar and info questions, shown at nucF0 and at btF0]
Question Function and F0
• Question dialog acts thought to correlate with F0: Does question FUNCTION affect F0?
• 2-way ANOVA on form x function, FUNCTION:
  nucF0: F(3)=16.6, p<.001
  btF0: F(3)=8.56, p<.001
  riserange: F(3)=3.94, p<.01
• Main effect; planned comparisons show:
  • clarQ > chkQ (nucF0 & btF0)
  • infoQ > clarQ/chkQ (nucF0)
• No interactions for any measure
Clarification types and F0
1 Channel: Problem hearing if the tutor actually said something or not (Huh?, Hm?)
2 Perception: Problem hearing what the tutor said (‘G’ as in God?, Did you say a word or a letter?), including reprise/echo questions (A what?)
3 Understanding: Problem with reference resolution (This up here?, What did I imply or what does the statement imply?), or with general understanding (Is that the same thing or is that different?, What do you mean?)
4 Intention: Problem determining what the tutor intended by his utterance (You want an exact number?, Uh are you asking me another characteristic of freefall?)
+ Non-interlocutor-related (NIR): Problem understanding the task (Am I supposed to speak this or type it?), or clarification of the examination question (Should I assume both vehicles are going at the same speed?)
Clark ‘96 levels of coordination: sources of communication problems
Effects of Clarification Type
• One-way ANOVA combining levels 1 & 2 into a single acoustic/perceptual category:
  nucF0: F(3)=5.41, p=.001
  btF0: F(3)=6.6, p<.001
  riserange: F(3)=2.59, p=.05
• Main effect for clarification type
• Ranking for each measure, from higher F0 to lower F0:
  acoust/percept > understanding > NIR > intention
• Planned comparisons (Tukey, alpha=.01) show the only significant comparison was acoust/percept > intention
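For readers who want to see where an F(3) statistic for four clarification categories comes from, the one-way ANOVA F ratio can be computed directly. The sketch below uses invented toy values, not the study's data, and the function name `one_way_anova` is hypothetical. It returns only the F ratio and degrees of freedom; a p-value needs the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> values."""
    k = len(groups)
    n = sum(len(values) for values in groups.values())
    grand_mean = sum(sum(values) for values in groups.values()) / n

    ss_between = 0.0  # variation of group means around the grand mean
    ss_within = 0.0   # variation of observations around their group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)

    df_between = k - 1  # four categories give the 3 in F(3)
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy normalized F0 values per clarification category (illustrative only).
toy = {
    "acoust/percept": [4.0, 5.0, 6.0],
    "understanding": [3.0, 4.0, 5.0],
    "NIR": [2.0, 3.0, 4.0],
    "intention": [1.0, 2.0, 3.0],
}
print(one_way_anova(toy))
```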
Can Prosody Distinguish Question Form? Question Function?
• Only a few question forms prosodically distinct in our study – lexico/syntactic information can help
• Question function more successfully differentiated prosodically – where there is less reliable lexico/syntactic information
• Can we use prosodic information with lexico-syntactic information to help identify question form and function automatically?
Detecting Student Questions
• Syntax
  • Wh-words, subject/auxiliary inversion
• Prosody
  • Phrase-final rising intonation (Pierrehumbert & Hirschberg ’90)
  • Duration and pausing (Shriberg et al. ’98)
• Lexico-pragmatics
  • Personal pronouns, utterance-initial pronouns (Geluykens 1987; Beun 1990)
Corpus
• 141 ITSpoke dialogues
• 5 hours of student speech
• Student turns average 2.5 seconds
• 1,030 questions
• 25 questions per hour
• 70% of turns consist entirely of the question
• 89% of questions are turn-final
Question Form Distribution in ITSpoke
Form          Example                          Distr.
yes/no        Is that right?                   24%
wh-           What do you mean?                10%
yes/no tag    It will stay the same, right?    7%
alternative   Force or something?              3%
particle      Huh?                             2%
declarative   The weight?                      54%
Question-Bearing Turns
• Contain one or more questions
• N = 918
Features Extracted
• Prosodic
  • pitch
  • loudness
  • pausing
  • speaking rate
  • calculated over entire turn and last 200 ms
• Syntactic
  • unigram and bigram part-of-speech tags
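One way to picture a pitch feature "calculated over entire turn and last 200 ms" is a least-squares slope fitted to the F0 track, once over all voiced frames and once over the final window. The sketch below is an assumption-laden illustration: the frame representation (time in ms, F0 in Hz, `None` for unvoiced frames), the function names, and the use of a linear fit are not taken from the slides, and it assumes frames are sorted by time with distinct timestamps.

```python
def final_window(frames, window_ms=200):
    """Keep voiced frames that fall in the last `window_ms` of the turn."""
    if not frames:
        return []
    start = frames[-1][0] - window_ms
    return [(t, f0) for t, f0 in frames if t >= start and f0 is not None]


def pitch_slope_hz_per_s(frames):
    """Least-squares slope of F0 over time, in Hz per second."""
    voiced = [(t, f0) for t, f0 in frames if f0 is not None]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    times = [t / 1000.0 for t, _ in voiced]
    pitches = [f0 for _, f0 in voiced]
    t_mean = sum(times) / len(times)
    p_mean = sum(pitches) / len(pitches)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, pitches))
    return sxy / sxx


# Toy F0 track: flat, then rising at the end of the turn (illustrative only).
turn = [(0, 200.0), (100, 200.0), (200, None), (300, 200.0), (400, 210.0), (500, 220.0)]
print("whole turn:", pitch_slope_hz_per_s(turn))
print("last 200 ms:", pitch_slope_hz_per_s(final_window(turn)))
```

A phrase-final rise shows up as a much steeper slope in the final window than over the whole turn.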
Feature Extraction
• Lexical
  • unigrams and bigrams from hand-labeled transcriptions
• Student and task dependent
  • pre-test score
  • gender
  • correctness
  • previous tutor dialogue act
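Unigram and bigram features, whether over words or over part-of-speech tags, amount to counting single tokens and adjacent token pairs in a turn. A minimal sketch, with hypothetical feature-name prefixes and example part-of-speech tags that are assumptions rather than the tag set used in the study:

```python
from collections import Counter


def ngram_features(tokens, prefix):
    """Count unigrams and bigrams in a token sequence.

    `prefix` distinguishes feature families, e.g. "lex" for words
    and "pos" for part-of-speech tags (hypothetical naming scheme).
    """
    features = Counter()
    for token in tokens:
        features[f"{prefix}:1:{token}"] += 1
    for first, second in zip(tokens, tokens[1:]):
        features[f"{prefix}:2:{first}_{second}"] += 1
    return features


words = ["what", "do", "you", "mean"]
tags = ["WP", "VBP", "PRP", "VB"]  # assumed Penn-Treebank-style tags

turn_features = ngram_features(words, "lex") + ngram_features(tags, "pos")
print(sorted(turn_features))
```

The student and task dependent features (pre-test score, gender, correctness, previous tutor dialogue act) would simply be added to the same feature dictionary as extra key/value pairs.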
Machine Learning Experiments
• Question-bearing vs. non-question-bearing
  • Down-sampled to 50/50 distribution
  • Experimented by feature type
  • Adaboosted C4.5 decision trees
• 5-fold cross validation
• Best results with all features
  • Accuracy = 79.7%
  • Precision = Recall = F-measure = 0.8
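The evaluation set-up (50/50 down-sampling, 5-fold cross validation, precision/recall/F-measure) can be sketched independently of the classifier. The code below is a sketch, not the authors' pipeline: it does not implement boosted C4.5 trees, and the function names, the fixed random seed, and the interleaved fold assignment are assumptions.

```python
import random


def downsample_balanced(items, labels, seed=0):
    """Randomly drop majority-class items to reach a 50/50 distribution."""
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label]
    negatives = [i for i, label in enumerate(labels) if not label]
    n = min(len(positives), len(negatives))
    keep = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(keep)
    return [items[i] for i in keep], [labels[i] for i in keep]


def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) for each of k folds."""
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in range(k):
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, folds[held_out]


def precision_recall_f(gold, predicted):
    """Precision, recall and F-measure for the question-bearing class."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


# Toy data: 3 question-bearing turns among 13 (illustrative only).
turns = list(range(13))
is_question = [True] * 3 + [False] * 10
balanced_turns, balanced_labels = downsample_balanced(turns, is_question)
for train, test in k_fold_indices(len(balanced_turns), k=3):
    print(len(train), "train /", len(test), "test")
```

With a balanced set, chance accuracy is 50%, which is the baseline against which the reported accuracies should be read.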
Accuracy by Feature Type
Feature type                          Accuracy
prosody: pausing and speaking rate    52.6%
student and task dependent            56.1%
prosody: loudness                     61.8%
syntactic                             65.3%
lexical                               67.2%
prosody: last 200 ms                  70.3%
prosody: pitch                        72.6%
prosody: all                          74.5%
Feature Type Discussion
• Which features are most informative?
  • pitch slope of last 200 ms and entire turn
  • maximum and mean pitch of turn
• Which features are most often used in learning?
  • pre-test score
  • slope of last 200 ms
  • maximum pitch of entire turn
  • cumulative pause duration
Other Observations
• Syntactic features were informative
  • personal pronoun + verb, wh-pronoun, interjection
• Lexical features were informative
  • yes, right, what, I, you
Conclusions
• Most questions in our tutoring corpus are declarative in form
  • More than syntax is needed to identify these as questions
  • Prosodic features are very important
• Detecting question-bearing turns is possible
• Detecting question function is needed
Question Forms in ITSpoke
Form          Distr.   Example
declarative   54%      The weight?
yes/no        24%      Is that right?
wh-           10%      What do you mean?
yes/no tag    7%       It will stay the same, right?
alternative   3%       Force or something?
particle      2%       Huh?