HTL-ACTS Workshop, June 2006, New York City
Improving Email Speech Acts Analysis via N-gram Selection
Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University
Outline
1. Email Speech Acts: Can we do it? What for?
o Introduction
o Data
o Applications
2. Language Cues
o Preprocessing
o N-grams
3. Results
Motivation
Email classification for
o topic/folder identification
o spam/non-spam
Speech-act classification in conversational speech (aka dialog act classification)
o email is new domain - multiple acts/msg
Winograd’s Coordinator (1987): users manually annotated email with intent.
o Extra work for (lazy) users
Murakoshi et al (1999): hand-coded rules for identifying speech-act like labels in Japanese emails
“Email Acts” Taxonomy
An Act is described as a verb-noun pair (e.g., propose meeting, request information) - Not all pairs make sense
Single email message may contain multiple acts
Try to describe commonly observed behaviors, rather than all possible speech acts in English
Also include non-linguistic usage of email (e.g. delivery of files)
From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium
Hey Vitor
When exactly is the LTI SRS submission deadline?
Also, don’t forget to ask Eric about the SRS webpage.
Thanks.
Ben
Acts: Request - Information
Acts: Reminder - Action/Task
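As a minimal sketch of how the annotation above might be represented, each act can be stored as a (verb, noun) pair and a message can carry a set of them. The class and field names (`EmailAct`, `LabeledMessage`) are hypothetical and are not taken from the original work.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmailAct:
    """One act, described as a verb-noun pair (hypothetical representation)."""
    verb: str
    noun: str


@dataclass
class LabeledMessage:
    """An email message together with the set of acts annotated on it."""
    sender: str
    recipient: str
    subject: str
    body: str
    acts: set = field(default_factory=set)


# The example message from the slide, carrying two acts at once.
msg = LabeledMessage(
    sender="Benjamin Han",
    recipient="Vitor Carvalho",
    subject="LTI Student Research Symposium",
    body=(
        "When exactly is the LTI SRS submission deadline?\n"
        "Also, don't forget to ask Eric about the SRS webpage."
    ),
    acts={
        EmailAct("Request", "Information"),
        EmailAct("Reminder", "Action/Task"),
    },
)

# A message is multi-label: each act is tested for separately.
has_request = any(act.verb == "Request" for act in msg.acts)
```

Storing the acts as a set reflects that one message may carry several acts and that their order carries no meaning.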
Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]
[Figure: taxonomy of email acts. Verb nodes: Commissive, Directive, Deliver, Commit, Request, Propose, Amend. Noun nodes: Activity, Ongoing, Event, Meeting, Other, Delivery, Opinion, Data.]
An Act is a verb-noun pair (e.g., propose meeting)
One single email message may contain multiple acts. Not all pairs make sense.
Try to describe commonly observed behaviors, rather than all possible speech acts.
Also include non-linguistic usage of email (delivery of files)
Most of the acts can be learned (EMNLP-04)
[Figure labels: Nouns, Verbs]
Email Acts - Applications
Improved email clients.
o Negotiating/managing shared tasks is a central use of email
o Tracking commitments, delegations, pending answers
o Integrating to-do/task lists to email, etc.
o Email overload
Iterative Learning of Email Tasks and Speech Acts
Predicting Social Roles and Group Leadership.
Kushmerick et al, AAAI-06
Kushmerick & Khoussainov, IJCAI-05, CEAS-05
Leuski, SIGIR-04
Carvalho et al. in progress
Data: CSPACE Corpus
Few large, free, natural email corpora are available
CSPACE corpus (Kraut & Fussell)
o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997
o 15,000 messages from 277 students, divided into 50 teams (4 to 6 students/team)
o Rich in task negotiation.
o 1500+ messages (5 teams) had their “Speech Acts” labeled.
o One of the teams was double labeled, and the inter-annotator agreement (Kappa) ranges from 0.72 to 0.83 for the most frequent acts.
Inter-Annotator Agreement
Kappa Statistic
o A = probability of agreement in a category
o R = prob. of agreement for 2 annotators labeling at random
o Kappa range: -1…+1
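The formula on this slide did not survive extraction. The sketch below assumes the standard definition, Kappa = (A - R) / (1 - R), with A and R as defined above, and computes it for a single act treated as a binary label per message. Estimating R from each annotator's own label frequencies is an assumption (the Cohen variant of the statistic); the function name and the toy labels are illustrative only.

```python
def kappa(labels_a, labels_b):
    """Kappa for two annotators' binary labels (1 = act present) on the same messages."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # A: observed probability that the two annotators agree.
    agree = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n

    # R: chance agreement, assuming each annotator labels at random
    # with their own observed rate of positive labels.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    rand = p_a * p_b + (1 - p_a) * (1 - p_b)

    if rand == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (agree - rand) / (1 - rand)


# Toy labels for one act over ten messages (made up for illustration).
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 0, 1, 0, 1]
k = kappa(annotator_1, annotator_2)
```

For the toy labels, A = 0.8 and R = 0.52, so Kappa is about 0.58: well above chance, but noticeably lower than the raw 80% agreement suggests.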
Inter-Annotator Agreement
Email Act   Kappa
Deliver     0.75
Commit      0.72
Request     0.81
Amend       0.83
Meeting     0.82
Propose     0.72
Error Rate Analysis
[Figure: error rate (y-axis, 0.05 to 0.3) per act: Request, Commit, Deliver, Propose, Meet, dData. Series: 1g (1354 msgs: EMNLP04); 1g (1716 msgs); 1g+PreProcess; 1g+2g+3g+PreProcess; 1g+2g+3g+4g+5g+PreProcess.]
[Figure: precision (y-axis, 0.3 to 1) vs. recall (x-axis, 0 to 1). Series: 1g (1716 msgs); 1g+2g+3g+PreProcess.]
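The feature-set labels in the plots (1g, 1g+2g+3g, and so on) name combinations of word n-grams of increasing length. A minimal sketch of extracting such features is given here. The lowercasing and whitespace tokenizer are placeholders for the preprocessing step, whose actual rules are not given in this text, and the function name `ngram_features` is hypothetical.

```python
from collections import Counter


def ngram_features(text, max_n=3):
    """Count all word n-grams of length 1..max_n in a message."""
    # Placeholder preprocessing (an assumption): lowercase, split on whitespace.
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# 1g+2g+3g features for one sentence of the example message.
features = ngram_features("When exactly is the LTI SRS submission deadline?")
```

Raising `max_n` to 5 would give the 1g+2g+3g+4g+5g setting. The number of distinct features grows quickly with `max_n`, which is presumably why the talk pairs n-grams with a selection step.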