Learning to Classify Email into “Speech Acts”
William W. Cohen, Vitor R. Carvalho and Tom M. Mitchell
Presented by Vitor R. Carvalho
IR Discussion Series - August 12th 2004 - CMU
Imagine a hypothetical email assistant that can detect “speech acts”…
1. “Do you have any data with xml-tagged names? I need it ASAP!”
   An urgent Request is detected - the assistant may take action; the request is marked pending.
2. “Sure. I’ll put it together by Sunday.”
   A Commitment is detected. “Should I add this Commitment to your to-do list?” “Should I send Vitor a reminder on Sunday?”
3. “Here’s the tar ball on afs : ~vitor/names.tar.gz”
   A Delivery of data is detected - the pending request is cancelled, the Delivery is sent, and the to-do list is updated.
Outline
1) Setting the base
   - “Email speech act” taxonomy
   - Data
   - Inter-annotator agreement
2) Results
   - Learnability of “email acts”
   - Different learning algorithms, “acts”, etc.
   - Different representations
3) Improvements
   - Collective/Relational/Iterative classification
Related Work
- Email classification for topic/folder identification and spam/non-spam
- Speech-act classification in conversational speech; email is a new domain, with multiple acts per message
- Winograd’s Coordinator (1987): users manually annotated email with intent - extra work for (lazy) users
- Murakoshi et al (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails
“Email Acts” Taxonomy
- A single email message may contain multiple acts
- An act is described as a verb-noun pair (e.g., propose meeting, request information) - not all pairs make sense
- The taxonomy tries to describe commonly observed behaviors, rather than all possible speech acts in English
- It also includes non-linguistic uses of email (e.g., delivery of files)
From: Benjamin Han
To: Vitor Carvalho
Subject: LTI Student Research Symposium

Hey Vitor
When exactly is the LTI SRS submission deadline?   [Request - Information]
Also, don’t forget to ask Eric about the SRS webpage.   [Reminder - action/task]
See you
Ben
A Taxonomy of “Email Acts”
[Figure: verb taxonomy tree. Verbs include Request, Propose, Amend, Commit, Deliver, Refuse, Greet, Remind, and Other; internal nodes include Negotiate, Initiate, and Conclude.]
A Taxonomy of “Email Acts”
[Figure: noun taxonomy tree. Top-level nouns are Information and Activity; subtypes mentioned include Data, Other Data, Opinion, Meeting, Logistics, Committee, Ongoing Activity, Single Event, Short Term Task, and Other. An email act is written as a <Verb><Noun> pair.]
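For illustration only, a verb-noun act could be represented as a small data structure like the sketch below; the verb and noun sets here are abbreviated and hypothetical, and the real taxonomy restricts which pairs are meaningful:

```python
from dataclasses import dataclass

# Hypothetical, abbreviated verb and noun sets loosely following the taxonomy above.
VERBS = {"Request", "Propose", "Amend", "Commit", "Deliver", "Refuse", "Greet", "Remind"}
NOUNS = {"Information", "Data", "Opinion", "Meeting", "Logistics", "Activity"}

@dataclass(frozen=True)
class EmailAct:
    """A single <Verb><Noun> speech act; one message may carry several acts."""
    verb: str
    noun: str

    def __post_init__(self):
        if self.verb not in VERBS or self.noun not in NOUNS:
            raise ValueError(f"Unknown verb-noun pair: {self.verb} {self.noun}")

# Example: the message above carries a request for information and a reminder about an activity.
acts = [EmailAct("Request", "Information"), EmailAct("Remind", "Activity")]
```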
Corpora
Few large, natural email corpora are available.
CSPACE corpus (Kraut & Fussell):
o Email associated with a semester-long project for GSIA MBA students in 1997
o 15,000 messages from 277 students in 50 teams (4 to 6 per team)
o Rich in task negotiation
o N02F2, N01F3, N03F2: all messages from students in three teams (341, 351, 443 messages)
SRI’s “Project World” CALO corpus:
o 6 people in an artificial task scenario over four days
o 222 messages (publicly available)
Double-labeled
Inter-Annotator Agreement
Kappa statistic: kappa = (A - R) / (1 - R), where
- A = probability of agreement in a category
- R = probability of agreement for 2 annotators labeling at random
- Kappa range: -1 … +1
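As a small illustration of this statistic (a hypothetical two-annotator setup; not the authors’ code), kappa can be computed directly from the two label sequences:

```python
def kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (A - R) / (1 - R),
    where A is observed agreement and R is chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement A
    A = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement R from each annotator's label distribution
    R = sum((labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories)
    return (A - R) / (1 - R)

# Example: two annotators labeling five messages for the "Request" act
print(kappa(["yes", "no", "yes", "yes", "no"],
            ["yes", "no", "no", "yes", "no"]))   # ~0.62
```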
Inter-Annotator Agreement
(for messages with only a single “verb”)

Email Act   Kappa
Deliver     0.75
Commit      0.72
Request     0.81
Amend       0.83
Propose     0.72
Learnability of Email Acts
- Features: un-weighted word frequency counts (bag-of-words)
- 5-fold cross-validation
- (Directive = Req or Prop or Amd)
[Figure: precision-recall curves for the Directive class, SVM learner, varying the number of training email messages N = 100, 200, 400, 1357.]
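A minimal sketch of this setup (unweighted word counts fed to a linear SVM, evaluated with 5-fold cross-validation), using scikit-learn with placeholder messages and labels; this is an assumed reconstruction, not the experiment code:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder corpus: message bodies with a binary label for one act (Directive or not)
directive = ["Could you send me the data file?",
             "Please review the draft by Friday.",
             "Can you set up the meeting room?",
             "When exactly is the submission deadline?",
             "Don't forget to ask Eric about the webpage."]
other = ["Sure, I'll put it together by Sunday.",
         "Here's the tar ball with the names.",
         "Thanks, that works for me.",
         "I finished the report last night.",
         "The slides are attached."]
messages = directive + other
labels = [1] * len(directive) + [0] * len(other)

# Bag-of-words (unweighted term counts) -> linear SVM, scored with 5-fold CV
clf = make_pipeline(CountVectorizer(), LinearSVC())
scores = cross_val_score(clf, messages, labels, cv=5, scoring="f1")
print(scores.mean())
```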
Using Different Learners
(Directive Act = Req or Prop or Amd)
[Figure: precision-recall curves for the Directive class (1357 messages total), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
Learning Requests only
[Figure: precision-recall curves for the Request class (1257 messages total), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
Learning Commissives
(Commissive Act = Delivery or Commitment)
[Figure: precision-recall curves for the Commissive (DlvCmt) class (1257 messages total), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
Learning Deliveries only
[Figure: precision-recall curves for the Delivery class (1257 messages total), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
Learning to recognize Commitments
[Figure: precision-recall curves for the Commitment class (1257 messages total), comparing Voted Perceptron, AdaBoost, SVM, and Decision Tree.]
Most Informative Features (are common words)
[Table: top-ranked features for the Request+Amend+Propose, Commit, and Deliver classes.]
Learning: document representation
Variants explored:
- TFIDF -> TF weighting (don’t downweight common words)
- bigrams
  - For commitment: “i will”, “i agree” in the top 5 features
  - For directive: “do you”, “could you”, “can you”, “please advise” in the top 25
- count of time expressions
- words near a time expression
- words near a proper noun or pronoun
- POS counts
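A sketch of two of these variants, plain TF weighting versus TF-IDF plus word bigrams, using scikit-learn (the documents and the token pattern are placeholders; the time-expression, named-entity, and POS features are not reconstructed here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF baseline vs. plain TF weighting (use_idf=False stops common words such as
# "will", "you", "please" from being downweighted), plus word bigrams.
# token_pattern keeps one-letter tokens like "i" so bigrams such as "i will" survive.
tfidf_unigrams = TfidfVectorizer(ngram_range=(1, 1), use_idf=True)
tf_unigrams = TfidfVectorizer(ngram_range=(1, 1), use_idf=False)
tf_bigrams = TfidfVectorizer(ngram_range=(1, 2), use_idf=False,
                             token_pattern=r"(?u)\b\w+\b")

docs = ["i will put the data together by sunday",
        "could you send me the tagged names please"]
X = tf_bigrams.fit_transform(docs)
print(tf_bigrams.get_feature_names_out()[:10])  # includes bigrams such as "could you", "i will"
```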
Learning: document representation
Baseline classifier: linear-kernel SVM with TFIDF weighting
[Figure: F1 measure on 10-fold cross-validation for the different document representations.]
… but most of the improvement comes from discarding IDF weighting
Collective Classification (relational)
- BOW classifier output as features (7 binary features = req, dlv, amd, prop, etc.)
- MaxEnt learner; training set = N03f2, test set = N01f3
- Features: current msg + parent msg + child msg (1st child only)
- “Related” msgs = messages with a parent and/or child message
N01f3 dataset                  Req    Dlv    Cmt    Prop   Amd    ReqAmdProp  DlvCmt
Entire dataset (351)    F1     54.61  74.47  34.61  28.98  16.00  68.30       80.97
                        Kappa  28.21  34.88  23.94  21.76  13.02  35.00       22.84
“Related” msgs only     F1     56.92  71.71  38.09  39.21  22.22  75.00       80.47
(170)                   Kappa  33.08  32.74  24.02  28.72  17.93  43.70       27.14
… useful for “related” messages
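A rough sketch of the relational feature construction described above: each message is represented by the BOW classifier’s binary act predictions for itself, its parent, and its first child, and a MaxEnt-style learner (approximated here by logistic regression) is trained on those vectors. Field names such as 'parent' and 'first_child' are hypothetical:

```python
ACTS = ["req", "dlv", "cmt", "prop", "amd", "reqamdprop", "dlvcmt"]  # 7 binary act outputs

def relational_features(msg, bow_preds):
    """Feature vector for a message: the BOW classifier's 0/1 act predictions
    for the message itself, its parent, and its first child (zeros if absent).
    `msg` is a dict with hypothetical keys 'id', 'parent', 'first_child';
    `bow_preds[m]` maps each act name to a 0/1 prediction."""
    def vec(m):
        return [bow_preds[m][a] for a in ACTS] if m is not None else [0] * len(ACTS)
    return vec(msg["id"]) + vec(msg.get("parent")) + vec(msg.get("first_child"))

# Hypothetical usage with a MaxEnt-style learner (logistic regression):
#   from sklearn.linear_model import LogisticRegression
#   X = [relational_features(m, bow_preds) for m in n03f2_msgs]   # training set
#   y = [m["true_acts"]["cmt"] for m in n03f2_msgs]
#   LogisticRegression(max_iter=1000).fit(X, y)
```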
Collective/Iterative Classification
- Start with the baseline (BOW) predictions
- How to make updates?
  - Chronological order
  - Using “family heuristics” (child first, parent first, etc.)
  - Using posterior probability (Maximum Entropy learner) with a threshold, ranking, etc.
[Figure: an email thread over time; each message carries the classifier’s posterior probability (e.g., 0.85, 0.53, 0.65, 0.95, 0.93), used to decide which messages to update.]
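A simplified sketch of this iterative loop: messages are revisited in an order given by a family heuristic (or chronologically), and a prediction is only committed when the relational classifier’s posterior probability clears the threshold. The message fields and the `predict_proba` callback are assumptions for illustration, not the paper’s implementation:

```python
def iterative_classify(messages, predict_proba, threshold=0.8, order="child_first"):
    """Repeatedly re-classify messages using their neighbors' current labels.
    `predict_proba(msg, current_labels)` returns (label, posterior) from the
    relational classifier; only confident predictions (posterior >= threshold)
    are committed, and the update order follows a family heuristic."""
    if order == "child_first":       # messages that have a parent go first
        queue = [m for m in messages if m.get("parent")] + \
                [m for m in messages if not m.get("parent")]
    elif order == "parent_first":    # thread-starting messages go first
        queue = [m for m in messages if not m.get("parent")] + \
                [m for m in messages if m.get("parent")]
    else:                            # chronological order
        queue = sorted(messages, key=lambda m: m["timestamp"])

    labels = {m["id"]: m["bow_label"] for m in messages}   # start from the BOW baseline
    for msg in queue:
        label, posterior = predict_proba(msg, labels)
        if posterior >= threshold:                          # update only when confident
            labels[msg["id"]] = label
    return labels
```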
Iterative Classification: Commitment
[Figure: kappa vs. probability threshold to update (100% = no updates at all, 50% = all messages updated), comparing update heuristics (chronological order, child only first, child first, parent only first, parent first) for the Cmt act on the “related” dataset.]
Updating messages child-first boosts performance.
Iterative Classification: Request
[Figure: kappa vs. probability threshold to update (100% = no updates at all, 50% = all messages updated), comparing update heuristics (chronological order, child only first, child first, parent only first, parent first) for the Req act.]
Updating messages parent-only-first works best.
Iterative Classification: Dlv+Cmt
[Figure: kappa vs. probability threshold to update (100% = no updates at all, 50% = all messages updated), comparing update heuristics (chronological order, child only first, child first, parent only first, parent first) for the DlvCmt act.]
Updating messages parent-first works best.
Conclusions/Summary
- Negotiating/managing shared tasks is a central use of email
- Proposed a taxonomy for “email acts” - could be useful for tracking commitments, delegations, and pending answers, and for integrating to-do lists and calendars with email
- Inter-annotator agreement → kappa in the 70-80s
- Learned classifiers can do this with reasonable accuracy (90% precision at 50-60% recall for the top level of the taxonomy)
- Fancy tricks with IE, bigrams, and POS offer modest improvement over baseline TF-weighted systems
Conclusions/Future Work
- Teamwork (collective/iterative classification) seems to help a lot!
- Future work:
  - Integrate all features + best learners + tricks … tune the system
  - Social network analysis