HTL-ACTS Workshop, June 2006, New York City

Improving Email Speech Acts Analysis via N-gram Selection

Vitor R. Carvalho & William W. Cohen
Carnegie Mellon University


Outline

1. Email Speech Acts: Can we do it? What for?
  • Introduction
  • Data
  • Applications
2. Language Cues
  • Preprocessing
  • N-grams
3. Results

Motivation

• Email classification for
    o topic/folder identification
    o spam/non-spam
• Speech-act classification in conversational speech (aka dialog act classification)
    o email is a new domain - multiple acts/msg
• Winograd’s Coordinator (1987): users manually annotated email with intent.
    o Extra work for (lazy) users
• Murakoshi et al. (1999): hand-coded rules for identifying speech-act-like labels in Japanese emails

“Email Acts” Taxonomy

• An Act is described as a verb-noun pair (e.g., propose meeting, request information) - Not all pairs make sense
• Single email message may contain multiple acts
• Try to describe commonly observed behaviors, rather than all possible speech acts in English
• Also include non-linguistic usage of email (e.g. delivery of files)

From: Benjamin Han

To: Vitor Carvalho

Subject: LTI Student Research Symposium

Hey Vitor

When exactly is the LTI SRS submission deadline?

Also, don’t forget to ask Eric about the SRS webpage.

Thanks.

Ben

Acts annotated in this message:
• Request - Information
• Reminder - Action/Task
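One way to hold such annotations in code is to attach a set of verb-noun pairs to each message, so that a single email can carry several acts at once. The sketch below is only an illustration: the `Act` and `Email` types and the `has_verb` helper are hypothetical names, not something defined in the slides.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Act(NamedTuple):
    """A single email act, written as a verb-noun pair."""
    verb: str
    noun: str


@dataclass
class Email:
    sender: str
    recipient: str
    subject: str
    body: str
    # A set, because one message may contain multiple acts.
    acts: set = field(default_factory=set)


def has_verb(email: Email, verb: str) -> bool:
    """True if any act attached to the message uses the given verb."""
    return any(act.verb == verb for act in email.acts)


# The example message above, with its two annotated acts.
msg = Email(
    sender="Benjamin Han",
    recipient="Vitor Carvalho",
    subject="LTI Student Research Symposium",
    body="When exactly is the LTI SRS submission deadline?",
)
msg.acts.add(Act("Request", "Information"))
msg.acts.add(Act("Reminder", "Action/Task"))
```

Treating each verb as its own yes/no question per message (as `has_verb` does) is one simple way to cope with multiple acts per email.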

Classifying Email into Acts [Cohen, Carvalho & Mitchell, EMNLP-04]

[Figure: taxonomy of email acts]
Verbs: Commissive, Directive, Deliver, Commit, Request, Propose, Amend
Nouns: Activity, Ongoing, Event, Meeting, Other, Delivery, Opinion, Data


• An Act is a verb-noun pair (e.g., propose meeting)
• One single email message may contain multiple acts. Not all pairs make sense.
• Try to describe commonly observed behaviors, rather than all possible speech acts.
• Also include non-linguistic usage of email (delivery of files)
• Most of the acts can be learned (EMNLP-04)

Email Acts - Applications

• Improved email clients.
    o Negotiating/managing shared tasks is a central use of email
    o Tracking commitments, delegations, pending answers
    o Integrating to-do/task lists to email, etc.
• Email overload
• Iterative Learning of Email Tasks and Speech Acts
• Predicting Social Roles and Group Leadership.

Kushmerick et al, AAAI-06

Kushmerick & Khoussainov, IJCAI-05, CEAS-05

Leuski, SIGIR-04

Carvalho et al. in progress

Data: CSPACE Corpus

• Few large, free, natural email corpora are available
• CSPACE corpus (Kraut & Fussell)
    o Emails associated with a semester-long project for Carnegie Mellon MBA students in 1997
    o 15,000 messages from 277 students, divided in 50 teams (4 to 6 students/team)
    o Rich in task negotiation.
    o 1500+ messages (5 teams) had their “Speech Acts” labeled.
    o One of the teams was double labeled, and the inter-annotator agreement (Kappa) ranges from 0.72 to 0.83 for the most frequent acts.

Inter-Annotator Agreement

Kappa Statistic
• A = probability of agreement in a category
• R = prob. of agreement for 2 annotators labeling at random
• Kappa range: -1…+1
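With A and R as defined above, the Kappa statistic is conventionally computed as Kappa = (A - R) / (1 - R). The sketch below works this out for one act from two annotators' yes/no labels. It assumes the Cohen-style estimate of R, where each annotator's random labeling follows their own observed rate of positive labels; the slides do not say which estimate was used, and the label lists are made up for illustration.

```python
def kappa(labels_a, labels_b):
    """Kappa for one act, from two annotators' binary labels per message."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # A: observed fraction of messages on which the annotators agree.
    agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # R: agreement expected if each annotator labeled at random,
    # using their own observed rate of positive labels (assumption).
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    if chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (agree - chance) / (1 - chance)


# Hypothetical labels: 1 = "message contains a Request", 0 = it does not.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 0, 1]
score = kappa(annotator_1, annotator_2)  # A = 0.75, R = 0.5
```

Perfect agreement gives +1 and systematic disagreement gives -1, which matches the range stated above.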

Inter-Annotator Agreement

Email Act    Kappa
Deliver      0.75
Commit       0.72
Request      0.81
Amend        0.83
Meeting      0.82
Propose      0.72

PreProcessing

• Signature and Quoted removal
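The slide does not say how signatures and quoted text were detected. As a rough illustration only, the sketch below uses simple heuristics that are assumptions of this example: a line containing just "--" starts the signature, lines beginning with ">" are quoted text, and "On ... wrote:" lines introduce a quote. Real email needs more robust handling than this.

```python
import re

# Heuristic patterns (assumptions of this sketch, not the authors' method).
SIGNATURE_DELIMITER = re.compile(r"^--\s*$")      # conventional "-- " separator
QUOTED_LINE = re.compile(r"^\s*>")                # text quoted from an earlier message
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$")   # e.g. "On Monday, Ben wrote:"


def strip_signature_and_quotes(body: str) -> str:
    """Keep only the text the sender wrote in this message."""
    kept = []
    for line in body.splitlines():
        if SIGNATURE_DELIMITER.match(line):
            break  # everything after the delimiter is treated as signature
        if QUOTED_LINE.match(line) or REPLY_HEADER.match(line):
            continue  # drop quoted material and its introduction line
        kept.append(line)
    return "\n".join(kept).strip()


raw = (
    "Sure, I will send it.\n"
    "\n"
    "On Monday, Ben wrote:\n"
    "> When is the deadline?\n"
    "> Thanks.\n"
    "-- \n"
    "Vitor\n"
    "CMU"
)
clean = strip_signature_and_quotes(raw)
```

Removing this material keeps n-gram features focused on what the sender actually wrote, rather than on text copied from earlier messages.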

Request Act: n-grams ranked by Information Gain (IG)
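To make the ranking concrete, here is a minimal sketch of scoring n-grams by information gain for a single act. It treats each n-gram as a present/absent feature and the act as a yes/no label. Whitespace tokenization, the function names, and the four toy messages are assumptions made for illustration; they are not data or code from the talk.

```python
import math
from collections import Counter


def ngrams(text, max_n=3):
    """Set of all 1..max_n word n-grams in a message (whitespace tokens)."""
    tokens = text.lower().split()
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def entropy(pos, total):
    """Binary entropy (bits) of a group with `pos` positives out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(messages, labels, max_n=3):
    """Return {ngram: IG} for binary labels (1 = message has the act)."""
    total, total_pos = len(messages), sum(labels)
    base = entropy(total_pos, total)
    present = Counter()      # number of messages containing the n-gram
    present_pos = Counter()  # ... of which carry the act
    for text, label in zip(messages, labels):
        for gram in ngrams(text, max_n):
            present[gram] += 1
            present_pos[gram] += label
    scores = {}
    for gram, n_with in present.items():
        n_without = total - n_with
        pos_with = present_pos[gram]
        conditional = (
            n_with / total * entropy(pos_with, n_with)
            + n_without / total * entropy(total_pos - pos_with, n_without)
        )
        scores[gram] = base - conditional
    return scores


# Toy messages (invented); label 1 marks a Request.
messages = [
    "could you send the file",
    "could you please review this",
    "here is the file",
    "i will send it tomorrow",
]
labels = [1, 1, 0, 0]
scores = information_gain(messages, labels)
top = sorted(scores, key=scores.get, reverse=True)[:5]
```

In this toy set "could you" occurs only in requests, so it gets the maximum gain, while "the file" appears equally in both classes and scores zero.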

Error Rate Analysis

[Figure: error rate (y-axis, 0.05 to 0.3) per act (Request, Commit, Deliver, Propose, Meet, dData) for five settings: 1g (1354 msgs: EMNLP04); 1g (1716 msgs); 1g+PreProcess; 1g+2g+3g+PreProcess; 1g+2g+3g+4g+5g+PreProcess]

[Figure: precision (y-axis, 0.3 to 1) versus recall (x-axis, 0 to 1), comparing 1g (1716 msgs) with 1g+2g+3g+PreProcess]

Idea: Predicting Acts from Surrounding Acts

[Figure: example of an email thread sequence, with messages linked by <<In-ReplyTo>> and labeled with acts such as Delivery, Request, Commit, and Proposal]

• Act has little or no correlation with other acts of same message
• Strong correlation with previous and next message’s acts
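One way to exploit this correlation is to give each message extra features describing the acts of the message it replies to and of the messages that reply to it. The slide does not describe the actual procedure, so the sketch below is only a guess at the feature-building step; the `Message` type, the feature names, and the three-message thread are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    msg_id: str
    acts: frozenset                    # acts currently assigned to the message
    in_reply_to: Optional[str] = None  # id of the parent message, if any


def neighbor_features(thread):
    """Map each message id to features built from its parent's and children's acts."""
    by_id = {m.msg_id: m for m in thread}
    children = {}
    for m in thread:
        if m.in_reply_to is not None:
            children.setdefault(m.in_reply_to, []).append(m)
    features = {}
    for m in thread:
        feats = set()
        parent = by_id.get(m.in_reply_to)
        if parent is not None:
            feats |= {f"parent_{act}" for act in parent.acts}
        for child in children.get(m.msg_id, []):
            feats |= {f"child_{act}" for act in child.acts}
        features[m.msg_id] = feats
    return features


# Hypothetical thread: a request, a commitment in reply, then a delivery.
thread = [
    Message("m1", frozenset({"Request"})),
    Message("m2", frozenset({"Commit"}), in_reply_to="m1"),
    Message("m3", frozenset({"Delivery"}), in_reply_to="m2"),
]
features = neighbor_features(thread)
```

These features could be appended to a message's n-gram features. Because a neighbor's acts are themselves predictions at test time, such a scheme would need some form of iteration or a first content-only pass.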