Intelligent Email: Reply and Attachment Prediction
Mark Dredze, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, Fernando Pereira
Presented by Nareg Torosian
What’s the use?
Whittaker & Sidner’s “email overload”: email is used for
  Task management
  Personal archiving
  Asynchronous communication
Assist overwhelmed email users
Support an enhanced email interface
Intelligent? How?
Prediction tasks treated as binary classification problems
Each message represented as a binary vector, where each dimension represents a feature
Learning performed with logistic regression
System evaluated using F1, the harmonic mean of precision and recall
Single-user (adaptive) and cross-user (adaptable) settings
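As a rough illustration of the setup above, the sketch below trains a logistic regression classifier on sparse binary vectors (each message is the set of indices of its active features) with plain batch gradient descent and scores the predictions with F1 = 2PR / (P + R). The optimizer, learning rate, L2 penalty, and toy data are assumptions for illustration only; the slides do not say how the authors trained their model.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=200, l2=0.01):
    """Batch gradient descent on mean log loss plus an L2 penalty (assumed settings).

    X: list of sets of active feature indices (a sparse binary vector per message).
    y: list of 0/1 labels (1 = positive class, e.g. "needs reply").
    """
    n_features = 1 + max((i for x in X for i in x), default=-1)
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wi for wi in w]  # gradient of (l2 / 2) * ||w||^2
        grad_b = 0.0
        for x, label in zip(X, y):
            err = sigmoid(b + sum(w[i] for i in x)) - label
            for i in x:  # only active (value 1) features receive gradient
                grad_w[i] += err / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x, threshold=0.5):
    # Feature indices never seen in training are ignored
    score = b + sum(w[i] for i in x if i < len(w))
    return 1 if sigmoid(score) >= threshold else 0

def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy data: feature 0 might stand for "has a question mark" (hypothetical)
X_train = [{0}, {0, 1}, {1}, {2}, {0, 2}, {1, 2}]
y_train = [1, 1, 0, 0, 1, 0]
w, b = train_logreg(X_train, y_train)
preds = [predict(w, b, x) for x in X_train]
print("F1 on training data:", f1_score(y_train, preds))
```

F1 is a natural choice here because the positive class (messages needing a reply) is usually the minority, so plain accuracy could look good even for a classifier that never predicts "needs reply".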
Reply prediction
Indicate which messages require reply
Allow user to manage these messages
Reply prediction features
Relational features, based on the user profile
  Number of sent and received messages, address book, email address and domain
  e.g. “I appear in the CC list”, “I frequently reply to this user”
  200 relational features in Dredze et al.’s experiment
Document features
  Presence of question marks and question words
  TF-IDF (term frequency–inverse document frequency) scores
  Presence of attachments
  14,800 document features in Dredze et al.’s experiment
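The feature list above can be sketched as a function that returns the set of active binary features for one message. Everything concrete here is hypothetical: the message and profile dictionaries, the feature names, the question-word list, the "frequent reply" cutoff, and in particular the choice to keep the vector binary by turning TF-IDF scores into indicators above a threshold. The slides list the feature types but not how they were encoded.

```python
import math
import re
from collections import Counter

# Hypothetical question-word list; the slides do not give the actual list.
QUESTION_WORDS = {"who", "what", "when", "where", "why", "how", "can", "could", "would", "will"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def compute_idf(corpus):
    """idf(t) = log(N / df(t)) over a list of message bodies."""
    df = Counter()
    for body in corpus:
        df.update(set(tokenize(body)))
    return {term: math.log(len(corpus) / count) for term, count in df.items()}

def tfidf(body, idf):
    """Term frequency (normalized by message length) times idf."""
    counts = Counter(tokenize(body))
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf.get(t, 0.0) for t, c in counts.items()}

def reply_features(msg, profile, idf, tfidf_threshold=0.1, frequent_min=3):
    """Return the set of active (value 1) features for one message."""
    feats = set()
    me = profile["me"]
    # Relational features, derived from a hypothetical user-profile dict
    if me in msg["to"]:
        feats.add("rel:i_am_in_to")
    if me in msg["cc"]:
        feats.add("rel:i_am_in_cc")
    if profile["replies_sent_to"].get(msg["from"], 0) >= frequent_min:
        feats.add("rel:i_frequently_reply_to_sender")
    feats.add("rel:sender_domain=" + msg["from"].split("@")[-1])
    # Document features
    body = msg["body"]
    if "?" in body:
        feats.add("doc:has_question_mark")
    for word in set(tokenize(body)) & QUESTION_WORDS:
        feats.add("doc:question_word=" + word)
    if msg.get("attachments"):
        feats.add("doc:has_attachment")
    # TF-IDF binarized by a threshold (an assumption made to keep the vector binary)
    for term, score in tfidf(body, idf).items():
        if score >= tfidf_threshold:
            feats.add("doc:tfidf_high=" + term)
    return feats

corpus = ["Can you send the report?", "Lunch at noon", "The report is attached"]
idf = compute_idf(corpus)
msg = {"from": "alice@example.com", "to": ["me@example.com"], "cc": [],
       "body": "Can you send the report?", "attachments": []}
profile = {"me": "me@example.com", "replies_sent_to": {"alice@example.com": 5}}
feats = reply_features(msg, profile, idf)
print(sorted(feats))
```

Because each feature is just a string, the resulting sets can be mapped to integer indices and fed to any sparse binary-vector learner.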
The grand experiment
Evaluated on 4 user mailboxes
Users manually tagged messages as either “needs reply” or “does not need reply”
  “It is not surprising that overwhelmed users acknowledge that a message did require their reply even though they failed to do so; classifiers trained on actual user reply behavior are thus very poor.”
2,391 total emails, excluding spam
80/20 train/test split
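The slides do not say whether the 80/20 split was random, chronological, or done per mailbox. The sketch below assumes a seeded random split over labeled examples; the placeholder data simply stands in for the 2,391 tagged emails.

```python
import random

def train_test_split(examples, train_frac=0.8, seed=0):
    """Randomly shuffle labeled examples and split them train_frac / (1 - train_frac)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# 2,391 placeholder (message_id, label) pairs standing in for the tagged emails
examples = [(i, i % 2) for i in range(2391)]
train, test = train_test_split(examples)
print(len(train), len(test))
```

For the single-user (adaptive) setting, the same function could be applied to each mailbox separately so that every user's model is trained and tested on that user's own mail.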
The single-user results
The cross-user results
Only relational features were effective in the cross-user setting, so document features were omitted
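The slides do not describe how the cross-user (adaptable) evaluation was organized. One plausible reading, sketched below as an assumption, is leave-one-user-out: train on every other user's messages, test on the held-out user's, and keep only relational features. Relational features are identified here by a hypothetical `rel:` name prefix.

```python
def relational_only(features):
    # Keep only relational features (the "rel:" naming convention is hypothetical)
    return {f for f in features if f.startswith("rel:")}

def cross_user_splits(data_by_user):
    """Leave-one-user-out: for each user, train on every other user's messages
    and test on that user's messages.

    data_by_user maps user -> list of (feature_set, label) pairs.
    Yields (held_out_user, train, test) with document features removed.
    """
    for held_out, test_examples in data_by_user.items():
        train = [
            (relational_only(feats), label)
            for user, examples in data_by_user.items()
            if user != held_out
            for feats, label in examples
        ]
        test = [(relational_only(feats), label) for feats, label in test_examples]
        yield held_out, train, test

data = {
    "a": [({"rel:x", "doc:y"}, 1)],
    "b": [({"rel:z"}, 0), ({"doc:q"}, 1)],
    "c": [({"rel:x"}, 0)],
}
for user, train, test in cross_user_splits(data):
    print(user, len(train), len(test))
```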
Attachment prediction
“See attachment…hey, wait a minute…”
Possible UI considerations
  Document sidebar
  Alert user before sending
  Indicate which messages need attachments
Attachment prediction features
Relational features, based on the user profile
  Number of sent and received messages, number of attachments, email address and domain
  Conjunctions between volume of messages/attachments and TO/CC fields
  72 relational features in Dredze et al.’s experiment
Document features
  Presence and placement of “attach”
  Presence of attachments
  39,308 document features in Dredze et al.’s experiment
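Two of the attachment features above lend themselves to a short sketch: presence and placement of "attach" in the body, and conjunctions of attachment volume with the TO/CC field a recipient appears in. The position buckets (first, middle, or last third of the body), the volume buckets, and all feature names are hypothetical; the slides only name the feature types.

```python
import re

def attachment_features(msg, attachments_sent_to, few_max=4):
    """Active binary features for attachment prediction (names and thresholds hypothetical)."""
    feats = set()
    body = msg["body"].lower()
    # Presence of "attach" (also matches attached, attachment, attaching, ...)
    match = re.search(r"\battach", body)
    if match:
        feats.add("doc:mentions_attach")
        # Placement: which third of the body the first mention falls in
        third = min(2, 3 * match.start() // max(1, len(body)))
        feats.add("doc:attach_position=" + ("begin", "middle", "end")[third])
    # Conjunctions of past attachment volume with the TO/CC field of each recipient
    for field in ("to", "cc"):
        for addr in msg[field]:
            n = attachments_sent_to.get(addr, 0)
            volume = "none" if n == 0 else ("few" if n <= few_max else "many")
            feats.add(f"rel:{field}_x_attach_volume={volume}")
    return feats

msg = {"body": "I attached the file", "to": ["bob@example.com"], "cc": ["carol@example.com"]}
feats = attachment_features(msg, {"bob@example.com": 7})
print(sorted(feats))
```

A pre-send check, such as the "alert user before sending" idea above, could run a trained classifier over these features and warn when it predicts an attachment but the outgoing message has none.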
The grander experiment
Evaluated on the publicly available Enron email corpus
150 users and 250,000 emails
Lots of cleanup needed
Users manually tagged messages as needs attachment
Only popular document formats
Forwarded messages excluded
Subset of 15,000 messages from 144 users
1,020 with attachments
10-fold cross-validation
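The slides state only that 10-fold cross-validation was used. Since just 1,020 of the 15,000 messages (about 6.8%) have attachments, the sketch below assumes stratified folds: each class is shuffled and dealt round-robin so every fold keeps roughly the same positive rate. The stratification is an illustrative choice, not something the slides confirm.

```python
import random

def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs. Each class is shuffled and dealt
    round-robin across k folds so every fold keeps roughly the overall class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, test

# 15,000 messages, 1,020 of them with attachments (label 1)
labels = [1] * 1020 + [0] * (15000 - 1020)
folds = list(stratified_k_fold(labels))
print([sum(labels[i] for i in test) for _, test in folds])
```

Each message lands in exactly one test fold, so averaging F1 over the ten folds uses every message for testing once.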
The results
GUEPs and CDs
GUEPs
  Mental model
  Improvement
  Consistency
CDs
  Premature commitment
  Hidden dependencies
  Abstraction
  Consistency
  Provisionality