Email Conference 2005 Overview

• 26 papers (69 submitted), approx. 150 people attended (in 2004, 29 papers out of 80 submissions, 180 people attended)...AAAI effect

• More email papers, fewer spam papers. Number of Microsoft papers: 7

• Same size (2 days), Same place. In 2006: same place and size, no one from MS as chair.

Spam Papers

1. Spam Corpus Creation for TREC, Gordon Cormack, Thomas Lynam
   1. Competition starting in 2005
   2. Preliminary results using Enron corpus

2. Comparative Graph Theoretical Characterization of Networks of Spam Gomes, R. Almeida, Bettencourt, V. Almeida, J. Almeida

3. Spam Deobfuscation using a Hidden Markov Model, Honglak Lee, Andrew Ng

Email Papers

1. PEEP- An Information Extraction base approach for Privacy Protection in Email Boufaden, Elazmeh, Ma, Stan Matwin, El-Kadri, Japkowicz

2. The Social Network and Relationship Finder: Social Sorting for Email Triage Carman Neustaedter, A.J. Bernheim Brush, Marc A. Smith, Danyel Fisher

3. Email Task Management: An Iterative Relational Learning Approach Rinat Khoussainov, Nicholas Kushmerick

Comparative Graph Theoretical Characterization of Networks of Spam Gomes, R. Almeida, Bettencourt, V. Almeida, J. Almeida

• 2 graphs: User Graph and Domain Graph

• Large dataset (615K msgs)

• Different Metrics:
– Average Clustering Coefficient
– Prob. of finding a node during a random walk
– Communication Reciprocity (CR)
– etc

• Practical results?
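Two of the metrics listed above can be illustrated on a toy graph. This is a minimal sketch, not the authors' code: the function names and the tiny graphs are hypothetical, the clustering coefficient is the standard undirected definition, and Communication Reciprocity is assumed here to mean the fraction of a node's addressees that also wrote back to it.

```python
from itertools import combinations


def average_clustering(adj):
    """Average clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours (hypothetical input format).
    Nodes with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count neighbour pairs that are themselves connected.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def communication_reciprocity(sent, node):
    """Assumed definition: share of node's addressees that also sent mail to node.

    sent maps each sender to the set of addresses it wrote to.
    """
    out_set = sent.get(node, set())
    if not out_set:
        return 0.0
    in_set = {src for src, dsts in sent.items() if node in dsts}
    return len(out_set & in_set) / len(out_set)


# Toy user graph: a triangle (a, b, c) plus a pendant node d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
# Toy directed mail log: a wrote to b and c, only b answered.
sent = {"a": {"b", "c"}, "b": {"a"}, "c": set()}

print(average_clustering(adj))
print(communication_reciprocity(sent, "a"))
```

A sender whose addressees rarely write back gets a low reciprocity value under this definition, which is the kind of signal one would hope separates spam traffic from legitimate traffic.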

Spam Deobfuscation using a Hidden Markov Model, Honglak Lee, Andrew Ng

• Spammers obfuscate emails by deliberately using misspellings, typos, etc (Table Below)

• There are anti-spam systems using RegExp to detect obfuscated words. Not robust, low recall.

• Idea: build an HMM robust to some types of obfuscation: misspellings, adding/removing spaces, substitution/insertion of non-alphabetical chars

Spam Deobfuscation using a Hidden Markov Model, Honglak Lee, Andrew Ng

1. First Model: Lexicon Tree

• 45K words in dictionary

• 111K states

• Emission set has 70 chars: 26 letters + space + other ASCII chars (*, /, -, +, etc)

• Sf (final state) links back to S0 (initial state)

• Self-transitions: substitutions and insertions

• Epsilon-transitions: deletions

• Parameters to control self and epsilon transitions
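The state count of a lexicon tree can be illustrated with a small character trie, where words sharing a prefix share states. This is only a sketch: `Node`, `build_lexicon_tree` and `count_states` are hypothetical names, and the three-word lexicon is made up for the example.

```python
class Node:
    """One state of the lexicon tree."""

    def __init__(self):
        self.children = {}  # character -> child Node
        self.word = None    # set when a dictionary word ends at this state


def build_lexicon_tree(words):
    """Insert every word character by character, reusing shared prefixes."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.word = word
    return root


def count_states(root):
    """Count all states, including the root."""
    count, stack = 0, [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.children.values())
    return count


tree = build_lexicon_tree(["spam", "span", "spin"])
print(count_states(tree))  # 12 characters in total, but only 8 states
```

Prefix sharing of this kind is consistent with a 45K-word dictionary needing far fewer states than its total character count.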

Spam Deobfuscation using a Hidden Markov Model, Honglak Lee, Andrew Ng

2. Second Model: Out-of-Dictionary HMM

• Both models use Beam search to decode.
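To make the decoding step concrete, here is a simplified beam search over a lexicon tree. It is a sketch under loud assumptions, not the model from the paper: fixed costs stand in for negative log-probabilities, a self-transition consumes an extra input character (insertion), an epsilon-transition skips a dictionary character (deletion), and all names (`decode`, `sub_cost`, `ins_cost`, `del_cost`, `beam`) are hypothetical. It decodes a single token rather than a whole message.

```python
class Node:
    def __init__(self):
        self.children = {}  # character -> child Node
        self.word = None    # dictionary word ending here, if any


def build_lexicon_tree(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = Node()
            node = node.children[ch]
        node.word = word
    return root


def _relax(frontier, node, cost):
    """Keep the cheapest known cost for each state."""
    if cost < frontier.get(node, float("inf")):
        frontier[node] = cost
        return True
    return False


def _close_and_prune(frontier, beam, del_cost):
    """Apply epsilon (deletion) moves, then keep only the `beam` cheapest states."""
    stack = list(frontier)
    while stack:
        node = stack.pop()
        for child in node.children.values():
            if _relax(frontier, child, frontier[node] + del_cost):
                stack.append(child)
    return dict(sorted(frontier.items(), key=lambda item: item[1])[:beam])


def decode(token, root, beam=10, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Return (cost, word) for the cheapest dictionary word, or None."""
    frontier = _close_and_prune({root: 0.0}, beam, del_cost)
    for ch in token:
        nxt = {}
        for node, cost in frontier.items():
            _relax(nxt, node, cost + ins_cost)  # self-transition: ch is noise
            for edge, child in node.children.items():
                step = 0.0 if edge == ch else sub_cost  # match or substitution
                _relax(nxt, child, cost + step)
        frontier = _close_and_prune(nxt, beam, del_cost)
    finished = [(cost, node.word) for node, cost in frontier.items() if node.word]
    return min(finished) if finished else None


tree = build_lexicon_tree(["viagra", "mortgage", "free"])
for token in ["v1agra", "fr-ee", "vagra"]:
    print(token, decode(token, tree))
```

The epsilon closure here walks the whole tree before pruning, which is fine for a toy lexicon but would need a bound for a dictionary of realistic size.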

Spam Deobfuscation using a Hidden Markov Model, Honglak Lee, Andrew Ng

• Results

PEEP- An Information Extraction base approach for Privacy Protection in Email

Boufaden, Elazmeh, Ma, Stan Matwin, El-Kadri, Japkowicz

• Idea: monitor outgoing emails for potential privacy breaches in a university

• 4 parts of architecture:
– Preprocessing (segmentation, abbreviations, verb-object, from, to, etc)
– IE: extracts private info (grades, names, addresses, IDs, etc)
– Ontology and roles (student, attributes of student, professor, dean, secretary, course, etc)
– Violation Detection (set of privacy rules)

PEEP- An Information Extraction base approach for Privacy Protection in Email

Boufaden, Elazmeh, Ma, Stan Matwin, El-Kadri, Japkowicz

• Domain Knowledge

• Info Access Ontology

PEEP- An Information Extraction base approach for Privacy Protection in Email

Boufaden, Elazmeh, Ma, Stan Matwin, El-Kadri, Japkowicz

• Info Extraction System

1. Shallow Parsing
– CASS partial parser (Abney) and Brill’s POS tagger.

2. Semantic Tagging
– List of words related to 3 classes: Verb-Score (score, receive, rank, etc), Assignment (mark, test, exam, etc) and ID (identification number, student ID, etc). In a small test, this tagger had F1 of 95%.

3. Individual Facts
– “It uses Markov models to learn relevant sequences of semantic tags along with their semantic role. This stage allows the detection of the target relation ‘the Assignment mark X of student Y’.” Extracts the facts X and Y from the semantic tag sequence learned.
– Output: set of relations and facts in Prolog format
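The word-list idea behind the semantic tagging stage can be sketched as a plain lexicon lookup. This is an illustration only: the `LEXICON` entries are just the examples named on the slide, the tag names and the `tag` function are hypothetical, and the real tagger presumably handles inflected forms, which this sketch does not.

```python
import re

# Example entries taken from the slide; the full lists are not given there.
LEXICON = {
    "VERB_SCORE": ["score", "receive", "rank"],
    "ASSIGNMENT": ["mark", "test", "exam"],
    "ID": ["identification number", "student id"],
}


def tag(text):
    """Return (phrase, tag) pairs for every lexicon phrase found, in text order."""
    hits = []
    for label, phrases in LEXICON.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((match.start(), match.group(0).lower(), label))
    # Sort by position in the text, then drop the position.
    return [(phrase, label) for _, phrase, label in sorted(hits)]


print(tag("Student ID 12345 will receive the exam mark by Friday"))
```

The resulting tag sequence is the kind of input the third stage would consume when looking for a relation such as an Assignment mark tied to a student.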

PEEP- An Information Extraction base approach for Privacy Protection in Email

Boufaden, Elazmeh, Ma, Stan Matwin, El-Kadri, Japkowicz

• Overall Results:

The Social Network and Relationship Finder: Social Sorting for Email Triage C. Neustaedter, A.J. Bernheim Brush, M. A. Smith, D. Fisher

• SNARF (Social Network and Relationship Finder)

• Social sorting: using social metrics to bring important emails to the top. Metrics: sent and received
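Social sorting can be sketched as ordering the inbox by how often the user has written to each sender. This is a toy illustration, not SNARF itself: the message format, the `social_sort` name and the choice of the "sent" count as the only metric are assumptions.

```python
from collections import Counter


def social_sort(inbox, sent_log):
    """Order messages so that senders the user writes to most often come first.

    inbox: list of dicts with a "from" field (hypothetical format).
    sent_log: list of addresses the user has sent mail to, one entry per message.
    Ties keep their original inbox order because the sort is stable.
    """
    sent_counts = Counter(sent_log)
    return sorted(inbox, key=lambda msg: sent_counts[msg["from"]], reverse=True)


sent_log = ["ann", "ann", "bob"]
inbox = [
    {"from": "eve", "subject": "offer"},
    {"from": "bob", "subject": "lunch"},
    {"from": "ann", "subject": "draft"},
]
for msg in social_sort(inbox, sent_log):
    print(msg["from"], msg["subject"])
```

A received-count metric could be swapped in by passing a log of incoming senders instead of `sent_log`.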

The Social Network and Relationship Finder: Social Sorting for Email Triage C. Neustaedter, A.J. Bernheim Brush, M. A. Smith, D. Fisher

• Person-centric visualization

• Social importance can be decided by the many metrics or manually

• Importance can be time dependent

Email Task Management: An Iterative Relational Learning Approach Rinat Khoussainov, Nicholas Kushmerick

• Combine relations identification with speech acts learning, but for free-text (human-generated) email:
– Relationships between messages in the same task provide additional context to each message that can help to identify speech acts
– Speech acts can help to find relationships between messages and, subsequently, group them into tasks

Slide from CEAS-05, Khoussainov+Kushmerick

Email Task Management: An Iterative Relational Learning Approach Rinat Khoussainov, Nicholas Kushmerick

• Initial relations:
– text similarity + structured info (subject, send time difference)

• Initial speech acts:
– bag-of-words, SVM

• Using speech acts to clarify relations:
– identify potential parents for each message (as above)
– use a classifier (SVM) trained on similarity and speech acts to prune the links

• Using relations to classify speech acts:
– use speech acts of related (surrounding) messages as extrinsic features in an iterative relational classification algorithm

Slide from CEAS-05, Khoussainov+Kushmerick
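The last step above, using the speech acts of related messages as extrinsic features and iterating, can be sketched for a single binary speech act. This is a deliberately simplified stand-in: the SVM is replaced by a fixed linear rule, the content-only scores are assumed to be given, and `iterative_classify`, `weight` and `threshold` are hypothetical names.

```python
def iterative_classify(intrinsic, links, weight=0.3, threshold=0.5, max_iters=10):
    """Relabel messages until the labels stop changing.

    intrinsic: message id -> score in [0, 1] from a content-only classifier.
    links: message id -> list of related message ids.
    Returns (labels, number of iterations run).
    """
    # Iteration 0: labels from message content alone.
    labels = {m: score >= threshold for m, score in intrinsic.items()}
    for iteration in range(1, max_iters + 1):
        new_labels = {}
        for m, score in intrinsic.items():
            related = links.get(m, [])
            if related:
                # Extrinsic feature: share of related messages currently positive.
                share = sum(labels[r] for r in related) / len(related)
            else:
                share = 0.5  # no context, so no push either way
            new_labels[m] = score + weight * (share - 0.5) >= threshold
        if new_labels == labels:
            return labels, iteration
        labels = new_labels
    return labels, max_iters


intrinsic = {"m1": 0.9, "m2": 0.45, "m3": 0.1}
links = {"m1": ["m2"], "m2": ["m1"], "m3": []}
print(iterative_classify(intrinsic, links))
```

In this toy run the borderline message m2 flips to positive once its confidently labelled neighbour m1 is taken into account, and the labels are stable on the next pass.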

Email Task Management: An Iterative Relational Learning Approach Rinat Khoussainov, Nicholas Kushmerick

• Speech acts
– Set of binary classification problems for each act
– Kappa statistics measure (to account for imbalance in data)
– Initial: at “0”
– During 1st iteration: 1..9
– During 2nd iteration remained the same

• Relations
– Initial: P=R=F1=0.95
– After 1st iteration: P=1.0; R=0.95; F1=0.98
– After 2nd iteration: remained the same

Slide from CEAS-05, Khoussainov+Kushmerick
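Why kappa is used for imbalanced speech-act data can be seen from a small calculation. The sketch below implements the standard Cohen's kappa for two label sequences; the data are invented for illustration and are not the results reported on the slide.

```python
from collections import Counter


def cohen_kappa(truth, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    true_freq = Counter(truth)
    pred_freq = Counter(predicted)
    # Chance agreement: both labelings pick the same class independently.
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class on both sides
    return (observed - expected) / (1.0 - expected)


# Invented, imbalanced data: 2 positive messages out of 20.
truth = [1] * 2 + [0] * 18
always_negative = [0] * 20

accuracy = sum(t == p for t, p in zip(truth, always_negative)) / len(truth)
print(accuracy)                             # high despite finding no positives
print(cohen_kappa(truth, always_negative))  # no better than chance
print(cohen_kappa(truth, truth))            # perfect agreement
```

A classifier that never predicts the rare act scores 90% accuracy here yet gets a kappa of zero, which is the imbalance effect the kappa measure is meant to expose.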