On Detecting Deception
Sadia Afroz
Privacy, Security and Automation Lab (PSAL), Drexel University
Sunday, October 21, 12


What is Deception?

• Deception: adversarial behavior that disrupts the regular behavior of a system

Deception in Different Areas

• Deception in Writing Style
• Deception in Websites (Phishing)
• Deception in Blog Comments

Deception in Writing Style

• Writing in a style deliberately different from one's regular writing style

A Gay Girl In Damascus

A blog by Amina Arraf
• A Syrian-American activist
• Lives in Damascus

Facts about Amina:
• Fake picture (copied from Facebook)
• The real “Amina”: Thomas MacMaster, a 40-year-old American male

Hoax

Deception in Writing Style

• Goal: distinguish regular writing from deceptive writing

Approach

• Data Collection
• Feature Extraction
• Classification
• Feature Ranking

Data collection

• Short-term deception:
  – Extended-Brennan-Greenstadt Corpus
    • Regular
    • Imitation
    • Obfuscation
  – Hemingway-Faulkner Imitation corpus
    • Regular
    • Imitation
• Long-term deception:
  – Thomas-Amina Hoax corpus
    • Regular
    • Deceptive

Classification

• We used WEKA for machine learning.
• Classifier:
  – Experimented with several classifiers
  – Chose the best classifier for each feature set
• 10-fold cross-validation:
  – In each fold, 90% of the data is used for training and 10% for testing
  – Each fold serves as the test set exactly once
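10-fold cross-validation can be sketched in a few lines. The talk ran it in WEKA; the following is only a minimal standard-library Python illustration of the splitting logic, not WEKA's implementation. The names `k_fold_splits` and `cross_validate`, the shuffling seed, and the majority-class baseline used as a stand-in classifier are all hypothetical.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    The shuffled indices are partitioned into k folds, so every sample is
    tested exactly once and used for training in the other k - 1 folds.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(labels: Sequence[int],
                   train_and_predict: Callable[[List[int], List[int]], List[int]],
                   k: int = 10) -> float:
    """Mean accuracy over k folds; train_and_predict(train_idx, test_idx)
    must return one predicted label per test index."""
    accuracies = []
    for train_idx, test_idx in k_fold_splits(len(labels), k):
        predictions = train_and_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Usage with a stand-in classifier that predicts the training majority class.
labels = [0] * 60 + [1] * 40


def majority_baseline(train_idx: List[int], test_idx: List[int]) -> List[int]:
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(test_idx)


print(cross_validate(labels, majority_baseline))
```

With 100 documents, each fold trains on 90 and tests on 10. A real setup would usually also stratify the folds by class; this sketch does not.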

Feature sets

• We experimented with 3 feature sets:
  – Writeprints
    • 700+ features, SVM
    • Includes features like frequencies of word/character n-grams and part-of-speech n-grams.
  – Lying-detection features
    • 20 features, J48 decision tree
    • Previously used for detecting lying.
    • Includes features like rate of adjectives and adverbs, sentence complexity, and frequency of self-reference.
  – 9-features
    • 9 features, J48 decision tree
    • Used for authorship recognition.
    • Includes features like readability index, number of characters, and average syllables.
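To make the three feature sets more concrete, here is a minimal standard-library Python sketch of one representative feature from each: character n-gram frequencies (Writeprints-style), the rate of first-person self-reference (a lying-detection-style feature), and a readability score with character and syllable counts (9-features-style). The slides do not say which readability index is used; Flesch-Kincaid grade level is assumed here. The syllable counter is a rough vowel-group heuristic, the pronoun list is an assumption, and all function names are hypothetical. The real sets are much larger, and adjective/adverb rates would additionally need a part-of-speech tagger.

```python
import re
from collections import Counter
from typing import Dict

WORD_RE = re.compile(r"[A-Za-z']+")
# Assumption: first-person singular forms counted as "self-reference".
SELF_REFERENCE = {"i", "me", "my", "mine", "myself"}


def char_ngram_freqs(text: str, n: int = 2) -> Dict[str, float]:
    """Relative frequency of each character n-gram (Writeprints-style)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def self_reference_rate(text: str) -> float:
    """Fraction of words that are first-person singular pronouns."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        return 0.0
    return sum(w in SELF_REFERENCE for w in words) / len(words)


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent 'e' (as in "make") usually adds no syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def nine_feature_style_counts(text: str) -> Dict[str, float]:
    """A few 9-features-style values: characters, syllables, readability."""
    words = WORD_RE.findall(text)
    avg_syll = sum(map(count_syllables, words)) / len(words) if words else 0.0
    return {
        "num_characters": float(len(text)),
        "avg_syllables_per_word": avg_syll,
        "readability_fk_grade": flesch_kincaid_grade(text),
    }
```

For example, `self_reference_rate("I think my cat likes me")` counts "I", "my", and "me" among six words.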

Results

• Short-term deception:
  – Extended-Brennan-Greenstadt Corpus
    • Regular: 98%
    • Imitation: 85%
    • Obfuscation: 89%
  – Hemingway-Faulkner Imitation corpus
    • Regular: 86.2%
    • Imitation: 88.6%
• Long-term deception:
  – Thomas-Amina Hoax corpus
    • 14% was detected as deceptive
    • Regular authorship recognition shows inconsistency in writing style.

Deception in Websites: Phishing

• Alice uses an online bank.
• Real bank: URL, SSL, browser indicator
• Fake bank: looks like the real bank
• Alice thinks everything that looks like her bank is her bank!

Approach: PhishZoo

• Real site: extract its visual elements (images, visible text) and store them as a profile.
• Fake site: extract its visual elements (images, visible text) in the same way.
• Compare with the stored profile: the visual components match, but the URL and SSL don't.
• Phishing alert!
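The profile-matching idea can be sketched as follows. This is not PhishZoo's actual implementation, since the slides do not specify its matching algorithms. Here, visible text is compared with Jaccard similarity over word sets, images are compared by exact SHA-256 hash, and the 0.8 threshold, the `SiteProfile` type, and all function names are hypothetical. A page whose visual components match a stored profile while its host name or certificate fingerprint differs triggers an alert.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Sequence
from urllib.parse import urlparse


def text_tokens(visible_text: str) -> FrozenSet[str]:
    """Lowercased word set of a page's visible text."""
    return frozenset(re.findall(r"[a-z0-9]+", visible_text.lower()))


def image_hashes(images: Iterable[bytes]) -> FrozenSet[str]:
    """SHA-256 digests of raw image bytes; this only catches exact copies."""
    return frozenset(hashlib.sha256(img).hexdigest() for img in images)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of intersection over size of union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass(frozen=True)
class SiteProfile:
    """Stored profile of a legitimate site (hypothetical structure)."""
    name: str
    host: str
    cert_fingerprint: str
    tokens: FrozenSet[str]
    images: FrozenSet[str]


def make_profile(name: str, url: str, cert_fingerprint: str,
                 visible_text: str, images: Iterable[bytes]) -> SiteProfile:
    """Extract visual elements of a real site and store them as a profile."""
    return SiteProfile(name, urlparse(url).hostname or "", cert_fingerprint,
                       text_tokens(visible_text), image_hashes(images))


def check_page(profiles: Sequence[SiteProfile], url: str, cert_fingerprint: str,
               visible_text: str, images: Iterable[bytes],
               threshold: float = 0.8) -> Optional[str]:
    """Return the name of the imitated site if the page looks like a stored
    profile but its host or certificate differs; otherwise None."""
    host = urlparse(url).hostname or ""
    tokens = text_tokens(visible_text)
    imgs = image_hashes(images)
    for p in profiles:
        # Visual similarity: the better of text overlap and image overlap.
        looks_same = max(jaccard(tokens, p.tokens), jaccard(imgs, p.images)) >= threshold
        same_identity = host == p.host and cert_fingerprint == p.cert_fingerprint
        if looks_same and not same_identity:
            return p.name  # phishing alert
    return None


page_text = "Welcome to Example Bank. Sign in to online banking."
bank = make_profile("Example Bank", "https://www.examplebank.test/login",
                    "AA:BB", page_text, [b"logo-bytes"])
# Same look, different host and certificate: flagged.
print(check_page([bank], "http://examp1ebank.test.evil/login", "CC:DD",
                 page_text, [b"logo-bytes"]))  # → Example Bank
# The genuine site itself: not flagged.
print(check_page([bank], "https://www.examplebank.test/login", "AA:BB",
                 page_text, [b"logo-bytes"]))  # → None
```

Exact image hashing is the simplest possible choice and is easily evaded by re-encoding an image; a practical detector would need a similarity measure that tolerates small visual changes.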

Result

[Bar chart: phishing-detection accuracy and false-positive rate (%) when matching on HTML, visible text in HTML, images, images & visible texts, screenshots, keywords, and images & keywords]

• 96.4% accurate in detecting phishing

Future work: Deception in Blog Comments

Approach

• Spammers post the same thing repeatedly.
• Feature: compression ratio (LZMA); repeated content compresses well.
• Classifier: latent logistic regression
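The compression-ratio feature rests on a simple observation: text that repeats itself compresses well, so a user who posts the same content over and over gets a small compressed-to-original size ratio. Below is a minimal sketch using Python's standard `lzma` module. Concatenating all of a user's comments before compressing, the function name, and the example comments are assumptions; the latent logistic regression classifier from the slide is not shown.

```python
import lzma
from typing import Sequence


def compression_ratio(comments: Sequence[str]) -> float:
    """Compressed size / raw size of all of a user's comments joined together.

    Lower values mean more redundancy, e.g. the same spam posted repeatedly.
    """
    raw = "\n".join(comments).encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress; treat as non-redundant
    return len(lzma.compress(raw)) / len(raw)


spammer = ["Buy cheap watches at our store, best prices online!"] * 20
regular = [
    "Great post, thanks for explaining the proof so clearly.",
    "I disagree with the second point; the data seem too sparse.",
    "Does anyone have a link to the original dataset?",
    "This reminded me of a talk I saw last year about spam filtering.",
]
print(round(compression_ratio(spammer), 3), round(compression_ratio(regular), 3))
```

Because `lzma.compress` writes an .xz container by default, very short inputs can have a ratio above 1; this is one reason to compress a user's comments together rather than one at a time.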


Result


But spammers are smart!

• There are tools for spamming, such as Xrumer, SEnuke, and Ultimate WordPress Comment Submitter (UWCS), that automatically:
  – create new accounts
  – use proxies
  – copy relevant words

Summary

• Deception in Writing Style:
  – Distinguish regular writing from deceptive writing
• Deception in Websites (Phishing):
  – Detect website imitation
• Deception in Blog Comments:
  – Detect spam comments

Thanks!

• Sadia Afroz
• Rachel Greenstadt
• Michael Brennan
• Ariel Stolerman
• Andrew McDonald
• Aylin Caliskan

• Privacy, Security and Automation Lab (https://psal.cs.drexel.edu)
• Secure Computing Research for User Benefit (https://scrub.cs.berkeley.edu)