17
Collective Spammer Detection in Evolving Multi- Relational Social Networks (To appear at KDD’15) Shobeir Fakhraei * University of Maryland *Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program In collaboration with: James Foulds (UCSC, Currently UCSD) Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)

Collective Spammer Detection in Evolving Multi-Relational Social Networks

Embed Size (px)

Citation preview

Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD’15)

Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program

In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)

2

Spam in Social Networks• Recent study by Nexgate in 2013:• Spam grew by more than 300% in half a year• 1 in 200 of social messages are spam• 5% of all social apps are spammy

• Social networks:• Spammers have more ways to interact with users

e.g, Comments on photos, send message, and send not verbal signals

• They can split content in several messages and forms• More available info about users on their profiles!

3

Our Approach• Lets look at the graph:• Multi-relational social

network Links = actions at time t (e.g. profile view, message, or poke)

• Time-stamped Links

4

Tagged.com• Tagged.com is a social network founded in 2004 which

focuses on connecting people through different social features and games.

• It has over 300 million registered members

• Data sample for experiments (on a laptop):• 5.6 Million users (%3.9 Labeled Spammers)• 912 Million Links

5

Graph Structure

• Using Graphlab Create• Graph Analytics Toolkit to Generate Features:

• In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count

• Gradient Boosted Trees:• Classification (Spam or Legitimate)

6

Graph Structure

Experiments AU-PR AU-ROC

1 Relation, k Feature type 0.187 ± 0.004 0.803 ±0.001

n Relations, 1 Feature Type 0.285 ± 0.002 0.809 ± 0.001

n Relations, k Features types 0.328 ± 0.003 0.817 ± 0.001

7

Sequence of Actions• Sequential k-gram Features:

Short sequence segment of k consecutive actions, to capture the order of events

User1 Actions: Message, Profile_view, Message, Friend_Request, ….

8

Sequence of Actions• Mixture of Markov Models (MMM):

Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences

9

Sequence of Actions

Experiments AU-PR AU-ROC

Bigram Features 0.471 ± 0.004 0.859 ± 0.001

MMM 0.246 ± 0.009 0.821 ± 0.003

10

ResultsPrecision-Recall ROC

We can classify 70% of the spammer that need manual labeling with about 90% accuracy

11

Deployment and Example Runtimes

• We can: • Run the model on short intervals, with new snapshots of the network• Update some of the features with events.

• Example runtimes with Graphlab CreateTM on a Macbook Pro:• 5.6 million nodes and 350 million links• PageRank: 6.25 minutes• Triangle counting: 17.98 minutes• k-core: 14.3 minutes

12

Refining the abuse reporting systems

• Abuse report systems are very noisy.• People have different standards.• Spammers report random people to increase noise.• Personal gain in social games.

• Goal is to clean up the system using• Reporters previous history• Looking at all the reports collectively

13

Probabilistic Soft Logic (PSL)

• Probabilistic Soft Logic (PSL) Declarative language based on first-order logic to express collective probabilistic inference problems.

• Example:

14

Results of Classification Using Reports

Experiments AU-PR AU-ROC

Reports Only 0.674 ± 0.008 0.611 ± 0.007

Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004

Reports & Credibility & Collective Reasoning 0.884 ± 0.005 0.873 ± 0.004

15

Conclusions• Strong signals in multi-relational time-stamped graph

• Graph Structure:• Multi-rationality is more important than extracting more features from one relations

• Sequence:• Bigrams features of actions capture most of the patterns

• Reports:• Incorporating credibility of reporting users and collective reasoning significantly

improves the signal

• Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern

16

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

17

Acknowledgements • Co-authors of the paper:

James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)

• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill

• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng

• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir

Thank you!