Upload
dato-inc
View
28
Download
2
Embed Size (px)
Citation preview
Collective Spammer Detection in Evolving Multi-Relational Social Networks(To appear at KDD’15)
Shobeir Fakhraei*University of Maryland*Research performed while on internship at if(we) (Tagged Inc.) through Graphlab program
In collaboration with:James Foulds (UCSC, Currently UCSD)Madhusudana Shashanka (if(we) Inc., Currently Niara Inc.) Lise Getoor (UCSC)
2
Spam in Social Networks• Recent study by Nexgate in 2013:• Spam grew by more than 300% in half a year• 1 in 200 of social messages are spam• 5% of all social apps are spammy
• Social networks:• Spammers have more ways to interact with users
e.g, Comments on photos, send message, and send not verbal signals
• They can split content in several messages and forms• More available info about users on their profiles!
3
Our Approach• Lets look at the graph:• Multi-relational social
network Links = actions at time t (e.g. profile view, message, or poke)
• Time-stamped Links
4
Tagged.com• Tagged.com is a social network founded in 2004 which
focuses on connecting people through different social features and games.
• It has over 300 million registered members
• Data sample for experiments (on a laptop):• 5.6 Million users (%3.9 Labeled Spammers)• 912 Million Links
5
Graph Structure
• Using Graphlab Create• Graph Analytics Toolkit to Generate Features:
• In each relation graph we compute:PageRank, Total Degree, In Degree, Out Degree, k-Core, Graph Coloring, Connected Components, Triangle Count
• Gradient Boosted Trees:• Classification (Spam or Legitimate)
6
Graph Structure
Experiments AU-PR AU-ROC
1 Relation, k Feature type 0.187 ± 0.004 0.803 ±0.001
n Relations, 1 Feature Type 0.285 ± 0.002 0.809 ± 0.001
n Relations, k Features types 0.328 ± 0.003 0.817 ± 0.001
7
Sequence of Actions• Sequential k-gram Features:
Short sequence segment of k consecutive actions, to capture the order of events
User1 Actions: Message, Profile_view, Message, Friend_Request, ….
8
Sequence of Actions• Mixture of Markov Models (MMM):
Also called chain-augmented or tree-augmented naive Bayes model to capture longer sequences
9
Sequence of Actions
Experiments AU-PR AU-ROC
Bigram Features 0.471 ± 0.004 0.859 ± 0.001
MMM 0.246 ± 0.009 0.821 ± 0.003
10
ResultsPrecision-Recall ROC
We can classify 70% of the spammer that need manual labeling with about 90% accuracy
11
Deployment and Example Runtimes
• We can: • Run the model on short intervals, with new snapshots of the network• Update some of the features with events.
• Example runtimes with Graphlab CreateTM on a Macbook Pro:• 5.6 million nodes and 350 million links• PageRank: 6.25 minutes• Triangle counting: 17.98 minutes• k-core: 14.3 minutes
12
Refining the abuse reporting systems
• Abuse report systems are very noisy.• People have different standards.• Spammers report random people to increase noise.• Personal gain in social games.
• Goal is to clean up the system using• Reporters previous history• Looking at all the reports collectively
13
Probabilistic Soft Logic (PSL)
• Probabilistic Soft Logic (PSL) Declarative language based on first-order logic to express collective probabilistic inference problems.
• Example:
14
Results of Classification Using Reports
Experiments AU-PR AU-ROC
Reports Only 0.674 ± 0.008 0.611 ± 0.007
Reports & Credibility 0.869 ± 0.006 0.862 ± 0.004
Reports & Credibility & Collective Reasoning 0.884 ± 0.005 0.873 ± 0.004
15
Conclusions• Strong signals in multi-relational time-stamped graph
• Graph Structure:• Multi-rationality is more important than extracting more features from one relations
• Sequence:• Bigrams features of actions capture most of the patterns
• Reports:• Incorporating credibility of reporting users and collective reasoning significantly
improves the signal
• Fast experiments iterations with Graphlab CreateTM facilitates finding these pattern
16
Acknowledgements • Co-authors of the paper:
James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)
• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir
17
Acknowledgements • Co-authors of the paper:
James Foulds (UCSC)Madhusudana Shashanka (if(we) Inc. -Currently Niara Inc.-)Lise Getoor (UCSC)
• If(we) Inc. (Formerly Tagged Inc.):Johann Schleier-Smith, Karl Dawson, Dai Li, Stuart Robinson, Vinit Garg, and Simon Hill
• Dato (Formerly Graphlab):Danny Bickson, Brian Kent, Srikrishna Sridhar, Rajat Arya, Shawn Scully, and Alice Zheng
• The paper is available online, and code and part of the data will be released soon:http://www.cs.umd.edu/~shobeir
Thank you!