Upload
anupama-aggarwal
View
87
Download
2
Embed Size (px)
DESCRIPTION
In Foursquare, one of the currently most popular online location based social networking sites (LBSNs), users may not only check-in at specific venues but also post comments (or tips), sharing their opinions and previous experiences at the corresponding physical places. Foursquare tips, which are visible to everyone, provide venue owners with valuable user feedback besides helping other users to make an opinion about the specific venue. However, they have been the target of spamming activity by users who exploit this feature to spread tips with unrelated content. In this paper, we present what, to our knowledge, is the first effort to identify and analyze different patterns of tip spamming activity in Foursquare, with the goal of developing automatic tools to detect users who post spam tips - tip spammers. A manual investigation of a real dataset collected from Foursquare led us to identify four categories of spamming behavior, viz. Advertising/Spam, Self-promotion, Abusive and Malicious. We then applied machine learning techniques, jointly with a selected set of user, social and tip's content features associated with each user, to develop automatic detection tools. Our experimental results indicate that we are able to not only correctly distinguish legitimate users from tip spammers with high accuracy (89.76%) but also correctly identify a large fraction (at least 78.88%) of spammers in each identified category.
Citation preview
Anupama Aggarwal¶, Prof P. Kumaraguru “PK”¶ , Prof J. Almeida*
1
Detection of Spam Tipping Behaviour on
Foursquare
¶ Indraprastha Institute of Information Technology (IIIT-Delhi, India)* Universidade Federal de Minas Gerais (UFMG, Brazil)
2
‣ Location Based Social Network
‣ 33 Million Users *
‣ 3.5 Billion checkins *
‣ 31% of mobile social media users use Foursquare *
* As of January 2013
Foursquare 101
Foursquare 101Your LastCheckin
Venue
Fri
ends
Act
ivit
y
Tip : Suggested Activity for a Venue
Friends Suggestions
Venue Suggestions
Tip can be Liked or Saved
LocationSharing
OSN
4
Spam Tips
‣ Tips unrelated to Venue
Advertising / Marketing
Scam / Phishing
5
Spam according to
Foursquare ToS‣ Tips with links to websites selling software, realtor contact
info, a listing for your business, or other promotion
‣ Tips with inappropriate language or negativity directed at another person
‣ Unauthorized or unsolicited advertising, junk
6
‣ Characterizing irregular user behaviour
‣ We observed different categories of spam users
‣ We characterize features distinguishing these spam users
‣ Automatic detection of spammers
‣ Distinguish between spam and legitimate Foursquare users
‣ Cluster spam users into different categories according to their behaviour
Contributions
7
Data Crawling
2,400,594 tips
613,298 users
Observed Categories of Spam Users
‣ Marketing : These users post tips to promote and advertise a specific product/ brand / venue / external URL
‣ Malicious : Such Foursquare users post external URLs in Tips which direct to spam / phishing / malware websites
‣ Abusive / Derogatory: These users try to deface or bad-mouth another person
‣ Self Promotion: These users try to draw attention to themselves
8
9
Ground Truth DataAnnotation Portal
2,000 Legitimate users
1,900 Spammers
10
‣ User Attributes
‣ Properties of the Foursquare user profile and his checkins
‣ Social Attributes
‣ Friends network of the Foursquare user under inspection
‣ Content Attributes
‣ Details about Tips posted by the Foursquare user
Features used todetect Spammers
11
Features usedCategory χ2 rank Feature
UserAttributes
1 Number of Tips
UserAttributes
3 Ratio of Check-ins and Tips
UserAttributes
4 Number of Check-insUser
Attributes 5 Number of BadgesUser
Attributes11 Number of Mayorships
UserAttributes
12 Ratio of Check-ins and Badges
UserAttributes
15 Number of Photos posted
Social Attributes
6 Number of Friends
ContentAttributes
2 Similarity score of Tips
ContentAttributes
7 Number of URLs posted
ContentAttributes
8 Average number of words in TipsContent
Attributes 9 Average number of characters in TipsContent
Attributes10 Ratio of number of likes and number of Tips
ContentAttributes
13 Average number of spam words in Tips
ContentAttributes
14 Average number of phone-numbers posted in Tips
12
‣ Spammers post same/similar Tips on multiple venues
‣ A large fraction of spam Tips contain URLs
‣ Spam Tips may also have phone numbers
‣ Legitimate users have more Friends
‣ Spammers have very few Friends but large number of Tips
Few Observations
Relation b/w Tips and Checkins
Check-ins
TipsIrregular User Behaviour
14
Tips Distribution
Legitimate users Spammers
15
Classification Results
ClassificationAlgorithm
Precision(Spam)
Precision(Safe)
Recall(Spam)
Recall(Safe)
Accuracy
KNN 83.2% 86.6% 86.3% 83.5% 84.89%
DecisionTree
88.1% 89.2% 88.3% 85.8% 89.53%
RandomForest
89.3% 90.2% 88.3% 90.3% 89.76%
16
‣ Expectation-Maximization (EM) clustering
‣ Spammers Categories -
‣ Advertising / Marketing
‣ Self Promotion
‣ Abusive
‣ Malicious
Detection of Spam Classes
17
‣ Clustering Accuracy for spammer categories -
Detection of Spam Classes
Advertising 88.23%
Self-Promotion 87.23%
Abusive 78.88%
Malicious 0%
18
‣ Analyzed spammers behaviour on Foursquare
‣ We obtained an accuracy of 89.76% with Random Forest classifier to distinguish spammers from legitimate users
‣ We classified the spammers into four broad categories
‣ We were able to to detect users belonging to Advertising, Self-promotion and Abusive categories with an accuracy of 88.23%, 87.23% and 78.88%
Conclusion
19
‣ Refine our methodology by use of other classification algorithms
‣ Use multiclass classification to detect users in any of the spam categories
‣ Correlation of content and the URLs posted by different users can help us in identifying several spam campaigns on Foursquare
Future Work
20
Thank You!
Questions ?