Flipping 419 Cybercrime Scams: Targeting the Weak and the Vulnerable
Gibson Mba*, Jeremiah Onaolapo#, Gianluca Stringhini#, Lorenzo Cavallaro*
*Royal Holloway, University of London
#University College London
WWW2017 CyberSafety Workshop // Perth, Australia // April 4, 2017
419 scams
● Advance Fee Fraud
● “419” is derived from the section number of Nigeria’s criminal law against such scams
● Have been around for some time
● Most previous work focuses on cybercrime targeting the US, EU, and Asia [1, 2, 3]
● What about cybercrime targeting Africa?
○ Little attention
○ Hence our study
1. R. Anderson et al., Measuring the Cost of Cybercrime, WEIS, 2012.
2. N. Christin et al., Dissecting one click frauds, CCS, 2010.
3. B. Stone-Gross et al., Your botnet is my botnet: Analysis of a botnet takeover, CCS, 2009.
Contributions
● Highlight a unique form of scam targeting vulnerable Nigerian students,
secondary school leavers, and unemployed persons, among others
● Provide insight into common themes around which fraudsters build their
scam schemes
○ We rely on Machine Learning (ML) techniques to achieve this
○ Themes -- Academic, Employment, Spirituality, Dating, Other
Our roadmap
Dataset description → Ground truth extraction → Automatic data classification → Validation → Clustering
Data sources
Our goal -- Collect and analyze data to understand scam schemes
● Topix.com’s Nigeria forum
● 2005 -- posts on news and current affairs
● 2012 onwards -- scam posts show up and grow
● Sheds light on 419 scams perpetrated against Nigerians
● Hosts posts promoting different types of scam services
○ Also contact information (mostly phone numbers) to reach the fraudsters
More data
Supplementary data sourced from
● http://www.adsafrica.com
● http://www.123nigeria.com/
● http://forumng.com/
Data collection (Jan. 2012 -- Nov. 2013)
Total posts 711,861
Posts with phone numbers 598,572
Total unique posts 589,956
Total unique authors 37,948
Distinct locations 613
Distinct phone numbers 12,425
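Counting posts with phone numbers and distinct numbers requires an extraction step that the slides do not describe. The following is a minimal sketch, assuming Nigerian mobile numbers written as 11 digits starting with 0, or with the 234 country code, optionally separated by spaces or hyphens. The pattern and the function name `extract_phone_numbers` are illustrative assumptions, not the authors' method.

```python
import re

# Assumed shape: "+234", "234" or a leading "0", then 10 digits starting
# with 7, 8 or 9, with optional spaces or hyphens between the digits.
PHONE_RE = re.compile(r"(?<!\d)(?:\+?234|0)[\s-]?([789](?:[\s-]?\d){9})(?!\d)")

def extract_phone_numbers(text):
    """Return the set of numbers found in `text`, normalised to 0XXXXXXXXXX."""
    numbers = set()
    for match in PHONE_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group(1))  # drop spaces and hyphens
        numbers.add("0" + digits)
    return numbers

# Placeholder text; the number is made up for illustration.
post = "Call 0803 123 4567 or +2348031234567 for admission help"
print(extract_phone_numbers(post))
```

Normalising to one canonical form before counting keeps the same number, written two ways, from being counted as two distinct numbers.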
Growth of posts
Increase
● 218 posts in Jan. 2012
● 142,344 posts in Sep. 2013
Sharp drop
● After Sep. 2013
● Corresponds to the time lecturers called off a six-month strike
● Students resort to scams because they have nothing else to do?
Dataset preparation
Our goal -- Determine if a post is a scam or not
● We selected 663 posts from the total of 711,861 posts
○ Random sampling without replacement
○ Confidence level 95%
○ Margin of error 5%
● We augmented the data with an additional 372 random non-scam samples
○ To address the “Imbalanced Dataset” problem [1]
○ I.e., we used the over-sampling approach to balance the dataset
1. N. V. Chawla et al., SMOTE: Synthetic Minority Over-sampling Technique, JAIR, 2002.
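The two sampling steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: posts are represented by integer IDs, `non_scam_ids` stands in for a hypothetical list of posts already known to be non-scam, and the toy population is much smaller than the real dataset.

```python
import random

def build_ground_truth(all_ids, non_scam_ids, n_sample, n_extra, seed=0):
    """Draw n_sample posts at random, then add n_extra known non-scam posts."""
    rng = random.Random(seed)
    # random.sample draws without replacement: no ID is picked twice.
    sample = rng.sample(all_ids, n_sample)
    chosen = set(sample)
    # Only add non-scam posts that are not already in the sample.
    pool = [post_id for post_id in non_scam_ids if post_id not in chosen]
    extra = rng.sample(pool, n_extra)
    return sample, extra

# Toy stand-in for the real data: IDs 0..9999, every 5th ID assumed non-scam.
all_ids = list(range(10_000))
non_scam_ids = all_ids[::5]
sample, extra = build_ground_truth(all_ids, non_scam_ids, n_sample=663, n_extra=372)
print(len(sample) + len(extra))  # 663 + 372 = 1035
```

The combined size, 1,035, is the number of posts the slides later report as used for training.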
Heuristics to build ground truth
Our goal -- Pick scam/not-scam posts to build ground truth data for our classifier
● Any post offering to help a candidate gain admission into any institution of learning is a scam
● Any post offering any form of fun services, e.g., sex for money, is a scam
● Any post offering any form of assistance with jobs or employment is a scam
● Offers of spiritual/religious assistance, e.g., prayers, Illuminati membership, magical powers, healing, are scams
● And more heuristics
Automatic data classification
Our goal -- Identify scam posts on the forum
Obstacle -- Too many posts; we can’t identify all scam posts manually
Solution -- Rely on supervised ML techniques
● Binary classification task {is_scam, not_scam}
● Trained a Support Vector Machine (SVM) [1] using the ground truth dataset
● Evaluation -- 5-fold cross validation (Accuracy 95.17%)
Pipeline: Posts → TF-IDF [2] → Features → SVM → {is_scam, not_scam}
1. T. Joachims, Text categorization with Support Vector Machines: Learning with many relevant features, ECML, 1998.
2. G. Salton and C. Buckley, Term-weighting approaches in automatic text retrieval, IPM Journal, 1988.
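The TF-IDF feature step of this pipeline can be sketched with the standard library alone. The slides do not say which TF-IDF variant or tokeniser was used, so the whitespace tokenisation and the plain tf × log(N/df) weighting below are assumptions, and the example posts are made up. The resulting vectors are what a classifier such as an SVM would take as input; the classifier itself is not shown.

```python
import math
from collections import Counter

def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Made-up posts for illustration only.
posts = [
    "call now for admission assistance",
    "call now for job assistance",
    "football results for today",
]
vectors = tfidf(posts)
```

A term that appears in every post (here "for") gets weight zero, while a term unique to one post (such as "admission") gets the highest weight, which is why TF-IDF highlights the words that distinguish one kind of post from another.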
Results of automatic data classification
● Applied SVM model on entire dataset
○ 710,826 posts, i.e., 711,861 minus the 1,035 posts used for training
● 679,222 (95.55% of the posts) -- YES (in other words, is_scam)
● 31,604 (4.45% of the posts) -- NO (in other words, not_scam)
● Conclusion -- The forum is a crime hub used by scammers to advertise
schemes to deceive and exploit their victims
Validation
5-fold validation
Metric        SVM
Accuracy      95.17%
Precision     96.54%
Recall        95.80%
Specificity   94.14%
F1            96.16%
Error         4.83%
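For reference, the metrics in the validation table are defined in terms of the four confusion counts of a binary classifier. The sketch below shows those standard definitions. The counts passed in are hypothetical and chosen only for illustration; they are not the authors' per-fold numbers, which the slides do not give.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)       # share of predicted scams that are scams
    recall = tp / (tp + fn)          # share of real scams that were found
    specificity = tn / (tn + fp)     # share of non-scams correctly passed
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "error": 1 - accuracy,
    }

# Hypothetical counts, NOT taken from the study.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name:<12}{value:.2%}")
```

Error is simply one minus accuracy, which matches the table: 95.17% and 4.83% sum to 100%.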
Automatic data classification
Our goal -- Identify the theme of each scam post
Obstacle -- Too many posts (679,222)
Solution -- Rely on supervised ML techniques
● We manually checked 655 scam posts (training set) and identified five themes
● Multi-class classification task {Academic, Employment, Spirituality, Dating, Other}
Pipeline: Posts → TF-IDF → Features → SVM → {A, E, S, D, O}
Results of multi-class classification
Class          Posts     % Scam
Academic       464,069   68.32%
Employment     129,811   19.11%
Other          48,228    7.10%
Dating         22,897    3.37%
Spirituality   14,217    2.09%
Total          679,222   100.00%
● Academic and Employment scams are the most common, together about 87% of scam posts
● Traceable to dwindling academic performance
○ As reported by examination bodies
● Unemployment is also an issue
○ 23.9% as of 2011, according to the Nigerian Bureau of Statistics
● Themes are very important
○ Key contribution
Confusion matrix (5-fold validation)
              Academic  Employment  Spirituality  Other  Dating  Total  Correct predictions
Academic           413           7             1      9       1    431  95.82%
Employment           2         136             0      2       0    140  97.14%
Spirituality         0           0            16      1       0     17  94.12%
Other                1           5             1     23       5     35  65.71%
Dating               1           0             0      0      31     32  96.88%
Total              417         148            18     35      37    655
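The "Correct predictions" column can be recomputed from the matrix itself: each value is the diagonal entry of a row divided by that row's total. The sketch below does this with the numbers copied from the table; the function names are illustrative.

```python
LABELS = ["Academic", "Employment", "Spirituality", "Other", "Dating"]

# Rows copied from the confusion matrix above, without the totals.
MATRIX = [
    [413, 7, 1, 9, 1],
    [2, 136, 0, 2, 0],
    [0, 0, 16, 1, 0],
    [1, 5, 1, 23, 5],
    [1, 0, 0, 0, 31],
]

def per_row_correct(matrix):
    """Diagonal entry of each row divided by the row total."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Sum of the diagonal divided by the number of posts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

shares = per_row_correct(MATRIX)
overall = overall_accuracy(MATRIX)
for label, share in zip(LABELS, shares):
    print(f"{label:<14}{share:.2%}")
print(f"{'Overall':<14}{overall:.2%}")
```

The diagonal sums to 619 of 655 posts, about 94.50% overall. The weakest row is Other (23 of 35), which is unsurprising for a catch-all class.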
Clustering
Goal -- Identify clusters of entities in the dataset, for instance,
groups of related scammers
Why? Could indicate coordination among fraudsters/ existence of criminal gangs
● We selected 16,194 posts from 679,222 crime posts
○ Random sampling without replacement
○ Confidence level 99%
○ Margin of error 1%
● Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [1]
1. M. Ester et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, KDD, 1996.
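The slides do not state how the sample size of 16,194 was derived. One common choice is Cochran's formula with a finite-population correction. The sketch below assumes that formula, maximum variability (p = 0.5), and z = 2.576 as the two-sided critical value for 99% confidence; under those assumptions it reproduces the figure above for a population of 679,222 and a 1% margin.

```python
import math

def sample_size(population, z, margin, p=0.5):
    """Cochran's sample size with finite-population correction, rounded up."""
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink it to account for the finite number of posts.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

n = sample_size(population=679_222, z=2.576, margin=0.01)
print(n)  # 16194
```

Tightening the margin from 5% to 1% multiplies the uncorrected sample size by 25, which is why this sample is so much larger than the one used to build the ground truth.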
Visualization of clusters
● DBSCAN computed 197 clusters from our data
● We fed some clusters through Gephi [1]
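To make the DBSCAN step concrete, here is a small pure-Python sketch of the algorithm on toy two-dimensional points. The slides do not say which features or parameters were used for clustering the posts, so the points, `eps`, and `min_pts` below are hypothetical.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    def neighbours(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: reachable, not core
            if labels[j] is not None:
                continue                 # already assigned, do not expand
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)   # j is core: keep expanding
    return labels

# Toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN does not need the number of clusters in advance and leaves isolated points unassigned, which suits data where the number of scammer groups is unknown.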
Case study
Same topic, multiple phone numbers
● Indicates coordination of activities among scammers
● Could also be because some fraudsters try to copy the post topics of other scammers
1. M. Bastian et al., Gephi: An Open Source Software for Exploring and Manipulating Networks, ICWSM, 2009.
Case study -- a cluster of clusters
● An elaborate scamming scheme
● Emphasis on cluster
○ Single phone number node
○ Multiple topics
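Both case studies describe links between post topics and phone numbers. One way to look for such structures is to treat topics and phone numbers as nodes of a graph and group everything that is connected, directly or indirectly. The sketch below is an illustration of that idea, not the authors' Gephi workflow; the topic strings and phone numbers are placeholders.

```python
from collections import defaultdict

def connected_components(edges):
    """Group topic and phone nodes that are linked directly or indirectly."""
    graph = defaultdict(set)
    for topic, phone in edges:
        t, p = ("topic", topic), ("phone", phone)
        graph[t].add(p)
        graph[p].add(t)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first walk from an unvisited node collects one component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in graph[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(component)
    return components

# Hypothetical (topic, phone) pairs; the numbers are placeholders.
edges = [
    ("admission help", "0800-000-0001"),
    ("admission help", "0800-000-0002"),   # same topic, multiple numbers
    ("job offer", "0800-000-0002"),        # same number, multiple topics
    ("spiritual healing", "0800-000-0003"),
]
components = connected_components(edges)
print(sorted(len(c) for c in components))  # [2, 4]
```

A phone number shared across several topics merges what would otherwise be separate groups into one component, which is the kind of key node the cluster-of-clusters case study points to.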
SIM card registration: A countermeasure?
● 2011 -- SIM registration policy by Nigerian Communications Commission (NCC)
○ Registration involves recording some personal data and biometric information about subscribers
○ Key objective was to assist law enforcement agencies during criminal investigations
● Did SIM registration help to reduce cybercrime on the forum?
○ Posts containing phone numbers actually increased after SIM registration was introduced
○ Overall number of posts on the forum also increased (i.e., posting activity increased)
● Did SIM registration encourage the growth of criminal activity on the forum?
○ No. The absence of a cybercrime law until 2014 and weak investigation/prosecution capabilities on the part of law enforcement agencies are more likely reasons
○ Telecommunication firms were also not totally compliant with the SIM registration policy
Takeaways
● Despite the massive coverage of 419 scams, some types are still understudied
● We highlight a unique form of scam targeting specific Nigerian demographics
● Law enforcement agencies may find the cluster analysis approach useful
○ To identify and take down key nodes in sophisticated scam schemes
● The SIM card registration policy is not sufficient in tackling online scams
involving phone numbers
● Future work could involve studying whether certain demographics are more susceptible to the types of scams we highlighted
Questions?
Call for papers
Submission link: https://scienceinpublic.org/science-in-public-2017/
Panel: Phishing and Pharming Passwords: What Are the Real World Effects of Information Theft on People?
Format: Short "paper proposals" (Word document, 300 words maximum)
Venue: Sheffield, UK
Submission deadline: April 18, 2017
Thanks!
Contact info
Email: j.onaolapo [*AT*] cs.ucl.ac.uk
Twitter: @jerryola