CLASSIFICATION OF PHISHING SCAM IN WEBSITE
USING VOWPAL WABBIT ALGORITHM
IZZATY SYAHIRA BINTI KAMARUDDIN
BACHELOR OF COMPUTER SCIENCE
(COMPUTER NETWORK SECURITY) WITH HONOURS
FACULTY OF INFORMATICS AND COMPUTING
UNIVERSITI SULTAN ZAINAL ABIDIN
August 2020
DECLARATION
I hereby declare that this report is based on my original work except for quotations and citations,
which have been duly acknowledged. I also declare that it has not been previously or concurrently
submitted for any other degree at Universiti Sultan Zainal Abidin or other institutions.
Signature :…………………….
Name :…………………….
Date :…………………….
APPROVAL
This is to confirm that the research conducted and the writing of this report were carried out under my supervision.
Signature :………………………………………………..
Supervisor : Sir Ahmad Faisal Amri bin Abidin @ Bharun
Date :………………………………………………..
DEDICATION
In the name of Allah SWT, the Most Gracious and the Most Merciful, all praise is only for Him.
I would like to express my deepest appreciation to all who gave me the courage and the opportunity
to complete this report. Special gratitude goes to my supervisor, Sir Ahmad Faisal Amri, for
guiding me through my final year project.
I take this opportunity to thank my parents and my family for their moral support and
encouragement whenever I felt like giving up. I also give special thanks to all lecturers of the Faculty
of Informatics and Computing for their attention, guidance and advice during my final year
project period. Sincere thanks to my friends for their help with my final year
project.
May Allah SWT bless all the effort put into completing this final year project.
Thank you.
ABSTRACT
In the cyber world, phishing is one of the major problems that lead to financial losses for
both industries and individuals. With the growth of the internet today, attackers can easily launch
targeted phishing attacks without the victims noticing that they have been deceived. Phishing is a kind
of attack in which attackers use spoofed emails and fraudulent websites to trick people without
their notice. A phishing website looks very similar in appearance to its corresponding legitimate
website in order to deceive users into believing that they are browsing the correct website. The attackers
send malicious links or attachments through phishing emails that can perform various functions,
including stealing the login credentials or account information of the victim. These emails can harm
victims through loss of money and identity theft. The main goal of this paper is to investigate the
potential of the Vowpal Wabbit algorithm in classifying phishing websites in order to protect users
from being hacked or deceived into giving up their personal access and information. Vowpal Wabbit
is a fast, parallel machine learning framework that was developed for distributed
computing, and it can help to prevent attackers from causing disruption. This project will also be carried
out by classifying data in the Weka analysis tool.
ABSTRAK
Di dunia yang serba moden ini, phishing adalah salah satu masalah utama yang membawa
kepada kerugian kewangan bagi kedua-dua industri dan individu. Dengan berkembangnya internet
hari ini, penyerang dengan mudah boleh melancarkan serangan phishing yang disasarkan tanpa
mangsa mengetahui yang mereka telah kena tipu. Phishing adalah sejenis serangan yang mana
penyerang menggunakan e-mel palsu dan laman web palsu untuk menipu orang tanpa diketahui
oleh mereka. Laman web phishing kelihatan sangat mirip dengan penampilan laman web yang sah
untuk menipu pengguna untuk mempercayai bahawa mereka sedang melayari laman web yang
betul. Penyerang menghantar pautan atau lampiran yang berniat jahat melalui e-mel phishing yang
boleh melakukan pelbagai fungsi, termasuk mencuri bukti kelayakan log masuk atau maklumat
akaun mangsa. E-mel ini boleh merosakkan mangsa melalui kehilangan wang dan mengenal pasti
kecurian. Matlamat utama kertas ini adalah untuk menyiasat potensi Vowpal Wabbit algorithm
dalam mengklasifikasikan laman web phishing untuk melindungi pengguna daripada digodam atau
ditipu dengan mencuri akses dan maklumat peribadi. Vowpal Wabbit algorithm adalah rangka
kerja pembelajaran mesin yang cepat dan sejajar yang dibangunkan untuk pengkomputeran yang
diedarkan dan dapat membantu mencegah penyerang melakukan gangguan. Projek ini juga akan
dilakukan dengan mengklasifikasikan data dari komputer dalam alat analisis Weka.
TABLE OF CONTENTS
Declaration
Approval
Dedication
Abstract
Abstrak
Table Of Contents
Diagram Lists
CHAPTER 1: INTRODUCTION
1.1 Background
1.2 Problem statement
1.3 Objective
1.4 Scope
1.5 Limitation of work
1.6 Thesis Organization
CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
2.2 Phishing
2.2.1 Definition of Phishing
2.2.2 How Phishing Works?
2.2.3 Types of Phishing
2.3 Scam
2.3.1 Definition of Scam
2.3.2 Types of Scam
2.4 Vowpal Wabbit Algorithm
2.5 Email Filtering Techniques
2.6 Comparison between Methods
2.7 Summary
CHAPTER 3: METHODOLOGY
3.1 Introduction
3.2 Specification and System Requirements
3.2.1 Determine Requirements
3.2.2 Hardware
3.2.3 Software
3.3 Algorithm
3.3.1 Vowpal Wabbit
3.4 General Framework
3.5 Summary
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 Background
In this modernized world, security plays an important role in technology, especially in
electronic communication over the internet, where attackers can launch targeted phishing attacks. Phishing
is a criminal technique employing both social engineering and technical subterfuge to steal
consumers’ personal identity data and financial account credentials [1]. Phishing is also one of the
different types of fraud committed today. In criminal law, fraud is defined as a deliberate
deception made with the sole aim of personal gain or of smearing an individual’s image [2].
Phishing websites are fake web pages that are created by malicious people to imitate the web
pages of real websites. The attacker in a phishing attack is known as a phisher. Phishers usually
create web pages that are very similar to the real web pages in order to trick their victims
into revealing their personal information [1]. Victims are tricked into clicking a malicious link,
which can lead to the installation of malware or to web pages that look like the legitimate site
but are not the real thing. The malware may freeze the system without the victim’s notice and
automatically collect personal information such as passwords, bank account numbers, social security
numbers and credit card details. Users are therefore easily harmed by this scam, because phishers can
misuse their personal information without their permission. Even worse, phishing attacks may
cost companies hundreds of thousands of dollars per attack in fraud-related losses and personnel
time.
In order to protect against phishing scams, Vowpal Wabbit is used in this project. It is a recent
product of machine learning research into fast learning algorithms. This research intends to apply the
Vowpal Wabbit algorithm to text that has been broken into words, symbols, phrases or other meaningful
elements. It is also a fast machine learning system that is able to learn from terascale datasets faster
than most other models. The classification process will be based on different characteristics such as spelling
errors, poor grammar, long URLs, generic salutations and personalization.
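As a rough illustration of how some of the characteristics listed above could be turned into numbers for a classifier, the Python sketch below extracts a few simple features from a message. The function name `extract_features`, the salutation list and the 75-character URL threshold are assumptions made for this example; they are not taken from this project. Spelling and grammar checks are left out because they would need an external dictionary.

```python
import re

# Hypothetical list of generic greetings; a real system would tune this.
GENERIC_SALUTATIONS = ("dear customer", "dear user", "dear member", "dear client")
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
LONG_URL_THRESHOLD = 75  # assumed cut-off, in characters


def extract_features(message: str) -> dict:
    """Return a small dictionary of phishing-related features for one message."""
    text = message.lower()
    urls = URL_PATTERN.findall(message)
    longest = max((len(u) for u in urls), default=0)
    return {
        "num_urls": len(urls),
        "longest_url_length": longest,
        "has_long_url": int(longest > LONG_URL_THRESHOLD),
        "generic_salutation": int(any(s in text for s in GENERIC_SALUTATIONS)),
    }


if __name__ == "__main__":
    sample = "Dear customer, please verify your account at http://example.com/login"
    print(extract_features(sample))
```

Each message becomes a small feature dictionary that a learning algorithm can consume; the thresholds would have to be tuned on real data.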
1.2 Problem Statement
• Attackers can steal sensitive information and use it for harmful purposes. This can
happen when the user clicks a malicious link that immediately installs malware on
the user’s device.
• Attackers usually use official logos and other identifying information taken directly from
the legitimate websites of real organizations, together with a deceptive URL
linking to a scam website.
• With regard to this matter, this research intends to leverage the Vowpal Wabbit algorithm to
secure email from phishing scams.
1.3 Objective
• To study the Vowpal Wabbit algorithm in order to secure websites against phishing scams.
• To modify the Vowpal Wabbit algorithm to suit Weka-based system settings.
• To test the data sets using the Vowpal Wabbit algorithm in Weka in order to detect phishing
websites.
1.4 Scope
• Classify phishing scam messages and pre-process the content of the messages.
• The subject will be tested using the computer program, and data will be generated from the test.
• Classify the data using the Weka tool to obtain accurate results.
1.5 Limitation of Work
• The system detects phishing scam messages on websites only.
• The system will analyze the text of the message and the malicious link.
• The system focuses on a single language.
~ Only text written in English can be analyzed.
[Figure: diagram with the elements Data Sets, Vowpal Wabbit algorithm, Weka and Results]
1.6 Thesis Organization
This report covers all the necessary information about the project. Chapter 1 introduces
the project and details its objectives, scope and limitations. Chapter 2 mainly covers the previous
research that was used as reference for this project and its relation to this project.
Chapter 3 details the methodology. It describes the framework of the
project and the software and hardware used to produce the results.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
This chapter discusses the literature on machine learning
classifiers used in previous research. A literature review examines past and recent
research in order to illustrate the research
problem, the existing solutions and the importance of seeking a solution. A literature review is not merely
information gathering. It shows an in-depth grasp of, and summarizes, prior research
linked to the chosen research topic. It involves
reading journals, books, articles and research papers, and then analyzing, summarizing and
evaluating that reading based on its connection to the project. It serves as a guideline that establishes
the credibility of the project.
2.2 Phishing
2.2.1 Definition of Phishing
Phishing is a criminal technique employing both social engineering and technical
subterfuge to steal consumers’ personal identity data and financial account credentials [1]. Phishing
is a type of attack dating from the mid-1990s, and it soon became a major problem in
online transactions. The word “phishing” appeared when Internet scammers were using email lures
to “fish” for passwords and financial information from the sea of Internet users; “ph” is a common
hacker replacement for “f”, which comes from the original form of hacking, “phreaking” on
telephone switches [3]. The attacker in a phishing attack is known as a phisher. A phisher attempts to
deceive online customers by sending an email that leads them to a site falsely claiming to be an
established legitimate enterprise, in an attempt to scam the user into surrendering private
information that will be used for identity theft. Legitimate organizations would never request this
information via email.
2.2.2 How Phishing Works?
The phishing process usually starts with a spoofed email that invites the user to log in to
their account through a forged web page that very closely resembles an official website,
such as that of a bank or an e-shop. The spoofed emails often look like valid emails because the phishers
use the same logos and graphics as the original website. In addition, the scam emails
contain deceptive URL addresses linking to a scam website [4]. The phisher obtains the information
as soon as the victim enters the username, password or credit card number. For this reason, users
should not forward unauthenticated emails, click on unusual links in emails, or use search
engines to look for online donations and charitable organizations [5].
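One way to make the idea of a deceptive URL concrete is to compare the address a link displays with the address it actually points to. The Python sketch below does this with the standard library only. The class and function names (`LinkCollector`, `host_of`, `deceptive_links`) are hypothetical, and the rule used (visible text that looks like a host name but differs from the host in the `href`) is a simplifying assumption, not a complete detector.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def host_of(url: str) -> str:
    """Return the lower-cased host name of a URL, or '' if there is none."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def deceptive_links(html: str) -> list:
    """Return links whose visible text names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    flagged = []
    for href, text in parser.links:
        # Only treat the visible text as an address if it looks like one.
        shown = host_of(text) if "." in text and " " not in text else ""
        if shown and shown != host_of(href):
            flagged.append((href, text))
    return flagged
```

For example, a link that displays `www.mybank.com` but points to another host would be returned by `deceptive_links`, while a link labelled `Help` would be ignored.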
2.2.3 Types of Phishing
The categories of phishing are as follows [3]:
• Clone Phishing
Clone phishing creates a cloned email. The attacker does this by taking information such as
the content and recipient addresses from a legitimate email that was delivered previously,
and then sends the same email with the links replaced by malicious ones. The attacker also employs
address spoofing so that the email appears to come from the original sender. The email can claim
to be a re-send of the original or an updated version as a trapping strategy.
• Spear Phishing
Spear phishing targets a specific group. Instead of casting out thousands of emails
randomly, spear phishers target selected groups of people with something in common, for
example people from the same organization. Spear phishing is also used against
high-level targets, in a type of attack called “whaling”.
• Phone Phishing
Phone phishing refers to messages that claim to be from a bank and ask users to dial a
phone number regarding problems with their bank accounts. Traditional phone
equipment uses dedicated lines, so Voice over IP (VoIP), being easy to manipulate, becomes a good
choice for the phisher. Once the victim dials the phone number, which is owned by the phisher and provided
by a VoIP service, voice prompts tell the caller to enter their account number and PIN.
Caller ID spoofing, which is not prohibited by law, can be used along with this
so that the call appears to be from a trusted caller.
• Domain Spoofing
A domain spoofing attack uses either email or fraudulent websites. It occurs when a
cybercriminal “spoofs” the domain of an organization or company to make emails look
like they come from the official domain, or makes a fake website look like the real site by copying its
design and using a similar URL.
• Watering Hole Phishing
The name is reminiscent of a scene from the animal kingdom. Attackers target a business by
identifying specific websites that the company or its employees visit most often and
infecting one of those sites with malware.
• Evil Twin
This is a form of phishing that usually happens over Wi-Fi. It has also been referred to as the
Starbucks scam because it often takes place in coffee shops. The attacker sets up an access point whose
service set identifier (SSID) looks the same as that of the legitimate network.
2.3 Scam
2.3.1 Definition of Scam
According to Computer Hope, a scam is a term used to describe any
fraudulent business or scheme that takes money or other goods from an unsuspecting person. With
the world becoming more connected through the Internet, online scams have increased, and it is
often up to users to stay cautious with people on the Internet.
2.3.2 Types of Scam
Below are the categories of scam:
• Online Survey Scams
➢ These are sites that claim to offer a large amount of money or gift vouchers to participants
for answering questions. The main goal of an online survey scam is to collect
demographic information, which the site can sell to scammers, spammers
or other marketers.
• 419 Scam
➢ This scam is called the 419 or Nigerian scam. It is named after the section of the penal code
under which it is prosecuted in Nigeria, Africa. Victims are promised a large amount of money,
and the scam appears to require only bank information in order to deposit the money into the
victim’s account. This bank information is then used against the person, or the deposits are kept
with no reward.
• Catfish
➢ A catfish is a person who creates a fake online profile with the intention of deceiving someone
else. For example, a woman could create a fake profile on an online dating website,
start a relationship with one or more people and then make up a story in an attempt to
get money from them.
• Auction Fraud
➢ A seller on an online auction site such as eBay offers something that
appears to be something it is not. For example, a seller sells tickets
for an upcoming football match that are not official tickets.
• Donation Scam
➢ A person claims that they, their child or someone they know has an
illness and urgently needs financial assistance. Although such requests can be genuine, there
is also an alarming number of people who create fake accounts on donation sites in
the hope of scamming people out of money.
• Cold Call Scam
➢ A person claims to be from the technical support team of a computer company such as HP,
saying that they have received information that your computer is infected with a
virus or has already been hacked. They then offer to connect remotely to your computer
to fix the problem. It is a tactic used by scammers to con you out of money.
• Chain Email
➢ An unsolicited email containing false information for the purpose of scaring, intimidating
or deceiving the recipient. Its purpose is to coerce the recipient into forwarding the malicious
or spurious message to other unwilling recipients. It keeps
spreading until someone notices that it is actually a chain email.
• Phishing
➢ Phishing is a criminal technique employing both social engineering and technical
subterfuge to steal consumers’ personal identity data and financial account credentials.
For example, you receive an email from someone pretending to be your bank, indicating
that you are overdrawn or that a purchase you did not make has been made, and asking you to log in and
verify the information.
2.4 Vowpal Wabbit Algorithm
Vowpal Wabbit is a learning system sponsored by Microsoft Research and, previously, Yahoo!
Research. Its goal is to provide a single machine learning system that is inherently
fast and capable of running both on standalone machines and in parallel processing environments.
It is also capable of handling datasets on the scale of terabytes (Big data open source platforms [7]).
Its name has three references:
• The vorpal blade of Jabberwocky
• The rabbit of Monty Python
• Elmer Fudd, who hunted the wascally wabbit
It is a hybrid of stochastic gradient descent and batch learning. Briefly, in
stochastic gradient descent the input is read in sequential order and used to update the predictor for
future data at each step, while batch learning techniques calculate the predictor by learning on the
entire data set at once [6].
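The difference between the two styles can be sketched in a few lines of Python. The code below is a minimal online learner that updates its weights after every single example, which is the stochastic gradient descent side of the description above. The logistic loss, the labels +1 and -1, the learning rate of 0.5 and all function names are assumptions chosen for illustration; this is not Vowpal Wabbit's actual implementation.

```python
import math
from collections import defaultdict


def predict(weights, features):
    """Return the raw score w.x for a sparse feature dictionary."""
    return sum(weights[name] * value for name, value in features.items())


def sgd_update(weights, features, label, learning_rate=0.5):
    """One online step of logistic-loss SGD; label must be +1 or -1."""
    margin = label * predict(weights, features)
    # Derivative of log(1 + exp(-margin)) with respect to the score.
    scale = -label / (1.0 + math.exp(margin))
    for name, value in features.items():
        weights[name] -= learning_rate * scale * value


def train_online(stream, passes=1):
    """Read examples in order and update the predictor after each one."""
    weights = defaultdict(float)
    for _ in range(passes):
        for features, label in stream:
            sgd_update(weights, features, label)
    return weights


if __name__ == "__main__":
    stream = [
        ({"verify": 1.0, "account": 1.0}, 1),    # phishing-like example
        ({"meeting": 1.0, "agenda": 1.0}, -1),   # legitimate-like example
    ] * 20
    model = train_online(stream)
    print(predict(model, {"verify": 1.0}) > 0)
```

A batch learner would instead accumulate the gradient over the whole data set before changing the weights, so it needs the full data set available for every update.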
Vowpal Wabbit runs mainly as a library or a standalone daemon service, and it is also ready to be
deployed in cloud environments [7]. Four main features are used in
combination to get better results:
• Input format of the data
➢ An example can consist of free-form text, which is interpreted in a bag-of-words way, and there
can be multiple sets of free-form text in different namespaces.
• Speed of learning
➢ It can be effectively applied to learning problems with a sparse terafeature space (on the order
of 10^12 sparse features).
• Scalability of the data sets analyzed
➢ The memory footprint of the program is bounded
independently of the data.
➢ This means the training set is not loaded into main memory before learning starts.
• Feature pairing
➢ Subsets of features can be internally paired so that the algorithm is linear in the cross-
product of the subsets.
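Three of the features above (namespaced input, bounded memory and feature pairing) can be sketched in Python. The first function writes an example in Vowpal Wabbit's text input layout, `label |namespace feature[:value] ...`. The other two are simplified stand-ins: `hashed_index` uses CRC32 rather than the hash function Vowpal Wabbit actually uses, and `quadratic_features` only mimics the idea behind pairing two namespaces (which Vowpal Wabbit exposes through its `-q` option). All function names and the example feature names are hypothetical.

```python
import zlib
from itertools import product


def to_vw_line(label, namespaces):
    """Format one example as 'label |ns feat[:value] ... |ns ...'."""
    parts = [str(label)]
    for ns, feats in namespaces.items():
        tokens = []
        for name, value in feats.items():
            # A value of 1 may be omitted in the text format.
            tokens.append(name if value == 1 else f"{name}:{value}")
        parts.append(f"|{ns} " + " ".join(tokens))
    return " ".join(parts)


def hashed_index(namespace, name, bits=18):
    """Map a feature to a slot in a fixed table of 2**bits weights.

    Because the table size is fixed, memory does not grow with the data.
    CRC32 is an assumption here, used only as a simple stand-in hash.
    """
    key = f"{namespace}^{name}".encode("utf-8")
    return zlib.crc32(key) % (1 << bits)


def quadratic_features(namespaces, left, right):
    """Cross every feature of namespace `left` with every feature of `right`."""
    crossed = {}
    pairs = product(namespaces[left].items(), namespaces[right].items())
    for (a, va), (b, vb) in pairs:
        crossed[f"{a}*{b}"] = va * vb
    return crossed


if __name__ == "__main__":
    example = {
        "url": {"length": 54, "has_at": 1},
        "text": {"verify": 1, "account": 1},
    }
    print(to_vw_line(1, example))
    print(quadratic_features(example, "url", "text"))
```

The crossed features grow with the product of the two namespace sizes, which is why pairing is usually restricted to chosen subsets of features rather than applied to everything.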
2.5 Email Filtering Techniques
An email filter is a type of program that filters and separates email into different folders based on
specified criteria [8]. Today, people have to waste a significant amount of time dealing with spam and scams
in their email. Spam and scams also give rise to problems such as leakage of personal information, malware
infection and one-click fraud. The design goals for spam and scam
filtering techniques are as follows:
• Accuracy of Decision
➢ The technique should give accurate results in a timely manner in order to minimize
mistakes on non-spam URLs.
• Classification should be context independent
➢ Classification should work across different web services.
• Results in Real-Time
➢ Some services, such as social networking, work in real time, so
spam and scam filtering must be done with only a small delay.
• Fine-grained Classification
➢ The system should easily recognize the difference between spam hosted on
public services and non-spam content.
2.6 Comparison between Methods
No. 1
Title: Detection of Phishing Emails using Data Mining Algorithms
Author / Year: Smadi, S., Aslam, N., Zhang, L., Alasem, R., & Hossain, M. A. (2015, December)
Method: Data Mining Algorithm; J48 Classification Algorithm
Description:
~ Enhances the overall metric values of email classification by focusing on the preprocessing
phase and determining the best algorithm.
~ An extracted set of features is classified using the J48 classification algorithm.
~ Results achieved 98.87% accuracy for the random forest algorithm, which was the highest
registered.

No. 2
Title: Detection of Phishing Emails using Feed Forward Neural Network
Author / Year: Jameel, N. G. M., & George, L. E. (2013)
Method: Feed Forward Neural Network
Description:
~ The phishing detection model uses a feed forward neural network on features extracted from
the header and HTML body of an email to detect phishing emails.
~ Uses two phases, training and testing.
~ Consists of three stages, namely pre-processing, neural network training and application of
phish detection.
~ The results of the conducted tests indicated a good identification rate (98.72%) with a short
required processing time (0.00067 msec).

No. 3
Title: Classifying Phishing Emails using Confidence-Weighted Linear Classifiers
Author / Year: Basnet, R. B., & Sung, A. H. (2010)
Method: Confidence-Weighted Linear Classifiers (CWLC)
Description:
~ Uses the contents of the emails as features without applying any heuristic-based,
phishing-specific features, and obtains highly accurate results.
~ CWLC is a new class of online learning method designed for Natural Language Processing
(NLP) problems based on the notion of parameter confidence.
~ Results achieved the best F-measure of 99.83%.

No. 4
Title: Classification of Phishing Email using Random Forest Machine Learning Technique
Author / Year: Akinyelu, A. A., & Adewumi, A. O. (2014)
Method: Random Forest Machine Learning Technique
Description:
~ Aims to improve the phishing email classifier with better prediction accuracy and fewer
features.
~ Random forest is an ensemble learning method for classification and regression.
~ Results: classification accuracy of 99.7% and low false negative (FN) and false positive (FP)
rates.

No. 5
Title: Detecting Phishing Emails using Hybrid Features
Author / Year: Ma, L., Ofoghi, B., Watters, P., & Brown, S. (2009, July)
Method: Hybrid Features; Robust Classifiers
Description:
~ Builds a robust classifier to detect phishing emails using hybrid features, and selects features
using information gain.
~ Also analyses the quality of each feature using information gain; the best feature set is
selected after a recursive learning process.
~ Three types of features are defined manually based on observation of emails: content features,
orthographic features and derived features.
~ Extracts feature vectors from the emails which effectively represent the instances to detect
phishing emails.
~ Results: the decision tree produced the highest accuracy and thus builds a better classifier.

No. 6
Title: Detecting Phishing Emails using Text and Data Mining
Author / Year: Pandey, M., & Ravi, V. (2012, December)
Method: Text and Data Mining
Description:
~ Analyzed phishing emails after extracting 23 keywords from the email bodies using text mining.
~ Results: using the 23 features, GP yielded the best result, with 98.12% accuracy and 97.29%
sensitivity.

No. 7
Title: Collaborative Email-Spam Filtering with the Hashing Trick
Author / Year: Attenberg, J., Weinberger, K., Dasgupta, A., Smola, A., & Zinkevich, M. (2009, July)
Method: Hashing Trick
Description:
~ The technique can be used with a variety of classifiers and can be implemented in a few lines
of code for collaborative spam filtering.
~ It is a method to scale up linear learning algorithms.
~ Also used the Vowpal Wabbit (VW) implementation of stochastic gradient descent on a square
loss.
~ The result is more robust against noise and absorbs individual preferences in the context of
spam classification.

No. 8
Title: Anti-Phishing Detection of Phishing Attacks using Genetic Algorithm
Author / Year: Shreeram, V., Suban, M., Shanthi, P., & Manjula, K. (2010, October)
Method: Genetic Algorithm
Description:
~ Detects phishing using a rule-based system.
~ The genetic algorithm is used to evolve rules that differentiate legitimate links from phishing
links.
~ It can achieve minimal false negatives at a speed adequate for online application.

No. 9
Title: Learn To Detect Phishing Scams using Learning and Ensemble Methods
Author / Year: Saberi, A., Vahidi, M., & Bidgoli, B. M. (2007, November)
Method: Learning and Ensemble Methods
Description:
~ Used three different learning methods to detect phishing scams.
~ Applied the ensemble method on the outputs of the different classifiers to increase the accuracy
over the individual filter results.
~ It detects 94.4% of scam emails while flagging only 0.08% of legitimate emails as scams.

No. 10
Title: Detecting Phishing Websites using Associative Classification
Author / Year: Ajlouni, M. I. A., Hadi, W. E., & Alwedyan, J. (2013)
Method: Associative Classification
Description:
~ Explores the potential use of automated data mining techniques for detecting phishing
websites.
~ Used two different associative classification algorithms, MCAR and CBA.
~ MCAR achieved the highest accuracy (the source reports averages of 6.8%, 6.1% and 5.4%),
while the CBA algorithm outperformed the SVM and NB algorithms.
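The studies above report their results as accuracy, sensitivity, F-measure and false positive or false negative rates. As a reminder of how these figures relate to one another, the Python sketch below computes them from the four cells of a confusion matrix. The function name and the counts in the usage example are made up for illustration and do not come from any of the cited papers.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from confusion-matrix counts.

    tp: phishing correctly flagged      fp: legitimate wrongly flagged
    tn: legitimate correctly passed     fn: phishing missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    false_negative_rate = fn / (fn + tp) if (fn + tp) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
        "false_negative_rate": false_negative_rate,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts for 1000 emails, 100 of which are phishing.
    print(classification_metrics(tp=90, fp=10, tn=890, fn=10))
```

Accuracy alone can look high when phishing messages are rare, which is why several of the studies also report sensitivity or the F-measure.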
2.7 Summary
Based on the literature review, there are various types of method that can be applied to detect
phishing. The literature review gives the details of the related studies. The methods reviewed include
the J48 Classification Algorithm, Feed Forward Neural Network, Confidence-Weighted Linear
Classifiers, Random Forest Machine Learning Technique, Hybrid Features, Text
and Data Mining, Hashing Trick, Genetic Algorithm, Learning and Ensemble Methods and
Associative Classification. For this project, however, we propose the Vowpal Wabbit
algorithm.
CHAPTER 3
METHODOLOGY
3.1 Introduction
A methodology is a systematic way of solving the research problem in order to achieve the
objectives. This chapter explains the specific details of the methodology used
to develop this project. To make sure the project is on the right path, the methodology plays an
important role as a guide for completing the project and keeping it working as planned. Different
types of methodology are used for different types of application. It is very important to choose
a suitable methodology for the development of the application, so it is necessary to
understand the functionality of the application itself. The methodology selected should be
compatible with the application being developed. It can be applied through a technique,
an algorithm or a method. It comprises the theoretical analysis of the methods and principles associated
with a branch of knowledge. It is also defined as the rules, principles or procedures used for developing
a system or project.
3.2 Specification and System Requirements
System requirements are needed to accomplish this project and assist its development.
They cover both hardware and software. These requirements
are related to each other and together ensure that the system can be built smoothly.
3.2.1 Determine Requirements
In this stage, we collect information about the project from previous research. Then,
we analyze the previous research to obtain the details they report on security, the
problem statement and the methods used. For this project, we analyzed research on
phishing scam filtering and the algorithms used, in order to decide what to apply in this
project. To overcome the problems stated in 1.2, this methodology is built with reference to
the three main objectives stated in 1.3: first, to study the Vowpal Wabbit
algorithm in order to classify phishing scam websites; second, to modify the Vowpal Wabbit
algorithm to suit Weka-based system settings; and lastly, to test the data sets using the Vowpal
Wabbit algorithm in Weka.
3.2.2 Hardware
A high-performance processor and a larger RAM capacity, ideally in a high-end device, are
recommended. This is because machine learning requires a high-speed processor
to train the model when a large amount of data is involved.
3.2.3 Software
No.  Software                     Description
1.   Google Chrome                To search for related articles and methods for the project.
2.   Microsoft Word 2016          Used for word processing, such as creating and editing the report and documentation.
3.   Microsoft PowerPoint 2016    To present the results and for the project presentation.
4.   Snipping Tool                Used to capture screenshots.
5.   WEKA                         Application used for classification and the main development phase of the project.
6.   WinZip                       To extract the data.
7.   PyCharm                      Used for modifying the code.
3.3 Algorithm
This section discusses the algorithm that will be used to carry out the project.
It also explains the algorithm and the reason why it was chosen. To ensure
the project runs smoothly and according to plan, the methodology serves as a
guideline for the project. It is very important to choose the most suitable algorithm so
that the analysis is not affected by other factors. Moreover, it is important to ensure that the
algorithm is able to run on the device so that the study is not disrupted midway.
3.3.1 Vowpal Wabbit
Vowpal Wabbit is a machine learning system that incorporates several learning algorithms. It can handle
large datasets on the scale of terabytes. It can also run on a single machine, and it develops a good
predictor faster than most other models. Vowpal Wabbit has been used in a decision service for a
personalized news recommendation system. Moreover, it is an open, interactive machine learning solution for
reinforcement learning, supervised learning and other machine learning paradigms. Vowpal
Wabbit supports solutions to a range of real-world problems through reductions to standard
learning algorithms. This versatility makes it possible to frame learning problems effectively and
achieve a good solution.
Figure: Example of a reduction stack.
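To give a feel for what a reduction is, the Python sketch below reduces a problem with several classes to a set of binary problems, one per class, in the one-against-all style (Vowpal Wabbit offers a reduction of this kind through its `--oaa` option). The perceptron base learner, the class names and the feature names are assumptions picked to keep the example short; they are not part of Vowpal Wabbit or of this project.

```python
from collections import defaultdict


class Perceptron:
    """Tiny binary base learner over sparse features; labels are +1 / -1."""

    def __init__(self):
        self.weights = defaultdict(float)

    def score(self, features):
        return sum(self.weights[k] * v for k, v in features.items())

    def learn(self, features, label):
        # Update only when the current prediction is wrong or undecided.
        if label * self.score(features) <= 0:
            for k, v in features.items():
                self.weights[k] += label * v


class OneAgainstAll:
    """Reduce a K-class problem to K binary problems, one per class."""

    def __init__(self, classes):
        self.learners = {c: Perceptron() for c in classes}

    def learn(self, features, true_class):
        for c, learner in self.learners.items():
            learner.learn(features, 1 if c == true_class else -1)

    def predict(self, features):
        # The class whose binary learner is most confident wins.
        return max(self.learners, key=lambda c: self.learners[c].score(features))


if __name__ == "__main__":
    data = [
        ({"verify": 1, "password": 1}, "phishing"),
        ({"meeting": 1, "agenda": 1}, "legitimate"),
        ({"prize": 1, "winner": 1}, "scam"),
    ] * 5
    model = OneAgainstAll(["phishing", "legitimate", "scam"])
    for features, label in data:
        model.learn(features, label)
    print(model.predict({"verify": 1, "password": 1}))
```

The multiclass wrapper never needs to know how the base learner works, so any binary learner with `learn` and `score` methods could be swapped in.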
3.4 General Framework
Users are exposed to phishing when they visit unknown websites. A scammer is a
threat who sends scam advertisements in order to encourage users to give out their private information,
such as usernames, passwords and banking details.
Figure 3.1: A framework of how the data is processed.
[Flowchart: Install PyCharm and Weka in Windows → Modify the Vowpal Wabbit code in PyCharm →
Split the data into train and test datasets in Weka → Get the accuracy to identify scam or
non-scam → SCAM or NON-SCAM]
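The split and accuracy steps of the framework can be sketched outside Weka as follows. This Python example is only an illustration of the idea: the 70/30 split, the fixed seed, the function names and the toy `long_url` feature are all assumptions, and in the project itself these steps are carried out in Weka.

```python
import random


def train_test_split(examples, test_fraction=0.3, seed=42):
    """Shuffle a copy of the examples and split it into train and test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, test_set):
    """Fraction of (features, label) pairs for which predict(features) == label."""
    if not test_set:
        return 0.0
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)


if __name__ == "__main__":
    # Toy data: every example with a long URL is labelled as a scam.
    examples = [
        ({"long_url": i % 2}, "scam" if i % 2 else "non-scam") for i in range(10)
    ]
    train, test = train_test_split(examples)

    def rule(features):
        # Hypothetical classifier standing in for the trained model.
        return "scam" if features["long_url"] else "non-scam"

    print(len(train), len(test), accuracy(rule, test))
```

Keeping the test examples out of training is what makes the reported accuracy a fair estimate of how the classifier behaves on unseen websites.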
3.5 Summary
Methodology is very important in system and application development. There are also many
different software development methodologies available that can be used to develop any kind
of application. The right methodology can help the project to be completed within the specified
time. The activities in each phase of the methodology are explained so that they can be understood
easily.
References
[1] Ajlouni, M. I. A., Hadi, W. E., & Alwedyan, J. (2013). Detecting phishing websites using
associative classification. image, 5(23), 36-40.
[2] Akinyelu, A. A., & Adewumi, A. O. (2014). Classification of phishing email using random
forest machine learning technique. Journal of Applied Mathematics, 2014.
[3] Nivedha, S., Gokulan, S., Karthik, C., Gopinath, R., & Gowshik, R. (2017). Improving
Phishing URL Detection Using Fuzzy Association Mining. The International Journal of
Engineering and Science (IJES), 6.
[4] Salem, O., Hossain, A., & Kamala, M. (2010, June). Awareness program and ai based tool to
reduce risk of phishing attacks. In 2010 10th IEEE International Conference on Computer and
Information Technology (pp. 1418-1423). IEEE.
[5] Chen, T. S., Jeng, F. G., & Liu, Y. C. (2006, December). Hacking tricks toward security on
network environments. In 2006 Seventh International Conference on Parallel and Distributed
Computing, Applications and Technologies (PDCAT'06) (pp. 442-447). IEEE.
[6] Agarwal, A., Chapelle, O., Dudík, M., & Langford, J. (2014). A reliable effective terascale
linear learning system. The Journal of Machine Learning Research, 15(1), 1111-1133.
[7] de Almeida, P. D. C., & Bernardino, J. (2015, June). Big data open source platforms. In 2015
IEEE International Congress on Big Data (pp. 268-275). IEEE.
[8] Revar, P., Shah, A., Patel, J., & Khanpara, P. (2017). A Review on Different types of Spam
Filtering Techniques. International Journal of Advanced Research in Computer Science, 8(5).
[9] Smadi, S., Aslam, N., Zhang, L., Alasem, R., & Hossain, M. A. (2015, December). Detection
of phishing emails using data mining algorithms. In 2015 9th International Conference on
Software, Knowledge, Information Management and Applications (SKIMA) (pp. 1-8). IEEE.
[10] Jameel, N. G. M., & George, L. E. (2013). Detection of phishing emails using feed forward
neural network. International Journal of Computer Applications, 77(7).
[11] Basnet, R. B., & Sung, A. H. (2010). Classifying phishing emails using confidence-weighted
linear classifiers. In International Conference on Information Security and Artificial Intelligence
(ISAI) (pp. 108-112).
[12] Ma, L., Ofoghi, B., Watters, P., & Brown, S. (2009, July). Detecting phishing emails using
hybrid features. In 2009 Symposia and Workshops on Ubiquitous, Autonomic and Trusted
Computing (pp. 493-497). IEEE.
[13] Pandey, M., & Ravi, V. (2012, December). Detecting phishing e-mails using text and data
mining. In 2012 IEEE International Conference on Computational Intelligence and Computing
Research (pp. 1-6). IEEE.
[14] Attenberg, J., Weinberger, K., Dasgupta, A., Smola, A., & Zinkevich, M. (2009, July).
Collaborative email-spam filtering with the hashing trick. In Proceedings of the Sixth Conference
on Email and Anti-Spam.
[15] Shreeram, V., Suban, M., Shanthi, P., & Manjula, K. (2010, October). Anti-phishing detection
of phishing attacks using genetic algorithm. In 2010 International Conference on Communication
Control and Computing Technologies (pp. 447-450). IEEE.
[16] Saberi, A., Vahidi, M., & Bidgoli, B. M. (2007, November). Learn to detect phishing scams
using learning and ensemble methods. In Proceedings of the 2007 IEEE/WIC/ACM International
Conferences on Web Intelligence and Intelligent Agent Technology-Workshops (pp. 311-314).
IEEE Computer Society.