Classification Algorithms in Intrusion Detection System: A Survey
V. Jaiganesh (1), Dr. P. Sumathi (2), A. Vinitha (3)
(1) Doctoral Research Scholar, Department of Computer Science, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India.
(2) Doctoral Research Supervisor, Assistant Professor, PG & Research Department of Computer Science, Government Arts College and Science College, Coimbatore, Tamil Nadu, India. [email protected]
(3) M.Phil Scholar, Department of Computer Science, Dr. N.G.P Arts and Science College; Assistant Professor, Sasurie Arts & Science College, Erode, Tamilnadu, India.
Abstract
An intrusion detection system (IDS) is software that protects a system against unauthorized access attempted over a network. It guards system resources by detecting attempts by other systems to reach them. As the Internet has grown in reach and popularity, many parties try to gain unauthorized access to the resources of others for business advantage. This paper surveys data mining algorithms that help to secure a system, in particular classification algorithms, which predict the class of new data from previously labeled data. Intrusion detection can be applied to both hosts and networks. The two algorithms surveyed are ID3 and C4.5. There are two types of detection methods: misuse detection and anomaly detection.
Keywords: Intrusion Detection System
Architecture, Detection types, Attacks, Protocols,
KDD cup data set, ID3 algorithm, C4.5 algorithm,
Decision trees, Classification.
1. Introduction
Intrusion detection systems and intrusion prevention systems are closely related. Both detect malicious programs or activity entering a network or host. The difference is that a prevention system also responds to the malicious activity, for example by using a firewall or anti-spam filter to block it. Intrusion detection can be performed on a network or on a host. There are two types of detection methods: signature based and anomaly based. An intrusion prevention system can secure a system only when it is deployed with the proper software and hardware.
Predictive modeling predicts an output based on historical data, and classification is one such technique. It has two steps: building the model, and then applying and evaluating the resulting model. Classification is mainly used in customer segmentation, business modeling, credit risk analysis, biomedical research and drug response modeling.
2. Intrusion Detection Systems Architecture
An intrusion detection system is a software program that identifies malicious programs or traffic entering a system or network. It helps to secure the system by detecting, and in some cases responding to, the malicious activity. Intrusion detection systems are divided into two types: host based and network based. A system may also be active or passive. An active system responds to the malicious activity, whereas a passive system only detects whether malicious packets have entered the system.
Figure 2.1: IDS architecture (components shown: Internet, router, firewall, IDS, company network)
Host Based Intrusion Detection System
A Vinitha et al, Int.J.Computer Technology & Applications,Vol 4 (5),746-750
IJCTA | Sept-Oct 2013 Available [email protected]
746
ISSN:2229-6093
A host based intrusion detection system detects only malicious packets that enter the host on which it runs. It monitors that single host, not the whole network.
Network Based Intrusion Detection System
A network based intrusion detection system monitors the whole network and alerts the network administrator to malicious activity. It protects the whole network rather than a single host.
3. Detection Types
There are two types of detection: anomaly detection and signature detection.
Anomaly detection
Anomaly detection builds a profile of normal system activity, such as network bandwidth, ports, protocols and device connections. If it observes activity in the system or network that deviates from this profile, it informs the administrator.
Signature detection
Signature detection compares all network packets against patterns of previously known attacks, called signatures. The signatures are stored in a database.
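To make the contrast between the two detection types concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the signature names and byte patterns, the bandwidth figures, and the rule of flagging a measurement that lies more than three standard deviations from the baseline mean. It is not taken from any real IDS.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of previously known attacks.
SIGNATURES = {
    "sql-injection": b"' OR '1'='1",
    "path-traversal": b"../../",
}

def signature_detect(payload: bytes) -> list[str]:
    """Return the names of all known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

def anomaly_detect(baseline: list[float], observed: float, k: float = 3.0) -> bool:
    """Flag the observation if it lies more than k sample standard
    deviations from the mean of the baseline (normal) measurements."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(observed - mu) > k * sigma

# Toy data: bandwidth samples under normal load, then one suspicious reading.
normal_bandwidth = [98.0, 102.0, 100.0, 101.0, 99.0]
hits = signature_detect(b"GET /index.php?id=1' OR '1'='1 HTTP/1.1")
is_anomalous = anomaly_detect(normal_bandwidth, 180.0)
```

The signature check can only recognize attacks that are already in the database, while the anomaly check can flag unseen behaviour at the cost of possible false alarms when normal activity changes.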
4. Attacks in IDS
There are four different types of attacks.
Denial of Service Attack (DoS):
An attack in which the attacker makes a computing or memory resource too busy or too full to handle legitimate requests.
User to Root Attack (U2R):
An attack in which the attacker starts with access to a normal user account and exploits a vulnerability to gain root access.
Remote to Local Attack (R2L):
An attack in which the attacker sends packets to a machine over a network, does not have an account on that machine, and exploits a vulnerability to gain local access.
Probing Attack:
An attempt to gather information about a network of computers.
5. Protocol Attacks in IDS
ICMP (Internet Control Message Protocol)
ICMP is used by the Internet Protocol layer to send one-way messages to a host. ICMP has no authentication, which can be exploited for denial of service attacks.
TCP (Transmission Control Protocol)
When one application wants to connect to another, TCP is used. It sets up a connection between the two systems. An attacker may try to gain access to this connection.
UDP (User Datagram Protocol)
With UDP, a user can send messages to another host without first establishing a transmission channel. The messages may arrive out of order. An attacker may use this protocol to send unsolicited messages.
Detection Rate
The detection rate is the number of intrusions detected by the system divided by the total number of intrusions present in the sample data.
False Alarm Rate
The false alarm rate is the number of normal patterns incorrectly detected as attacks divided by the total number of normal patterns in the sample data.
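As a small worked example, the sketch below computes both measures from a list of true labels and a list of predicted labels. The label strings and the toy data are assumptions made for illustration, and the false alarm rate is computed as a fraction of all normal records, which is the usual convention.

```python
def detection_rate(true_labels, predicted_labels):
    """Intrusions flagged as attacks divided by all intrusions in the sample."""
    on_attacks = [p for t, p in zip(true_labels, predicted_labels) if t == "attack"]
    return sum(1 for p in on_attacks if p == "attack") / len(on_attacks)

def false_alarm_rate(true_labels, predicted_labels):
    """Normal records flagged as attacks divided by all normal records."""
    on_normals = [p for t, p in zip(true_labels, predicted_labels) if t == "normal"]
    return sum(1 for p in on_normals if p == "attack") / len(on_normals)

# Toy sample: 4 intrusions (3 detected) and 5 normal records (1 false alarm).
truth     = ["attack", "attack", "attack", "attack",
             "normal", "normal", "normal", "normal", "normal"]
predicted = ["attack", "attack", "attack", "normal",
             "normal", "attack", "normal", "normal", "normal"]

dr = detection_rate(truth, predicted)     # 3 / 4
far = false_alarm_rate(truth, predicted)  # 1 / 5
```

Both functions assume the sample contains at least one record of the relevant class; otherwise the division is undefined.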
6. Data Mining
Data mining is used to extract information from large databases. It is divided into two types: predictive and descriptive. Predictive mining uses historical data to predict an output. Descriptive mining gives information about what the data contains and about the relationships within it. We have chosen the predictive technique for intrusion detection.
Classification
Classification assigns each data item to one of a set of predefined target classes; that is, it predicts the target class of each item. For example, it can be used to rate credit risk as low, medium or high.
Figure 4.1: Classification task. Induction: a learning algorithm learns a model from the training set. Deduction: the model is applied to the test set.
Examples of Classification Task
1. Predicting tumor cells as benign or
malignant.
2. Classifying credit card transactions as
legitimate or fraudulent.
3. Classifying secondary structures of protein
as alpha helix, beta sheet, or random coil.
4. Categorizing news stories as finance,
weather, entertainment, sports, etc.
Classification techniques:
1. Decision tree based methods
2. Rule based methods
3. Memory based reasoning
4. Neural networks
5. Naïve Bayes and Bayesian belief networks
6. Support vector machines
Decision Tree
A decision tree is used in statistics, machine learning, and data mining. It is a predictive model that maps observations about a data item to a conclusion about its target value. Leaves represent class labels, and branches represent conjunctions of attribute tests that lead to those labels. A classification tree does not describe the data or the decisions; it simply makes classifications. It generates rules that are easy for humans to understand and that can be used to look up records in a database. These rules make the model transparent. Rules have two properties, support and confidence, which help to rank the rules and predict the output.
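The paper does not define support and confidence, so the following sketch assumes the common convention from rule mining: support is the fraction of all records that satisfy both the rule's conditions and its class, and confidence is the fraction of records satisfying the conditions that also have that class. The function name, record fields and toy data are hypothetical.

```python
def rule_support_confidence(records, conditions, target_class, class_key="label"):
    """Score the rule IF conditions THEN target_class.

    conditions is a dict mapping attribute name to required value.
    Returns (support, confidence) under the convention stated above.
    """
    matches_if = [r for r in records
                  if all(r.get(attr) == val for attr, val in conditions.items())]
    matches_both = [r for r in matches_if if r[class_key] == target_class]
    support = len(matches_both) / len(records)
    confidence = len(matches_both) / len(matches_if) if matches_if else 0.0
    return support, confidence

# Toy connection records (illustrative values only).
records = [
    {"protocol": "icmp", "flag": "SF", "label": "attack"},
    {"protocol": "icmp", "flag": "SF", "label": "attack"},
    {"protocol": "icmp", "flag": "SF", "label": "normal"},
    {"protocol": "tcp",  "flag": "SF", "label": "normal"},
    {"protocol": "tcp",  "flag": "S0", "label": "attack"},
]

# Rule: IF protocol = icmp THEN attack
support, confidence = rule_support_confidence(records, {"protocol": "icmp"}, "attack")
```

Here the rule covers two of the five records correctly (support 2/5) and is right for two of the three ICMP records (confidence 2/3), so a rule with higher confidence would be ranked above it.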
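Example of a decision tree
[Figure: decision tree for a medical diagnosis. Labels recovered from the figure: test nodes Pain, Fever and Cough; branch values Abdomen, Throat, Chest, None, Yes and No; outcomes Heart attack, Appendicitis, Flu, Strep and Cold.]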
The complexity of a tree is measured using one of the following metrics: total number of leaves, total number of nodes, number of attributes used, or depth of the tree. Decision tree induction algorithms fall into two groups: top down and bottom up. ID3 and C4.5 are both top down approaches. C4.5 has two phases, growing and pruning, whereas ID3 has only the growing phase. Both algorithms are greedy: at each node they choose the locally best split, which does not guarantee a globally optimal tree.
7. ID3 Algorithm
ID3 stands for Iterative Dichotomiser 3. It is the precursor of the C4.5 algorithm and was invented by Ross Quinlan. Given a set C of training elements, it proceeds as follows:
1. Create a root node.
   If all the elements in C are positive, create a "yes" node and stop.
   If all the elements in C are negative, create a "no" node and stop.
   Otherwise, select a feature F with values v1 to vn.
2. Divide the training elements in C into subsets c1, c2, ..., cn according to their value of F.
3. Apply the algorithm recursively to each subset ci.
To select the feature for a node, a selection heuristic is needed. ID3 uses a greedy search that picks the best available attribute at each node. If the chosen attribute separates the classes completely, the algorithm stops; otherwise it repeats until the stopping condition is satisfied.
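The recursive steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than Quinlan's implementation: it assumes information gain as the selection heuristic (the measure the paper goes on to describe), generalizes the yes/no leaves to arbitrary class labels, falls back to the majority class when no features remain, and uses hypothetical function names and toy connection records.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by partitioning the rows on one feature."""
    remainder = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, features):
    """Return a class label (leaf) or a nested dict {feature: {value: subtree}}."""
    if len(set(labels)) == 1:      # step 1: all elements share one class
        return labels[0]
    if not features:               # assumption: majority class when out of features
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in sorted(set(row[best] for row in rows)):   # step 2: split on F
        pairs = [(r, lab) for r, lab in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [lab for _, lab in pairs]
        tree[best][value] = id3(sub_rows, sub_labels, remaining)  # step 3: recurse
    return tree

def classify(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, dict):
        feature = next(iter(tree))
        tree = tree[feature][row[feature]]
    return tree

# Toy connection records (illustrative values only).
rows = [
    {"protocol": "tcp",  "flag": "SF"}, {"protocol": "tcp",  "flag": "S0"},
    {"protocol": "tcp",  "flag": "SF"}, {"protocol": "tcp",  "flag": "S0"},
    {"protocol": "icmp", "flag": "SF"}, {"protocol": "icmp", "flag": "SF"},
    {"protocol": "udp",  "flag": "SF"},
]
labels = ["normal", "attack", "normal", "attack", "attack", "attack", "normal"]

tree = id3(rows, labels, ["protocol", "flag"])
prediction = classify(tree, {"protocol": "tcp", "flag": "S0"})
```

On this toy data the root splits on protocol, and only the tcp branch needs a second split on flag. The sketch raises a KeyError for attribute values never seen in training, which a practical implementation would need to handle.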
Data Description
1. Attribute value description.
2. Predefined classes
3. Discrete classes
ID3 chooses the best attribute using a statistical property called information gain. Gain measures how well an attribute separates the training examples into the target classes. The attribute with the highest gain is selected. To define gain we first need entropy, a measure from information theory of the impurity of a collection of examples.
Given a collection S with c outcomes,
$Entropy(S) = \sum_{I=1}^{c} -p(I) \log_2 p(I)$
where p(I) is the proportion of S belonging to class I, the sum runs over the c classes, and $\log_2$ is the logarithm to base 2. S is not an attribute but the entire sample set.
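A short numeric check may help here. The sketch below evaluates the entropy formula for a sample of 14 examples (9 in one class, 5 in the other) and then the gain of a hypothetical attribute that splits the sample into groups with class counts [6, 2] and [3, 3]. It assumes the standard definition of gain, which the text does not spell out: the entropy of the whole sample minus the size-weighted entropy of the subsets. The counts are illustrative, not taken from the paper.

```python
from math import log2

def entropy(class_counts):
    """Entropy in bits, given the number of examples in each class."""
    total = sum(class_counts)
    return -sum((n / total) * log2(n / total) for n in class_counts if n > 0)

def gain(parent_counts, partitions):
    """Entropy of the parent minus the weighted entropy of its partitions.

    partitions is a list of class-count lists, one per attribute value.
    """
    total = sum(parent_counts)
    remainder = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent_counts) - remainder

h = entropy([9, 5])                   # about 0.940 bits
g = gain([9, 5], [[6, 2], [3, 3]])    # about 0.048 bits
```

A gain this small means the attribute tells us little about the class; ID3 would prefer any attribute whose subsets are closer to pure.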
Advantages of ID3 Algorithm
1. Understandable prediction rules are generated
from the training data.
2. It builds the tree quickly.
3. It builds a short tree.
8. C4.5 Algorithm
C4.5 was also developed by Quinlan. It builds decision trees from a set of training data using concepts from information theory. The training data is a set S = S1, S2, ... of already classified samples. Each sample Si is a p-dimensional vector (x1, x2, ..., xp), where xj represents the value of the j-th attribute of the sample. At each node of the tree, C4.5 chooses the attribute that most effectively splits the samples into subsets. The splitting criterion is normalized information gain (gain ratio), and the attribute with the highest value is chosen to make the decision.
To build the decision tree:
1. Check for base cases.
2. For each attribute a, find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add the resulting nodes as children of the decision node.
C4.5 can handle both continuous and discrete data, and it can handle missing attribute values. After the tree has been grown, the algorithm goes back through it for pruning. Its successor is C5.0.
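Two of the features mentioned above can be illustrated briefly. C4.5 is usually described as ranking splits by gain ratio (information gain divided by the split information of the partition) and as handling a continuous attribute by testing binary splits of the form value <= threshold. The Python sketch below shows both ideas under simplifying assumptions: candidate thresholds are taken as midpoints between consecutive distinct values, missing values and pruning are not handled, and the function names and toy "duration" data are hypothetical. It is not Quinlan's implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(labels, partitions):
    """Information gain of a partition divided by its split information."""
    total = len(labels)
    gain = entropy(labels) - sum(len(p) / total * entropy(p) for p in partitions)
    split_info = -sum(len(p) / total * log2(len(p) / total)
                      for p in partitions if p)
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """Try each midpoint between consecutive distinct values and return the
    (threshold, gain_ratio) pair with the highest gain ratio."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        ratio = gain_ratio(labels, [left, right])
        if ratio > best[1]:
            best = (threshold, ratio)
    return best

# Toy continuous attribute: connection duration in seconds (illustrative).
durations = [1, 2, 3, 40, 50, 60]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

threshold, ratio = best_threshold(durations, labels)
```

On this toy data the split duration <= 21.5 separates the two classes completely, so it receives the highest possible gain ratio. Dividing by split information penalizes attributes that fragment the data into many small subsets, which plain information gain tends to favour.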
9. KDD Cup Dataset
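The KDD Cup 1999 dataset is a benchmark dataset used to evaluate intrusion detection methods. Its training portion derives from about 4 gigabytes of compressed raw data covering 7 weeks of network traffic, processed into about 5 million connection records; a further two weeks of test data yield about 2 million connection records. Using this dataset, each connection can be classified as either normal or attack.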
10. Weka Data Mining Tool
Weka (Waikato Environment for Knowledge Analysis) is machine learning software. It is free software available under the GNU General Public License. It is a collection of algorithms for data analysis and predictive modeling, and it is easy to use. It is implemented entirely in the Java programming language, so it runs on any platform that provides a Java runtime.
11. Conclusion
Security is essential for protecting our files. Many attackers try to access files without authorization. Decision tree algorithms are among the simpler techniques for helping to secure a system. In this paper the ID3 and C4.5 algorithms are compared. Of the two, C4.5 is better suited to intrusion detection because it handles both numeric and nominal data. C4.5 is also easy to understand.
13. Author Biographies
Mr. V. JAIGANESH is working as an Assistant Professor in the Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore, Tamilnadu, India. He is pursuing a Ph.D. at Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India. He completed his M.Phil in the area of Data Mining at Periyar University, and his postgraduate degrees, MCA and MBA, at Periyar University, Salem. He has presented and published a number of papers in reputed conferences and journals. He has about twelve years of teaching and research experience, and his research interests include Data Mining and Networking.
Dr. P. SUMATHI is working as an Assistant Professor in the PG & Research Department of Computer Science, Government Arts College, Coimbatore, Tamilnadu, India. She received her Ph.D. in the area of Grid Computing from Bharathiar University. She completed her M.Phil in the area of Software Engineering at Mother Teresa Women’s University and received her MCA degree from Kongu Engineering College, Perundurai. She has published a number of papers in reputed journals and conferences. She has about sixteen years of teaching and research experience. Her research interests include Data Mining, Grid Computing and Software Engineering.
Ms. A. VINITHA is working as an Assistant Professor in the Department of Computer Science and Applications, Sasurie College of Arts & Science, Vijayamangalam, Erode, Tamilnadu, India. She is pursuing her M.Phil degree in the area of Data Mining under the guidance of Mr. V. JAIGANESH of Dr. N.G.P Arts & Science College, Coimbatore, where she also completed her M.Sc. She has attended many conferences and has 2 years of teaching experience. Her interests are Data Mining and Networking.