
Classification Algorithms in Intrusion Detection System: A Survey

V. Jaiganesh (1), Dr. P. Sumathi (2), A. Vinitha (3)

(1) Doctoral Research Scholar, Department of Computer Science, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India. [email protected]

(2) Doctoral Research Supervisor, Assistant Professor, PG & Research Department of Computer Science, Government Arts College and Science College, Coimbatore, Tamil Nadu, India. [email protected]

(3) M.Phil Scholar, Department of Computer Science, Dr. N.G.P Arts and Science College; Assistant Professor, Sasurie Arts & Science College, Erode, Tamilnadu, India. [email protected]

Abstract

An intrusion detection system (IDS) is software that protects a system when someone tries to access it from another system over a network. It safeguards system resources by denying access to unauthorized systems. As the internet has become more popular and widespread, many people try to gain unauthorized access to others' resources for business advantage. This paper surveys data mining algorithms that help to secure a system, in particular classification algorithms. Classification predicts the output class of new data. Intrusion detection can be applied to both hosts and networks. The two algorithms surveyed are ID3 and C4.5. There are two types of detection methods: misuse detection and anomaly detection.

Keywords: Intrusion Detection System Architecture, Detection types, Attacks, Protocols, KDD cup data set, ID3 algorithm, C4.5 algorithm, Decision trees, Classification.

1. Introduction

Intrusion detection systems and intrusion prevention systems are closely related. Both are used to detect malicious programs that enter a network or host. The only difference is that a prevention system also responds to the malicious program, using a firewall or anti-spam measures and blocking the malicious activity. Intrusion detection can be performed on the network and on the host. There are two types of detection methods: signature based and anomaly based. An intrusion prevention system can secure a system only when it is deployed with the proper software and hardware.

Predictive modeling predicts an output based on historical data, and classification is one such technique. It has two steps: building the model, and then applying the resulting model. It is mainly used in customer segmentation, business modeling, credit risk analysis, biomedical research, and drug response modeling.

2. Intrusion Detection Systems Architecture

An intrusion detection system is a software program that identifies malicious programs entering a system or a network. It helps to secure the system by responding to the malicious program. It is divided into two types: host based intrusion detection systems and network based intrusion detection systems. An intrusion detection system can also be active or passive. An active system responds to the malicious program, whereas a passive system only detects whether any malicious packets have entered the system.

Figure 2.1: IDS Architecture (components shown: Internet, Firewall, Router, IDS, Company Network)

Host Based Intrusion Detection System

A Vinitha et al, Int.J.Computer Technology & Applications,Vol 4 (5),746-750

IJCTA | Sept-Oct 2013 Available [email protected]


ISSN:2229-6093


A host based intrusion detection system detects only the malicious packets that enter the host on which it runs. It monitors only that host, not the whole network.

Network Based Intrusion Detection System

A network based intrusion detection system monitors the whole network and alerts the network administrator about malicious activity. It secures the whole network.

3. Detection Types

There are two types of detection: anomaly detection and signature detection.

Anomaly detection

It monitors normal system activity such as network bandwidth, ports, protocols, and device connections. If there is any abnormal activity in the system or network, it informs the administrator.

Signature detection

It compares all network packets against patterns of previously known attacks, called signatures, which are stored in a database.
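The two detection types can be illustrated with a minimal Python sketch. It is not taken from any particular IDS: the signature entries, the function names `match_signatures` and `is_anomalous`, and the use of a mean and standard deviation as the description of normal activity are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical signature database: byte patterns of previously known attacks.
SIGNATURES = {
    "sql-injection": b"' OR 1=1",
    "path-traversal": b"../../",
}

def match_signatures(payload: bytes) -> list:
    """Signature detection: return the names of known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

@dataclass
class Baseline:
    """Normal activity summarised as a mean and a standard deviation (an assumption)."""
    mean: float
    std: float

def is_anomalous(value: float, baseline: Baseline, k: float = 3.0) -> bool:
    """Anomaly detection: flag a measurement more than k standard deviations from normal."""
    return abs(value - baseline.mean) > k * baseline.std

# A payload containing a known pattern, and a bandwidth reading far above normal.
print(match_signatures(b"GET /index.php?id=1' OR 1=1"))
print(is_anomalous(950.0, Baseline(mean=200.0, std=50.0)))
```

Signature matching can only report attacks already present in the database, while the threshold check can flag unfamiliar behaviour at the cost of occasionally flagging unusual but harmless activity.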

4. Attacks in IDS

There are four different types of attacks.

Denial of service attack (DoS):

An attack in which the attacker makes a computing or memory resource too busy or too full to handle legitimate requests.

User to Root Attack (U2R):

An attack in which the attacker starts with access to a normal user account and tries to gain root privileges.

Remote to Local Attack (R2L):

An attack in which the attacker sends packets to a machine over a network but does not have an account on that machine, and tries to gain local access to it.

Probing Attack:

An attempt to gather information about a network of computers.

5. Protocol Attacks in IDS

ICMP (Internet Control Message Protocol)

It is used by the Internet Protocol layer to send one-way messages to a host. ICMP has no authentication, which can lead to denial of service attacks.

TCP (Transmission Control Protocol)

TCP is used when one application wants to connect with another application. It sets up a communication channel between two systems. The attacker tries to gain access to this connection.

UDP (User Datagram Protocol)

Using UDP, a user can send messages to another host without first setting up a transmission channel. Messages may arrive out of order. An attacker may send malicious messages using this protocol.

Detection Rate

The detection rate is the number of intrusions detected by the system divided by the total number of intrusions present in the sample data.

False Alarm Rate

The false alarm rate is the number of normal patterns classified as attacks divided by the total number of normal patterns.
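Both rates can be computed directly from labelled results. The sketch below assumes each record carries the label "attack" or "normal"; the function names and the sample data are illustrative only.

```python
def detection_rate(actual, predicted):
    """Intrusions detected divided by intrusions present in the sample."""
    present = sum(1 for a in actual if a == "attack")
    detected = sum(1 for a, p in zip(actual, predicted)
                   if a == "attack" and p == "attack")
    return detected / present if present else 0.0

def false_alarm_rate(actual, predicted):
    """Normal patterns flagged as attacks divided by all normal patterns."""
    normal = sum(1 for a in actual if a == "normal")
    false_alarms = sum(1 for a, p in zip(actual, predicted)
                       if a == "normal" and p == "attack")
    return false_alarms / normal if normal else 0.0

# Four real attacks (three caught) and five normal records (one false alarm).
actual    = ["attack"] * 4 + ["normal"] * 5
predicted = ["attack", "attack", "attack", "normal",
             "attack", "normal", "normal", "normal", "normal"]

print(detection_rate(actual, predicted))    # 0.75
print(false_alarm_rate(actual, predicted))  # 0.2
```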

6. Data Mining

Data mining is used to extract information from large databases. It is divided into two types: predictive and descriptive. The predictive method predicts an output using historical data, where the set of possible outputs is determined in advance. The descriptive method gives information about what the data contains and describes the relationships within it. We have chosen the predictive technique for intrusion detection.

Classification

Classification assigns each data item to one of a set of predetermined target classes. It predicts the target class for each data item. For example, it can be used to classify credit risk as low, medium, or high.

Figure 4.1: Classification Task. Induction: a learning algorithm learns a model from the training set. Deduction: the model is applied to the test set.




Examples of Classification Task

1. Predicting tumor cells as benign or malignant.
2. Classifying credit card transactions as legitimate or fraudulent.
3. Classifying secondary structures of protein as alpha helix, beta sheet, or random coil.
4. Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification techniques:

1. Decision tree based methods
2. Rule based methods
3. Memory based reasoning
4. Neural networks
5. Naïve Bayes and Bayesian belief networks
6. Support vector machines

Decision Tree

Decision trees are used in statistics, machine learning, and data mining. A decision tree is a predictive model that maps observations about a data item to a conclusion about its target value. Leaves represent class labels, and branches represent the conjunctions of attribute values that lead to those class labels. A decision tree does not describe data or decisions; it simply performs the classification. It generates rules that are easy for humans to understand and that can be used to search for records in a database. These rules make the model transparent. Each rule has two properties, support and confidence, which help to rank the rules and predict the output.
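Support and confidence can be computed for a single rule as in the sketch below. The text does not give formulas, so the definitions used here are an assumption: support is the fraction of all records that satisfy the rule's condition, and confidence is the fraction of those records that belong to the predicted class. The record layout and the function name are also illustrative.

```python
def support_and_confidence(records, condition, target_class):
    """Return (support, confidence) for the rule `condition -> target_class`.

    Assumed definitions (not given in the text):
      support    = records satisfying the condition / all records
      confidence = satisfying records in target_class / satisfying records
    """
    covered = [r for r in records if condition(r)]
    if not records or not covered:
        return 0.0, 0.0
    correct = sum(1 for r in covered if r["class"] == target_class)
    return len(covered) / len(records), correct / len(covered)

records = [
    {"protocol": "icmp", "class": "attack"},
    {"protocol": "icmp", "class": "attack"},
    {"protocol": "icmp", "class": "normal"},
    {"protocol": "tcp",  "class": "normal"},
    {"protocol": "tcp",  "class": "normal"},
]

# Rule: if protocol is icmp then class is attack.
support, confidence = support_and_confidence(
    records, lambda r: r["protocol"] == "icmp", "attack")
print(round(support, 3), round(confidence, 3))  # 0.6 0.667
```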

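Example for decision tree

[Figure: example decision tree for medical diagnosis. Test nodes: Pain (Abdomen, Throat, Chest, None), Fever (Yes, No), Cough (Yes, No). Leaves: Appendicitis, Heart attack, Flu, Strep, Cold.]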

The complexity of a tree is measured using one of the following metrics: the total number of leaves, the total number of nodes, the number of attributes used, or the depth of the tree. Decision tree induction algorithms fall into two groups: top down approaches and bottom up approaches. ID3 and C4.5 are both top down approaches. C4.5 has two phases, a growing phase and a pruning phase, whereas ID3 has only a growing phase. Both algorithms are greedy: at each node they choose the best available split.
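The four complexity metrics can be computed with one recursive pass over a tree. The sketch below assumes a tree stored as nested dictionaries, where an internal node is `{"attribute": name, "children": {value: subtree}}` and a leaf is a class label string. This layout, the function name, and the sample tree are assumptions for illustration.

```python
def tree_metrics(node):
    """Return leaves, nodes, attributes used, and depth of a nested-dict tree."""
    if not isinstance(node, dict):
        # A leaf holds only a class label.
        return {"leaves": 1, "nodes": 1, "attributes": set(), "depth": 0}
    metrics = {"leaves": 0, "nodes": 1,
               "attributes": {node["attribute"]}, "depth": 0}
    for child in node["children"].values():
        sub = tree_metrics(child)
        metrics["leaves"] += sub["leaves"]
        metrics["nodes"] += sub["nodes"]
        metrics["attributes"] |= sub["attributes"]
        metrics["depth"] = max(metrics["depth"], sub["depth"] + 1)
    return metrics

# Hypothetical tree that tests the protocol first and then a connection flag.
tree = {
    "attribute": "protocol",
    "children": {
        "icmp": "attack",
        "udp": "normal",
        "tcp": {"attribute": "flag",
                "children": {"S0": "attack", "SF": "normal"}},
    },
}

m = tree_metrics(tree)
print(m["leaves"], m["nodes"], len(m["attributes"]), m["depth"])  # 4 6 2 2
```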

7. ID3 Algorithms

ID3 stands for Iterative Dichotomiser 3. It is the precursor of the C4.5 algorithm. The algorithm was invented by Ross Quinlan.

1. Create a root node for the training set C.
   If all the elements in C are positive, create a yes node and stop.
   If all the elements in C are negative, create a no node and stop.
   Otherwise, select a feature F with values v1 to vn.
2. Divide the training elements in C into subsets C1, C2, ..., Cn according to their values of F.
3. Apply the algorithm recursively to each subset Ci.

To select the feature for a node, the user has to supply a selection heuristic. ID3 uses a greedy search to select the best possible attribute. If the selected attribute classifies the training elements perfectly, the algorithm stops; otherwise it repeats on each subset until the condition is satisfied.
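The recursive steps above can be sketched in Python as follows. This is a skeleton under stated assumptions rather than Quinlan's implementation: the selection heuristic is passed in as a parameter, and the example heuristic `fewest_errors` is a simple stand-in that counts misclassified rows, not information gain. The majority-class fallback, the record layout, and all names are likewise assumptions.

```python
from collections import Counter

def id3(rows, features, select_feature):
    """Build a decision tree as nested dicts.

    rows           : list of dicts, each with a "class" key ("yes" or "no")
    features       : feature names still available for splitting
    select_feature : heuristic called as select_feature(rows, features)
    """
    classes = [row["class"] for row in rows]
    # Step 1: if every element has the same class, create that leaf and stop.
    if len(set(classes)) == 1:
        return classes[0]
    # No features left: fall back to the majority class (an assumption,
    # the listed steps do not cover this case).
    if not features:
        return Counter(classes).most_common(1)[0][0]
    feature = select_feature(rows, features)
    remaining = [f for f in features if f != feature]
    # Step 2: divide the rows into one subset per observed feature value.
    subsets = {}
    for row in rows:
        subsets.setdefault(row[feature], []).append(row)
    # Step 3: apply the algorithm recursively to every subset.
    return {
        "attribute": feature,
        "children": {value: id3(subset, remaining, select_feature)
                     for value, subset in subsets.items()},
    }

def fewest_errors(rows, features):
    """Stand-in heuristic: pick the feature whose split leaves the fewest rows
    outside the majority class of their subset."""
    def errors(feature):
        groups = {}
        for row in rows:
            groups.setdefault(row[feature], []).append(row["class"])
        return sum(len(g) - Counter(g).most_common(1)[0][1]
                   for g in groups.values())
    return min(features, key=errors)

rows = [
    {"protocol": "icmp", "flag": "SF", "class": "yes"},
    {"protocol": "icmp", "flag": "S0", "class": "yes"},
    {"protocol": "tcp",  "flag": "S0", "class": "yes"},
    {"protocol": "tcp",  "flag": "SF", "class": "no"},
    {"protocol": "udp",  "flag": "SF", "class": "no"},
    {"protocol": "udp",  "flag": "S0", "class": "no"},
]
print(id3(rows, ["protocol", "flag"], fewest_errors))
```

On this sample the tree tests the protocol first and needs the flag only to separate the two tcp records.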

Data Description

1. Attribute value description
2. Predefined classes
3. Discrete classes

ID3 decides the best attribute using a statistical property called information gain. The gain measures how well an attribute separates the training examples into the target classes. The attribute with the highest information gain is selected. To define gain, we first use entropy, a concept from information theory, which measures the impurity of a collection of examples.

Given a collection S of c outcomes,

$$\mathrm{Entropy}(S) = \sum_{I=1}^{c} -p(I) \log_2 p(I)$$

where p(I) is the proportion of S belonging to class I, the sum runs over the c classes, and log_2 is the logarithm to base 2. Note that S is not an attribute but the entire sample set.
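The entropy formula can be checked numerically with a short sketch. The text does not write out the gain formula, so the code uses the standard definition, Gain(S, A) = Entropy(S) minus the size-weighted entropy of the subsets produced by splitting S on attribute A. Function names, the record layout, and the sample data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over classes of -p(I) * log2 p(I)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target="class"):
    """Entropy of the whole set minus the weighted entropy after the split."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# A collection with 9 positive and 5 negative examples.
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # 0.94

rows = [
    {"protocol": "icmp", "flag": "SF", "class": "attack"},
    {"protocol": "icmp", "flag": "S0", "class": "attack"},
    {"protocol": "tcp",  "flag": "SF", "class": "normal"},
    {"protocol": "tcp",  "flag": "S0", "class": "normal"},
]
print(information_gain(rows, "protocol"))  # 1.0
print(information_gain(rows, "flag"))      # 0.0
```

Here protocol separates the classes perfectly and so has the highest gain, while flag carries no information about the class.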

Advantages of ID3 Algorithm

1. Easily understood prediction rules can be generated from the training data.
2. It builds the tree quickly.
3. It builds a short tree.


8. C4.5 Algorithms

C4.5 was developed by Quinlan. It builds decision trees from a set of training data using concepts from information theory. The training data is a set S = S1, S2, ... of already classified samples. Each sample Si is a p-dimensional vector (x1, x2, ..., xp), where xj represents an attribute value of the sample. At each node of the tree, C4.5 chooses the attribute that most effectively splits the samples into subsets. The splitting criterion is the normalized information gain (gain ratio). The attribute with the highest normalized information gain is chosen to make the decision.

To build the decision tree:

1. Check for base cases.
2. For each attribute a, find the normalized information gain from splitting on a.
3. Let a_best be the attribute with the highest normalized information gain.
4. Create a decision node that splits on a_best.
5. Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the node.

C4.5 can handle both continuous and discrete attributes, and it can handle missing attribute values. After the tree has been built, the algorithm goes back through it for pruning. The newer version is C5.0.
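The sketch below illustrates two of the points above: normalizing information gain by the split information, and splitting a continuous attribute on a threshold. It is a simplified illustration, not Quinlan's implementation. Testing midpoints between neighbouring distinct values is an assumption, and the function names and sample values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(labels, groups):
    """Information gain of a split divided by its split information."""
    total = len(labels)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups)
    split_info = -sum(len(g) / total * math.log2(len(g) / total)
                      for g in groups if g)
    return gain / split_info if split_info > 0 else 0.0

def best_threshold(values, labels):
    """Return (threshold, gain_ratio) of the best binary split of a
    continuous attribute, trying midpoints between distinct values."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [c for v, c in pairs if v <= threshold]
        right = [c for v, c in pairs if v > threshold]
        ratio = gain_ratio(labels, [left, right])
        if ratio > best[1]:
            best = (threshold, ratio)
    return best

# Hypothetical connection durations in seconds with their class labels.
durations = [1, 2, 3, 50, 60, 70]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(best_threshold(durations, labels))  # (26.5, 1.0)
```

Dividing by the split information penalizes attributes that break the data into many small subsets, which plain information gain tends to favour.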

9. KDD Cup Dataset

It is a benchmark dataset used to evaluate intrusion detection methods. The raw training data consists of about 4 gigabytes of compressed data from 7 weeks of network traffic, which was processed into about five million connection records; the separate test data contains about 2 million connection records. Using this data set, each connection can be classified as either normal or attack.

10. Weka Data Mining Tool

Weka (Waikato Environment for Knowledge Analysis) is machine learning software. It is free software available under the GNU General Public License. It is a collection of algorithms for data analysis and predictive modeling. It is easy to use and, because it is fully implemented in the Java programming language, it can run on almost any platform.

11. Conclusion

Security is essential for protecting our files. Many hackers try to gain unauthorized access to files. For protecting data, decision tree algorithms are among the easier techniques for securing a system. In this paper the ID3 and C4.5 algorithms are compared to find which gives the best results. Of the two, C4.5 is better suited to intrusion detection because it handles both numeric and nominal data. The C4.5 algorithm is also very easy to understand.





13. Author Biographies

Mr. V. JAIGANESH is working as an Assistant Professor in the Department of Computer Science, Dr. N.G.P. Arts and Science College, Coimbatore, Tamilnadu, India. He is pursuing his Ph.D. at Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, India. He completed his M.Phil in the area of Data Mining at Periyar University, and his postgraduate degrees, MCA and MBA, at Periyar University, Salem. He has presented and published a number of papers in reputed conferences and journals. He has about twelve years of teaching and research experience, and his research interests include Data Mining and Networking.

Dr. P. SUMATHI is working as an Assistant Professor in the PG & Research Department of Computer Science, Government Arts College, Coimbatore, Tamilnadu, India. She received her Ph.D. in the area of Grid Computing from Bharathiar University. She completed her M.Phil in the area of Software Engineering at Mother Teresa Women's University and received her MCA degree from Kongu Engineering College, Perundurai. She has published a number of papers in reputed journals and conferences. She has about sixteen years of teaching and research experience. Her research interests include Data Mining, Grid Computing and Software Engineering.

Ms. A. VINITHA is working as an Assistant Professor in the Department of Computer Science and Applications, Sasurie College of Arts & Science, Vijayamangalam, Erode, Tamilnadu, India. She is pursuing her M.Phil degree in the area of Data Mining under the guidance of Mr. V. JAIGANESH of Dr. N.G.P Arts & Science College, Coimbatore. She completed her MSc at Dr. N.G.P Arts & Science College, Coimbatore. She has attended many conferences and has 2 years of teaching experience. She is interested in Data Mining and Networking.


