
Network Intrusion Classification

Using Data Mining Techniques

By

Amneh H. Alamleh

Supervisor

Prof. Alaa F. Sheta

This Thesis was Submitted in Partial Fulfilment of the

Requirements for the Master's Degree in Computer Science

Faculty of Graduate Studies

Zarqa University - Jordan

August, 2015

Zarqa University

Authorization Declaration (translated from Arabic)

I, Amneh Hussein Alamleh, authorize Zarqa University to provide a copy of my thesis to libraries, institutions, bodies, or individuals upon request, in accordance with the regulations in force at the University.

Signature:

Date: 7 / 9 / 2015

Zarqa University

Authorization Statement

I, Amneh Hussein Alamleh, authorize Zarqa University to supply copies of my thesis to

libraries, establishments or individuals on request, according to the University regulations.

Signature:

Date: 7 / 9 / 2015


Acknowledgment

"In the Name of Allah, the Most Gracious, the Most Merciful." First and foremost, I would like to thank Allah for giving me the strength and the patience to complete this work. I would also like to thank my brothers and sisters for their continuous support. Many thanks as well to all of my instructors, especially my supervisor, Prof. Alaa Sheta, for his guidance, assistance, and comments. May Allah bless you all.


Contents

List of Figures
List of Tables
List of Acronyms
Abstract in Arabic
Abstract in English

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Contributions
1.4 Methodology
1.5 Thesis outline

2 Intrusion Detection System
2.1 Introduction
2.2 Firewalls
2.2.1 Firewalls limitations
2.3 IDS Definition
2.4 IDS Classification
2.4.1 Detection Methods
2.4.2 IDS Architecture

3 Data Mining Techniques
3.1 What is Data Mining
3.2 Data Mining and IDS
3.3 Decision Tree
3.3.1 How to develop a Decision Tree
3.3.2 How to Select Tree Root?
3.4 Artificial Neural Network
3.4.1 Perceptron
3.4.2 Multi-layer Perceptron (MLP)
3.5 Support Vector Machines
3.5.1 How SVM Works
3.6 Summary

4 Related Work
4.1 IDS Using Artificial Neural Network
4.2 IDS Using Support Vector Machine
4.3 IDS Using Decision Tree
4.4 IDS Using Feature Selection

5 Experimental Setup and Results
5.1 Experimental Data Set
5.1.1 KDDCUP'99 Data Set
5.1.2 NSL-KDD
5.1.3 Class Distribution
5.2 Classification Models Setup
5.2.1 C4.5
5.2.2 ANN (MLP)
5.2.3 SVM
5.3 10-fold Cross Validation
5.4 Feature Selection
5.5 Search Space Complexity
5.5.1 Best First Search
5.5.2 Genetic Search
5.6 Model Evaluation
5.7 Results
5.7.1 C4.5
5.7.2 MLP
5.7.3 SVM
5.8 Results Analysis
5.9 Summary

6 Conclusions and Future Work

A Features of NSL-KDD

Bibliography


List of Figures

2.1 IDSs Classification Dimensions (Pathan, 2014)
2.2 Signature based IDS deployment (Gadbois, 2011)
2.3 Anomaly based IDS deployment (Gadbois, 2011)
2.4 Network based IDS (Gadbois, 2011)
3.1 Data Mining methods taxonomy (Maimon and Rokach, 2010)
3.2 Simple Tree Structure
3.3 The simple perceptron architecture
3.4 Proposed MLP architecture
3.5 Left: the margin for a decision boundary is the distance to the nearest data point. Right: In SVMs, we find the boundary with maximum margin. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)
3.6 The slack variables ζ ≥ 1 for misclassified points, and 0 < ζ < 1 for points close to the decision boundary. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)
5.1 C4.5 classification model structure
5.2 Weka Setup for the MLP classification model
5.3 Weka Setup for SVM classification model
5.4 Main steps of feature selection process (Megha and Amrita, 2013)
5.5 Block diagram for proposed methodology
5.6 Correctly Classified Instances for C4.5, MLP and SVM with the original data, selected features of BF, and selected features of GS


List of Tables

3.1 Example data set
5.1 Distribution of attack records per attack category of the NSL-KDD
5.2 Experimental data
5.3 MLP parameters and their meaning
5.4 SVM parameters and their meaning
5.5 BFS Selected Features
5.6 GS Selected Features
5.7 Confusion matrix
5.8 Confusion matrix for the C4.5 model
5.9 Confusion matrix for the MLP model
5.10 Confusion matrix for the SVM model
5.11 Performance evaluation based C4.5, ANN and SVM models
A.1 NSL-KDD Intrusion Detection Data set Features (Kayacik et al., 2005)


List of Acronyms

ANN Artificial Neural Networks

BFS Best First Search

CART Classification and Regression Trees

CSE Consistency Subset Evaluator

DARPA Defense Advanced Research Projects Agency

DM Data Mining

DoS Denial of Service

DT Decision Tree

FS Feature Selection

GA Genetic Algorithm

GP Genetic Programming

HIDS Host Based Intrusion Detection System

IDS Intrusion Detection System

IG Information Gain

IDEP Intrusion Detection Evaluation Program

KDD Knowledge Discovery of Data

LR Logistic Regression

MARS Multivariate Adaptive Regression Splines

NB Naïve Bayes

NIDS Network Based Intrusion Detection System

R2L Remote-to-Local

RBF Radial Basis Function


RF Random Forest

ROC Receiver Operating Characteristic

RST Rough Set Theory

SVM Support Vector Machine

U2R User-to-Root

VP Voted Perceptron

WEKA Waikato Environment for Knowledge Analysis


Classification of Computer Network Intrusion Using Data Mining Methods

Prepared by

Amneh Hussein Alamleh

Supervised by

Prof. Dr. Alaa Fathi Sheta

Abstract (translated from Arabic)

The volume of intrusions on computer networks increases and evolves continuously over time, which forces organizations to renew and develop their network protection systems in order to stay safe from financial and information losses. The Intrusion Detection System is a very important component of protection systems. Its importance lies in detecting attempts at unauthorized access to the computer system and the network, which result in access by unauthorized persons to data and systems, loss of data integrity, and unavailability of systems and data to authorized persons. Classifying the types of attack is one of the main steps in the intrusion detection process.

In this thesis, three Data Mining Techniques were explored to deal with the problem of classifying attack types: Decision Tree, Artificial Neural Network, and Support Vector Machine. These techniques were chosen because they have achieved success in several applications, including network security. The second goal of this research is to find the best way to reduce the amount of computation (complexity reduction) of these models in the first phase of building them, by reducing the number of features (feature reduction) of the data. Finding the best set of features that represents the class from among a large set of features is a complex process because of the large number of possible choices. Finding the best set of features that represents the class is also necessary and important. Therefore, an attempt was made to solve this problem using two methods (Best First Search and Genetic Search). The classification process was repeated with the selected feature set using DT, ANN and SVM.

The previous models were then rebuilt using the selected features. The results showed that the C4.5 algorithm achieved the highest accuracy compared with the other methods. The overall performance of the DT and MLP algorithms also improved slightly after selecting a subset of the features.

The Weka software and an experimental data set (NSL-KDD) were used to implement these models and obtain the results, since Weka offers various capabilities for both processing data and working with classification algorithms.


Abstract

The volume of targeted network attacks is steadily increasing and the attacks themselves keep evolving, forcing businesses to revamp their network security systems to guard against data and financial losses. An Intrusion Detection System (IDS) is an essential component of any security system. Its main function is to identify unauthorized access attempts that compromise the confidentiality, integrity, or availability of computers or computer networks. One of the major steps in addressing the intrusion detection problem is classifying the types of attacks. In this research, we explore the use of three data mining approaches to solve the attack classification problem: the Decision Tree (DT) based C4.5 algorithm, Artificial Neural Networks (ANN), and the Support Vector Machine (SVM). These techniques have shown successful outcomes in a variety of applications, including network security. Another goal of this research is to provide a suitable way to reduce the complexity of the classification models developed in the first phase by reducing the feature domain. Selecting the best set of features from a larger set is a complex problem because of the large number of possible choices, yet a combination of features that best represents the class(es) of attacks is needed. Therefore, we explored the use of both the Best First Search (BFS) and the Genetic Search (GS) algorithms to handle this problem.

The classification process was then repeated on the reduced feature set using DT, ANN, and SVM. The decision tree outperformed the other two approaches, and the performance of the DT and ANN improved slightly with feature selection. To obtain our results, we used the WEKA software and the NSL-KDD data set. WEKA accommodates the various changes that need to be made to both data pre-processing and the embedded algorithms.

Chapter 1

Introduction

1.1 Motivation

Information technology, networking, and connectivity are ever-changing and evolving. Individuals and organizations keep vast amounts of critical and sensitive information on the web. At the same time, we need to preserve our privacy and the confidentiality, integrity, and availability of our information. Without appropriate security controls, this information is at great risk. In their survey titled "Get Ahead of Cybercrime", Kessel and Allan stated, "Every organization is at risk of a cyber attack" (Kessel and Allan, 2014). Multiple solutions have been proposed to deal with the issue of information and systems security, such as encryption, security policies, program controls, and firewalls. These are primary security techniques, but on their own they are not enough to provide secure systems (Muhammad-Imran et al., 2008). With the addition of Intrusion Detection Systems (IDSs), however, a more robust and reliable security system can be implemented. An IDS is a very important component in protecting computers and network systems because it detects new attempts at system abuse. Because intruder attacks vary in type, traditional IDSs require a huge amount of human effort to maintain, extend, and improve (Ooi et al., 2013). In this thesis, we work towards the goal of applying data mining techniques to reduce the human effort needed to manage ever-changing intruder attacks.

1.2 Problem Statement

• In the past, rule-based analysis relied on sets of predefined rules that were provided by an administrator or created by the system (Moradi and Zulkernine, 2004).
• Rule-based systems (i.e., expert systems) cannot adapt to the evolving nature of attacks, which results in an inflexible detection system.
• Attacks are continuously increasing and evolving.
• Detection methods should be equally adaptive in order to detect new attacks.
• Data mining techniques have the ability to deal with evolving and changing attacks.
• Many DM techniques have been proposed. The main objective of this research is to find the most suitable method for IDS implementation, one that detects attacks with a higher accuracy rate and in minimum time.

1.3 Contributions

The following contributions were achieved:

1. Studying and analyzing the nature of the KDD data set and its defects. As a result, two points were proposed:
- A data set with an equal class distribution.
- Minimizing the number of its features.

2. Proposing two methods that can be used to reduce the search space of features. This problem is essential for the network engineer, because a smaller feature set reduces the time needed to locate an attack (i.e., reduces delay), the effort of monitoring the most significant attributes that could be a source of attack, and the damage to network resources. Best First Search (BFS) and Genetic Search (GS) were used for this purpose.

3. Building three IDS models based on a Decision Tree (DT) with pruning, a Multi-Layer Perceptron (MLP), which is a type of Artificial Neural Network (ANN), and a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel. According to the literature, these methods have advantages over other methods.

1.4 Methodology

To perform this research, the following steps are followed:

• A survey of various methods for handling the intrusion detection problem.
• Preprocessing: analyzing and understanding the NSL-KDD data set and performing the necessary preprocessing on it.
• The C4.5 classification tree, Artificial Neural Network, and Support Vector Machine classification algorithms are tested on the NSL-KDD data set.
• Feature selection is performed using the Best First and Genetic Search algorithms.
• ANN, C4.5, and SVM are tested again after applying feature selection.
• Results are analyzed and conclusions are stated.

1.5 Thesis outline

The rest of this thesis is organized as follows. Chapter two explains what an IDS is, its role in computer security, its architecture, and its taxonomy. Chapter three describes the DT, ANN, and SVM techniques. Chapter four reviews related work in the field of intrusion detection using various techniques. Chapter five presents the research experiments, the model structures, the performance of the developed models, and the analysis of the results. Finally, chapter six presents the conclusions and suggested future work.

Chapter 2

Intrusion Detection System

2.1 Introduction

Computer security is defined as the protection of computing systems against threats

to confidentiality, integrity, and availability (Summers, 2010; Pfleeger and Pfleeger,

2006).

• Confidentiality: computer-related assets are revealed only to authorized people with pre-defined rights.
• Integrity: no one except the authorized parties can modify the systems in any way, including writing, deleting, or creating.
• Availability: the system is capable of providing its services at any given time to authorized parties.

There are three main categories of security mechanisms: attack prevention, attack

avoidance, and attack detection (Kruegel et al., 2005).

• Attack prevention: ways of preventing certain attacks before they reach the target. Access control is an important element in this category, and the firewall is an important access control system at the network layer.
• Attack avoidance: an intruder may access the targeted resource, but the information is modified in a way that makes it unusable for the attacker. Cryptography is the most important element in this category.
• Attack detection: assumes that an attacker can obtain access to the desired targets and successfully violate a given security policy. If an attack happens, attack detection has to report that something is wrong and react in an appropriate way. Intrusion detection systems are the most important element of this category.

2.2 Firewalls

Firewalls are designed to protect a network from outside threats as a first line of defense while providing a connection from one network to another (Das and Sarkar, 2014). Typically, firewalls come as hardware, software, or a combination of both, creating a checkpoint at the edge of the network. By providing protection in both directions, firewalls keep outsiders from breaking in and prevent those inside the network from revealing valuable data. Furthermore, firewalls can proxy an internet service and block problematic services.

There are three main types of firewall technology: packet filtering, application-based firewalls (proxy servers), and stateful packet filtering. Packet filtering does not look at packet contents; it filters on IP address and port to determine whether a packet is accepted. A proxy server sits between the service requester and the service provider and hides the requester's real IP address from the other party. Proxy servers also perform logging and access control and prevent direct traffic between networks. Lastly, stateful packet filtering provides more security checks by combining the functionality of packet filtering and proxy firewalls. It inspects the first packet of a connection and then adds an entry to a state table (Firewalls, 2015).
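The difference between plain and stateful packet filtering can be made concrete with a short sketch. The following Python example is illustrative only and is not taken from any firewall product: the `Packet` fields, the `ALLOWED_SERVICES` rule set, and the addresses are hypothetical, and real stateful firewalls track considerably more (TCP flags, timeouts, sequence numbers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    syn: bool = False  # True for the first packet of a connection


# Hypothetical rule set: only these (ip, port) destinations may be reached.
ALLOWED_SERVICES = {("10.0.0.5", 80), ("10.0.0.5", 443)}


def stateless_accept(pkt):
    """Packet filtering: decide from destination IP and port alone."""
    return (pkt.dst_ip, pkt.dst_port) in ALLOWED_SERVICES


class StatefulFilter:
    """Stateful filtering: inspect the first packet, then consult a state table."""

    def __init__(self):
        self.state_table = set()

    def accept(self, pkt):
        flow = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)
        reverse = (pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)
        # Packets of an already-approved connection pass in both directions.
        if flow in self.state_table or reverse in self.state_table:
            return True
        # A connection-opening packet is checked against the rules once;
        # if accepted, the connection is recorded in the state table.
        if pkt.syn and stateless_accept(pkt):
            self.state_table.add(flow)
            return True
        return False


fw = StatefulFilter()
first = Packet("203.0.113.9", 50000, "10.0.0.5", 80, syn=True)
reply = Packet("10.0.0.5", 80, "203.0.113.9", 50000)
stray = Packet("10.0.0.5", 80, "198.51.100.7", 40000)
print(fw.accept(first), fw.accept(reply), fw.accept(stray))
```

In this sketch the reply from the protected server passes the stateful filter only because the first packet of the connection created a state-table entry; the stateless rule alone would reject it, and an unrelated outbound packet is still dropped.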


2.2.1 Firewalls limitations

Even though firewalls are necessary for security, they have some limitations. The

following are some limitations of firewalls (Stallings, 2010):

• If an attack bypasses the firewall, the firewall's protection is void.
• Firewalls do not provide adequate protection from internal threats, for instance an employee with malicious intent or an employee who unknowingly aids an external attacker via social media.
• A firewall can't protect against wireless communication between local systems on different sides of the internal firewall. This is a concern when a poorly secured LAN can be compromised from outside the organization.
• An infected device (employee devices, laptops, portable devices, etc.) used on the corporate network completely bypasses firewall security.

An intrusion detection system comes into play if and when the firewall fails. An IDS evaluates an intrusion once it happens, and it is specifically designed to find, and help prevent, attacks that are missed by firewall filters (Firewalls, 2015).

2.3 IDS Definition

Intrusion Detection Systems (IDSs) are designed to inspect all network activity and identify suspicious incoming and outgoing patterns. As a type of packet scanner, an IDS scans all packets on the network and classifies inbound and outbound traffic as intrusive or non-intrusive. Denial of Service (DoS) attacks, disclosures, manipulations, and masquerading are some examples of intrusions.

Cyber attacks on systems can either fail or succeed. Intrusion detection systems are designed to monitor targeted systems and collect audit trails, analyze the gathered information for signs of unusual activity and misuse, automatically respond to detected activity and mitigate damage, generate reports about questionable activity and send out notifications, and discover and diagnose problems. Intrusions are divided into three types: host intrusions, network intrusions, and application intrusions (Liu, 2012).

Figure 2.1: IDSs Classification Dimensions (Pathan, 2014).

2.4 IDS Classification

IDSs can be classified along multiple dimensions: detection method, architecture, and post-detection action (Pathan, 2014). A complete categorization can be seen in Figure 2.1. In this research we are interested in network anomaly detection.

2.4.1 Detection Methods

A. Misuse IDS

Misuse (signature) intrusion detection systems work in a way similar to a virus scanner. Relying on rules, a signature detection system tries to correlate known patterns with intrusion attempts. To gain access to a system, a virus tries a number of steps in a particular pattern. These specific steps are made into a customized rule, and the intrusion detection system compares the collected data against the rule to decide whether the observation is positive or negative. Figure 2.2 shows the deployment of a signature-based IDS.

Figure 2.2: Signature based IDS deployment (Gadbois, 2011).

Figure 2.3: Anomaly based IDS deployment (Gadbois, 2011).
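As a rough illustration of rule matching in a misuse IDS, the sketch below treats a signature as an ordered list of steps and reports a match when those steps occur, in order, within the observed events. This is a simplification made for illustration: the event names and the `SIGNATURES` table are hypothetical, and production systems use far richer rule languages.

```python
# Hypothetical signature database: rule name -> ordered steps of the attack.
SIGNATURES = {
    "repeated_login_failure": [
        "connect", "login_failed", "login_failed", "login_failed",
    ],
}


def matches_signature(events, signature):
    """Return True if the signature steps appear in order within events."""
    remaining = iter(events)
    # `step in remaining` advances the iterator until the step is found,
    # so later steps can only match events that come after earlier ones.
    return all(step in remaining for step in signature)


def detect(events):
    """Return the names of all signatures matched by the observed events."""
    return sorted(
        name for name, steps in SIGNATURES.items()
        if matches_signature(events, steps)
    )


observed = ["connect", "login_failed", "list", "login_failed", "login_failed"]
print(detect(observed))
```

An event stream that does not contain every step of a stored rule produces no alert, which mirrors the limitation that a signature-based system responds only to what is in its database.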

B. Anomaly IDS

Anomaly detection relies on a baseline profile that is set by the IDS or by a network administrator. This established baseline tells the system what normal network traffic looks like, so that any deviation can be flagged as an attack (Lokesak, 2008). Figure 2.3 shows the deployment of an anomaly-based IDS.
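A minimal sketch of the baseline idea follows, assuming a single numeric traffic measurement (for example, connections per minute) and a simple rule that flags values more than a chosen number of standard deviations from the baseline mean. The class name, the threshold of 3.0, and the sample values are assumptions made for illustration, not a method taken from the cited sources.

```python
from statistics import mean, stdev


class BaselineProfile:
    """Baseline of normal traffic built from samples observed during profiling."""

    def __init__(self, normal_samples, threshold=3.0):
        self.mu = mean(normal_samples)
        self.sigma = stdev(normal_samples)
        self.threshold = threshold  # allowed deviation, in standard deviations

    def is_anomalous(self, value):
        """Flag a measurement that deviates too far from the baseline."""
        if self.sigma == 0:
            return value != self.mu
        return abs(value - self.mu) / self.sigma > self.threshold


# Hypothetical connections-per-minute counts collected while building the profile.
profile = BaselineProfile([100, 110, 95, 105, 90, 100])
print(profile.is_anomalous(104), profile.is_anomalous(400))
```

The sketch also shows why false positives arise: a legitimate but unusual burst of traffic exceeds the threshold in exactly the same way an attack does.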

C. Anomaly vs. Signature IDS

Signature-based detection offers more accuracy, time savings, and detailed log files (Lokesak, 2008). It is more accurate in identifying intrusion attempts, and because of this increased accuracy, administrators spend far less time on false positives (Lokesak, 2008). Because this form of detection produces detailed log files, it is easier to identify the cause of an alarm. However, there are some downsides. Signature-based detection systems respond only to what is in their database and require constant updates (Lokesak, 2008). Additionally, when new viruses are discovered, it may take hours to days until updates are implemented. Systems can also become sluggish if hardware isn't updated and managed. On the other hand, anomaly-based detection detects new threats without an administrator's updates (Lokesak, 2008). It also learns about network activity and builds profiles on an ongoing basis, so the longer the system is deployed, the more accurate it becomes. However, this advantage comes with the disadvantage that the network is unprotected while the profile is being built. Furthermore, if an intrusion or attack looks like normal activity to the system, no alarm is triggered. Anomaly-based detection is also more prone to raising false positives (Lokesak, 2008).

2.4.2 IDS Architecture

A. Host Based IDS

Host Based Intrusion Detection Systems (HIDS) focus on collecting and ana-

lyzing information on a specific host or system. While relying heavily on audit

trails and system logs for identifying unauthorized access, HIDS checks and

collects system data from file systems, network events and system calls (Scar-

fone and Mell, 2007). There are two types of HIDS: anomaly detection and

signature based detection which are described above.

B. Network Based IDS

Network Based Intrusion Detection Systems (NIDS) provide real-time monitoring of networks (Scarfone and Mell, 2007). If an intrusion occurs, the system can detect the attack as it happens. A NIDS allows direct analysis of network traffic, in which all network traffic is seen at all levels of the operating system, without degrading network or host performance. For instance:

• Its first job is to record each incoming or outgoing packet, leaving a packet trace.
• Its second job is to analyze each packet trace to identify a matching attack signature (Scarfone and Mell, 2007).

Figure 2.4 shows a network architecture with NIDS.

Figure 2.4: Network based IDS (Gadbois, 2011).

C. Network Based vs. Host Based IDS

A NIDS has some advantages over a HIDS. It is physically a separate unit and does not consume any resources of the monitored system. Overall, a NIDS is good at detecting unauthorized access, bandwidth theft, and DoS attacks (Jessica, 2007). On the downside, if administrators fail to plan appropriately for traffic growth, the NIDS can become overloaded and drop packets, which defeats its purpose. A NIDS is also susceptible to slow attacks (Jessica, 2007). A HIDS can use the logs, system services, registry events, and other information available on the system. However, a HIDS may detect an attack too late, and it consumes system resources because it runs on the host (Jessica, 2007). Ideally, NIDS and HIDS complement each other and should both be implemented.

Chapter 3

Data Mining Techniques

As the world grows in complexity, overwhelming us with the data it produces, data

mining becomes our only hope for discovering hidden knowledge. DM is defined as

the process of discovering patterns in data (Witten et al., 2011).

3.1 What is Data Mining

Data Mining is defined as the process of extracting interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data (Han et al., 2012). Data Mining is the core of the Knowledge Discovery of Data (KDD) process, involving the inferring of algorithms that explore the data, develop the model, and discover unknown patterns (Maimon and Rokach, 2010). There are many Data Mining methods used for different purposes and goals. Figure 3.1 presents a taxonomy of DM methods (Maimon and Rokach, 2010).

3.2 Data Mining and IDS

Recently, there has been great interest in applying Data Mining techniques to intrusion detection systems. The problem of intrusion detection can be reduced to the Data Mining task of classifying data. Briefly, one is given a set of data points belonging to different classes (normal activity, different attacks) and aims to separate them as accurately as possible by means of a model (Maimon and Rokach, 2010). Many different data mining techniques exist for intrusion detection classification, and researchers have tried distinct methods to obtain better classification accuracy. In this research we use three data mining techniques: Decision Tree, Artificial Neural Network, and Support Vector Machine.

Figure 3.1: Data Mining methods taxonomy (Maimon and Rokach, 2010).

3.3 Decision Tree

The decision tree is one of the best-known and most widely used classification algorithms:

• The decision tree algorithm known as ID3 (Iterative Dichotomiser) has been known since the 1970s.
• Classification and Regression Trees (CART), which generates binary decision trees, was presented in (Breiman et al., 1984).
• The C4.5 algorithm was presented later by Quinlan (Quinlan, 1993; Han et al., 2012). C4.5 became a benchmark to which newer supervised learning algorithms are often compared.

ID3, CART, and C4.5 adopt a greedy approach in which decision trees are constructed in a top-down, recursive, divide-and-conquer manner (Han et al., 2012). Unlike ID3, C4.5 can deal with continuous attributes and handle missing values, but it is a little slower than the other DT algorithms (Ooi et al., 2013).

3.3.1 How to develop a Decision Tree

A decision tree is a directed tree whose structure is formed by recursively partitioning the set of observations. It consists of a root with no incoming edges, internal (test) nodes, each with exactly one incoming edge and one or more outgoing edges, and leaves (decision nodes), each with exactly one incoming edge and no outgoing edges (Maimon and Rokach, 2010). The decision tree development algorithm is greedy and works in a top-down, recursive, divide-and-conquer manner. It can be summarized as follows (Kargupta et al., 2008):

Algorithm 1: Generate-Decision-Tree(samples, att-list)

Input:
    samples: training samples
    att-list: set of candidate attributes

Create a node N  // represents the training samples
If samples are all of the same class C then
    return N as a leaf node labeled with class C
If att-list is empty then
    return N as a leaf node labeled with the most common class in samples
Select test-attribute, the attribute in att-list with the highest information gain based on the entropy
Label node N with test-attribute
Remove test-attribute from att-list
for each known value a_i of test-attribute do
    Let s_i be the set of samples for which test-attribute = a_i
    If s_i is empty then
        attach a leaf labeled with the most common class in samples
    else
        attach the node returned by Generate-Decision-Tree(s_i, att-list)
    end if
end for
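The recursion of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch for categorical attributes, not the C4.5/J48 implementation used in the experiments: the function names, the dictionary-based tree representation and the use of the toy data of Table 3.1 are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (Equation 3.2) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Parent entropy minus the size-weighted average entropy of the children."""
    children = {}
    for row, label in zip(rows, labels):
        children.setdefault(row[attribute], []).append(label)
    average = sum(len(c) / len(labels) * entropy(c) for c in children.values())
    return entropy(labels) - average


def generate_decision_tree(rows, labels, att_list):
    """Return a class label (leaf) or a dict {"attribute": ..., "branches": {...}}."""
    if len(set(labels)) == 1:                 # all samples belong to one class
        return labels[0]
    if not att_list:                          # no attribute left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(att_list, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in att_list if a != best]
    branches = {}
    # Only values that occur in `rows` are visited, so no subset is ever empty.
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = generate_decision_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return {"attribute": best, "branches": branches}


def classify(tree, row):
    """Follow the branches until a leaf is reached (KeyError on unseen values)."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree


# Toy data of Table 3.1: three binary features, two classes.
rows = [
    {"f1": 1, "f2": 1, "f3": 1},
    {"f1": 1, "f2": 1, "f3": 0},
    {"f1": 0, "f2": 0, "f3": 1},
    {"f1": 1, "f2": 0, "f3": 0},
]
labels = ["A", "A", "B", "B"]
tree = generate_decision_tree(rows, labels, ["f1", "f2", "f3"])
print(tree)
```

On this toy data the sketch splits once, on f2, and both branches are pure leaves. Pruning and the handling of continuous attributes and missing values, which C4.5 adds, are deliberately left out.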

To reduce tree complexity, pruning algorithms were introduced. Pruning is a general technique against overfitting: it has a large effect on the tree size and only a slight effect on the accuracy, and it is reported to result in better accuracy (Witten et al., 2011). Using a Decision Tree, network connections can be classified as normal, anomaly, or other predefined types of attack.

3.3.2 How to Select Tree Root?

We want to determine which attribute should serve as the root of the tree, given a set of training feature vectors. Information gain (IG) measures how important a given attribute of the feature vectors is, and it helps decide the ordering of attributes in the nodes of a decision tree. Equations 3.1 and 3.2 show how information gain and entropy are calculated (Han et al., 2012).

\[ IG = E(\text{Parent}) - AE(\text{Children}) \tag{3.1} \]

\[ \text{Entropy} = \sum_i -p_i \log_2 p_i \tag{3.2} \]

$E$ and $AE$ are the entropy and the average entropy (weighted by the size of each child), respectively, and $p_i$ is the probability of class $i$. Entropy comes from information theory: the higher the entropy, the more the information content. For example, consider the training data set in Table 3.1. The table has three features $f_1$, $f_2$ and $f_3$ and two classes $A$ and $B$. If $f_1$ is taken as the split attribute, one of the resulting child nodes still contains both classes and would have to be split further.

Table 3.1: Example data set

f1   f2   f3   Class
1    1    1    A
1    1    0    A
0    0    1    B
1    0    0    B

Thus, the entropy of the children and the gain for a split on $f_1$ can be computed as follows:

\[ E_{child1} = -\tfrac{1}{3}\log_2\left(\tfrac{1}{3}\right) - \tfrac{2}{3}\log_2\left(\tfrac{2}{3}\right) \approx 0.5283 + 0.3900 = 0.9183 \]

\[ E_{child2} = 0 \]

\[ E_{parent} = 1 \]

\[ IG = 1 - \tfrac{3}{4} \times 0.9183 - \tfrac{1}{4} \times 0 \approx 0.3113 \]

If we split using the feature $f_2$, we get the following:

\[ E_{child1} = 0 \]

\[ E_{child2} = 0 \]

\[ E_{parent} = 1 \]

\[ IG = 1 - \tfrac{1}{2} \times 0 - \tfrac{1}{2} \times 0 = 1 \]

Splitting on feature $f_2$ therefore produces the best gain. The resulting tree structure is presented in Figure 3.2. This tree was developed using Weka software (Hall et al., 2009).

Figure 3.2: Simple Tree Structure
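The arithmetic of this example can be checked with a short script. The sketch below simply re-evaluates Equations 3.1 and 3.2 for the class proportions read off Table 3.1; the helper name `entropy` and the variable names are chosen for illustration only.

```python
import math


def entropy(*probabilities):
    """Equation 3.2 for a list of class probabilities (zero terms are skipped)."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Parent node: two records of class A and two of class B.
e_parent = entropy(2 / 4, 2 / 4)

# Split on f1: child f1=1 holds {A, A, B}, child f1=0 holds {B}.
e_child1 = entropy(1 / 3, 2 / 3)
e_child2 = entropy(1.0)
ig_f1 = e_parent - (3 / 4) * e_child1 - (1 / 4) * e_child2

# Split on f2: child f2=1 holds {A, A}, child f2=0 holds {B, B}.
ig_f2 = e_parent - (1 / 2) * entropy(1.0) - (1 / 2) * entropy(1.0)

print(f"E_child1 = {e_child1:.4f}, IG(f1) = {ig_f1:.4f}, IG(f2) = {ig_f2:.4f}")
```

Evaluated this way, the gain for f1 is about 0.311 and the gain for f2 is 1, so f2 is the better root.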

3.4 Artificial Neural Network

Classification is one of the most active research and application areas of neural networks. A classification problem arises when an object needs to be assigned to a predefined group or class based on a number of observed attributes associated with that object. ANNs have been used successfully for multi-class pattern classification (Zhang, 2000; Ou and Murphey, 2007), medical diagnosis (Brause, 2001), bankruptcy prediction (du Jardin, 2010), handwritten character recognition (Singh et al., 2009; Chaturvedi et al., 2014), and speech recognition (Krol and Szlachetko, 2010).

An ANN usually consists of many hundreds of simple processing units connected together in a complex communication network. Each unit or node is a simplified model of a real neuron, which fires (sends off a new signal) if it receives a sufficiently strong input signal from the other nodes to which it is connected. The strength of these connections may be varied so that the network performs different tasks corresponding to different patterns of node firing activity. An ANN model thus consists of a set of synapses, each characterized by a weight or strength of its own.

3.4.1 Perceptron

The neuron is the basic processing unit of an ANN. Each neuron has a number of inputs and a single output. Each input has an assigned factor or parameter called the weight. A neuron works as follows: each input signal is multiplied by the corresponding weight, the products are summed, and the sum is passed through a transfer function. This transfer function is most commonly a sigmoid function (see Equation 3.3) (Quinlan, 1993). The simplest neural network unit is called the "Perceptron" (see Figure 3.3) (Quinlan, 1993): if the result of the summation exceeds a certain threshold, the neuron output is activated; otherwise it is not.

\[ f(x) = \frac{1}{1 + e^{-x}} \tag{3.3} \]

For example, given a set of inputs $x_j$ and a set of corresponding weights $w_j$, the output of the neuron is calculated by the following function, where $w_0$ is the bias term:

\[ y_i = f\left(\sum_{j=1}^{n} w_j x_j + w_0\right) \tag{3.4} \]
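As a small illustration of Equations 3.3 and 3.4, the following sketch computes the output of a single neuron. The input values, weights and bias are arbitrary numbers chosen for the example, not values from the experiments.

```python
import math


def sigmoid(x):
    """Transfer function of Equation 3.3."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Equation 3.4: weighted sum of the inputs plus the bias w0, passed through f."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(weighted_sum)


# Illustrative values only: three inputs, three weights and a bias.
y = neuron_output(inputs=[1.0, 0.0, 1.0], weights=[0.5, -0.3, 0.8], bias=-0.2)
print(round(y, 4))
```

Here the weighted sum is 1.1, so the sigmoid output is roughly 0.75; a threshold unit with threshold 0 would fire for this input.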


Figure 3.3: The simple perceptron architecture

Figure 3.4: Proposed MLP architecture

3.4.2 Multi-layer Perceptron (MLP)

An ANN of this type consists of three layers: an input layer, a hidden layer, and an output layer. Neurons are grouped together to form a layer, and the layers are usually fully connected. Each connection is characterized by a weight, and the weights are computed by a learning algorithm.

MLP is a fully connected network because all inputs/units in one layer are con-

nected to all units in the following layer. The input layer gets the initial data, the hid-

den layer calculates several interim values which are used to calculate output values

in the output layer. The MLP can be represented mathematically as given in Equation

3.5 (Norgaard et al., 2000; Al-Hiary et al., 2008):

\[ y_i = g_i[\Phi, \theta] = F_i\left[\sum_{j=1}^{n_h} W_{i,j}\, f_j\left(\sum_{l=1}^{n_\Phi} w_{j,l}\Phi_l + w_{j,0}\right) + W_{i,0}\right] \tag{3.5} \]

where $y_i$ is the output signal, $g_i$ is the function realized by the neural network, $\Phi$ is the input vector with $n_\Phi$ elements, $n_h$ is the number of nodes in the hidden layer, and $\theta$ is the parameter vector, which contains all the adjustable parameters of the network (weights $w_{j,l}$, $W_{i,j}$ and biases $w_{j,0}$, $W_{i,0}$). The MLP is trained using the backpropagation (BP) learning algorithm. Training means adjusting the network weights such that the objective criterion is minimized (i.e. minimizing the error between the network output $y$ and the desired output for each input $\Phi$).

The ANN achieves a good match when the Mean Square Error (MSE) is minimized (see Equation 3.6) (Tim, 2015). Figure 3.4 shows the architecture of the MLP, with 41 inputs, which are the features of NSL-KDD, and six outputs, one per class in our data samples (normal traffic and five attack types). We used the MLP to classify records into these six classes.

\[ MSE = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{3.6} \]

where $y_i$ and $\hat{y}_i$ are the actual and estimated outputs for sample $i$, and $n$ is the number of samples.
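The forward computation of Equation 3.5 and the error measure of Equation 3.6 can be sketched as follows. The sketch assumes sigmoid hidden units $f_j$ and an identity output function $F_i$, which the text does not specify; the weight layout (bias stored first in each row) and all numeric values are likewise assumptions for illustration. Backpropagation itself is not shown.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(phi, w_hidden, W_out):
    """Equation 3.5 with sigmoid hidden units and identity outputs (assumed).

    w_hidden[j] = [w_j0, w_j1, ..., w_jn]   bias first, then one weight per input
    W_out[i]    = [W_i0, W_i1, ..., W_inh]  bias first, then one weight per hidden node
    """
    hidden = [
        sigmoid(row[0] + sum(w * p for w, p in zip(row[1:], phi)))
        for row in w_hidden
    ]
    return [row[0] + sum(W * h for W, h in zip(row[1:], hidden)) for row in W_out]


def mse(actual, estimated):
    """Equation 3.6: mean of the squared differences."""
    return sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual)


# Illustrative network: 2 inputs, 2 hidden nodes, 1 output.
phi = [1.0, 2.0]
w_hidden = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
W_out = [[0.05, 1.0, -1.0]]
outputs = mlp_forward(phi, w_hidden, W_out)
error = mse([1.0], outputs)
```

The network in Figure 3.4 has the same structure with 41 inputs and six outputs; training would adjust `w_hidden` and `W_out` to reduce `error` over the training set.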

3.5 Support Vector Machines

Support Vector Machines (SVMs) are one of the more recent developments in supervised machine learning (Ng, 2014). Surveys of SVMs can be found in (Burges, 1998; Cristianini and Shawe-Taylor, 2000). Although SVMs have been known since the late seventies (Burges, 1998; Vapnik, 1982), they started to receive attention only in the late nineties (Burges, 1998). They were applied mainly to pattern recognition and pattern classification problems such as image recognition, text recognition and face detection (Pradhan, 2012). Many researchers have also applied SVM techniques to the intrusion detection problem, such as in (Khan et al., 2007; Jiang et al., 2011; Sujatha et al., 2012; Jha and Ragha, 2013). SVMs work mainly by deriving a hyperplane that maximizes the separating margin between two classes (Hu et al., 2003). The feature vectors that lie on the boundary of the margin are called support vectors (Hu et al., 2003). SVMs are attractive because they are very resilient to overfitting (Witten et al., 2011).

Figure 3.5: Left: the margin for a decision boundary is the distance to the nearest data point. Right: in SVMs, we find the boundary with maximum margin. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)

3.5.1 How SVM Works

To see how SVM works, assume we have a set of training examples in pair format $(x_i, y_i)$, $i = 1, \ldots, l$, where $x_i \in R^n$ and $y \in \{1, -1\}^l$. Our objective is to learn a classifier:

\[ f(x) = w^T \phi(x) + b \tag{3.7} \]

The classifier's output for a new $x$ is $\mathrm{sign}(f(x))$. If the training data are linearly separable in the feature space of $\phi(x)$ (see Figure 3.5), the two classes of training examples are sufficiently well separated that one can draw a hyperplane between them. SVM maps the training vectors $x_i$ into a higher-dimensional space using the function $\phi$ and finds the linear separating hyperplane with the maximum margin in that space. The margin is the distance from the hyperplane to the closest data point in either class, and maximizing it leaves the largest possible room for error on new data.

Many data sets might not be linearly separable. This means that there will be no solution that satisfies all the constraints. One way to handle this problem is to relax some of the constraints by introducing slack variables $\zeta_i$, which permit certain constraints to be violated, so that certain training points may lie within the margin. Our objective is then to keep the number of such points as small as possible. In this case, the SVM (Boser et al., 1992; Cortes and Vapnik, 1995) requires the solution of the following optimization problem, where $C > 0$ is a penalty coefficient for the error term:

\[ \min_{w, b, \zeta} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l} \zeta_i \quad \text{subject to} \quad \forall i:\; y_i(w^T\phi(x_i) + b) \ge 1 - \zeta_i, \quad \zeta_i \ge 0 \tag{3.8} \]
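To make the role of the slack variables concrete, the sketch below evaluates the soft-margin objective for a fixed, hand-picked hyperplane. It does not solve the optimization problem. It assumes that $\phi$ is the identity mapping and that the slack term is weighted by a cost parameter `C` (the cost parameter of Table 5.4, where the Weka default of 1 is reported); the data points are invented for the example.

```python
def decision_value(w, b, x):
    """f(x) = w^T x + b, i.e. Equation 3.7 with phi taken as the identity (assumed)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack(w, b, x, y):
    """Smallest zeta >= 0 that satisfies y * f(x) >= 1 - zeta."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def soft_margin_objective(w, b, X, Y, C=1.0):
    """0.5 * ||w||^2 plus C times the sum of the slack variables."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))


# Hand-picked hyperplane x1 = 0 and four illustrative points.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-2.0, 0.0]]
Y = [1, 1, 1, -1]

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
objective = soft_margin_objective(w, b, X, Y, C=1.0)
print(slacks, objective)
```

The second point is correctly classified but lies inside the margin (slack between 0 and 1), while the third is misclassified (slack greater than 1), which matches the two cases illustrated in Figure 3.6.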

$K(x_i, x_j) \equiv \phi(x_i)^T \phi(x_j)$ is called the kernel function. Many kernels have been proposed for the SVM. Some are listed below:

• linear: $K(x_i, x_j) = x_i^T x_j$

• polynomial: $K(x_i, x_j) = (\gamma x_i^T x_j + r)^d$, $\gamma > 0$

• radial basis function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$, $\gamma > 0$

• sigmoid: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r)$

where $\gamma$, $r$, and $d$ are kernel parameters. The characteristics of slack variables with various values are shown in Figure 3.6.
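The four kernels listed above translate directly into code. The following sketch uses plain Python lists as vectors; the default parameter values are placeholders chosen for the example and are not the settings used in the experiments.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(xi, xj):
    return dot(xi, xj)


def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=3):
    return (gamma * dot(xi, xj) + r) ** d


def rbf_kernel(xi, xj, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)


xi, xj = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, xj))
print(polynomial_kernel(xi, xj, gamma=1.0, r=1.0, d=2))
print(rbf_kernel(xi, xj, gamma=0.5))
print(sigmoid_kernel(xi, xj, gamma=0.01))
```

Note that the RBF kernel of a vector with itself is always 1 and decays towards 0 as the two vectors move apart, with $\gamma$ controlling how quickly.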

3.6 Summary

In this chapter, some definitions of data mining were introduced, and the way the three algorithms C4.5, MLP, and SVM work was explained in detail.

Figure 3.6: The slack variables: $\zeta \ge 1$ for misclassified points, and $0 < \zeta < 1$ for points close to the decision boundary. (Figure from Pattern Recognition and Machine Learning by Chris Bishop.)

Chapter 4

Related Work

In the past, various aspects of anomaly-based intrusion detection in computer security using machine learning were explored (Liao, 2005). A review of intrusion detection solutions using machine learning was presented in (Tsai et al., 2009); it covered 55 related research studies published between 2000 and 2007, focusing on the development of single, hybrid, and ensemble classifiers. Classification-based unsupervised and supervised ML techniques for detecting intrusions using network audit trails were presented in (Mukkamala et al., 2006). The authors investigated the well-known Frequent Pattern Tree mining (FP-tree), classification and regression trees (CART), multivariate regression splines (MARS) and TreeNet for solving the ID problem. Classification accuracy based on Receiver Operating Characteristic (ROC) curve analysis was used to measure the performance of each developed model. The results show that classification accuracies are better in the cases of SVM and ANN.

More recently, ten machine learning approaches were used to detect network intrusions using the NSL-KDD data set (Panda et al., 2011). They include Decision Tree J48, Bayesian Belief Network, hybrid Naïve Bayes with Decision Tree, Rotation Forest, hybrid J48 with Lazy Locally Weighted Learning, Discriminative Multinomial Naïve Bayes, Random Forest combined with Naïve Bayes, and finally an ensemble of classifiers using J48 and NB with AdaBoost (AB). Intrusion detection on mobile ad hoc networks (MANETs) is a challenging process because of their dynamic nature and their highly resource-constrained nodes. In (Sen and Clark, 2011), the authors explored the use of Evolutionary Computation (EC) techniques, specifically Genetic Programming (GP) and Grammatical Evolution (GE), to evolve intrusion detection programs. In (Giray and Polat, 2013), the authors made a comparison using three variations of the data: KDD99, NSL-KDD and noise-added data sets. They used WEKA to compare the performance of eleven classification algorithms, including Decision Trees (DT), Random Forest (RF), Multi-layer Perceptron (MLP), Voted Perceptron (VP), Bayesian Networks (BN) and Naive Bayes (NB). The conclusions, for the most part, show that the performance of the various algorithms without noise is not the same as in a real noisy environment.

4.1 IDS Using Artificial Neural Network

In 1998, Cannady (Cannady, 1998) used a multi-layer perceptron (MLP) with four fully connected layers, nine inputs representing the data stream features, and two outputs (0 for normal, 1 for attack). The objective was to test the ability of the MLP to detect potential misuse in the data stream. The model was trained using 9,462 records, and 1,000 records were selected for testing. The results were measured using root mean square error and correlation, and they showed that an MLP can be used in an IDS for misuse detection.

ANNs were used to deal with the intrusion detection problem in (Mohammed et al., 2007). The proposed model was able to identify three classes: normal and two attack types. The developed ANN model achieved high accuracy. The authors suggested including more attack scenarios in the data set; they also suggested reducing the number of records in an attempt to minimize the complexity of the system.

Another ANN model was proposed in (Barman and Khataniar, 2012). The authors defined the output of the ANN to be either 1 or 0 depending on whether or not the packet is part of an intrusion. They explored reducing the feature set using rough set theory, applied to just one type of attack. The authors claimed that their model was 20.5 times faster than previous ones. They suggested applying their method to other classes of attack as future work.

In (Sahilpreet and Meenakshi, 2013), the authors presented four different algorithms for developing intrusion detection models: the MLP, Radial Basis Function (RBF), Logistic Regression (LR) and Voted Perceptron (VP). All these algorithms were implemented in WEKA (Hall et al., 2009), a data mining software package, to evaluate their performance. The NSL-KDD data set was used. To enhance the results, feature reduction techniques were applied. The results showed that the MLP network provided more accurate results than the other algorithms. As future work, the authors suggested integrating the MLP network with fuzzy inference rules to improve performance.

4.2 IDS Using Support Vector Machine

Yao et al. (Yao et al., 2006) proposed an enhanced SVM model for intrusion detection. They used rough set theory to reduce the number of features by removing the less weighted ones. They evaluated the proposed model on the KDD99 and UMN data sets using precision, recall, false positives, and false negatives as criteria. The results showed that their model was more accurate and needed less time to run.

Chen et al. (Chen et al., 2009) proposed an IDS model using an SVM based on Rough Set Theory (RST). RST was used to reduce the number of features from 41 to 29. The authors compared the RST-based SVM with SVMs based on the full feature set and on Entropy. Their proposed RST-SVM model resulted in better accuracy than the other two methods.

An integrated model combining SVM and DT for multiclass classification was proposed in (Mulay et al., 2010). First, the classes are separated by a binary tree structure; then each class is fed to a number of SVMs equal to the number of classes. The authors supposed that by combining the two models the results would be more accurate and the classification process faster than with the individual models, but they did not prove or simulate their model.

A comparison between three types of Support Vector Machine (SVM) kernel functions, the Gaussian kernel (RBF), the polynomial kernel, and the sigmoid kernel, was carried out in (Bhavsar and Waghmare, 2013). Cross validation test mode was used. The results showed that the RBF kernel function can overcome a drawback of SVM, namely the extensive time needed for model building.

4.3 IDS Using Decision Tree

Farid et al. (Farid et al., 2010) proposed a new learning algorithm for anomaly-based IDS using DT. Their method modified the splitting weights of the data set, changing the weights relative to posterior probabilities. Their results illustrate better performance than the traditional DT algorithm.

An ensemble neural decision tree was used in (Sivatha Sindhu et al., 2012) for feature selection and model reduction. The proposed model was compared to six types of decision trees, with specificity and sensitivity as evaluation metrics. The results showed that the proposed model performed better than the other methods.

In (Ooi et al., 2013), three types of decision trees, ID3, C4.5, and BFS, were tested on the NSL-KDD network intrusion data set. Feature selection was performed using the Consistency Subset Evaluator (CSE), and 10-fold cross validation test mode was used to train and test the three DT algorithms. The analysis of the results concluded that C4.5 performs better than BFS and ID3 in terms of prediction accuracy. The authors also used the ROC curve as an evaluation criterion; higher values of the area under the ROC curve denote that the classifier has a higher ability to classify a randomly chosen instance correctly.

Nadiammai et al. (Nadiammai and Hemalatha, 2014) proposed four solutions for different IDS problems: data classification, the high level of human effort, unlabeled data, and the effectiveness of distributed denial of service attacks. They solved the first problem (classification of data) using an Efficient Data Adapted Decision Tree (EDADT). The objective of this method was to minimize the dimensionality of the model by extracting the features relevant to each type of attack. The authors compared the proposed algorithm to other methods such as C4.5 and SVM. The results they obtained show that their algorithm achieved the highest accuracy rate.

4.4 IDS Using Feature Selection

Using too many features results in a huge feature space, which slows down the model learning process and may decrease accuracy. Usually there are many redundant or irrelevant features, so feature selection is a good way to remove these redundant or less discriminative features (Han et al., 2012).

Sivatha Sindhu et al. (Sivatha Sindhu et al., 2012) improved the genetic algorithm by formulating a new fitness function to search for the most relevant features among the 41 KDDCUP'99 features. The objective of feature selection was to reduce the computational complexity of the classifier. The proposed algorithm was compared to various feature selection algorithms: Genetic Search, Greedy Stepwise, Ranker and RankSearch. The accuracy percentages were close, but the proposed algorithm selected fewer features, so its detection time is lower than that of the other algorithms.

The relevance of the 41 features to the attack types was studied in (Kayacik et al., 2005). The authors concluded that not all 41 features are needed to classify types of attacks. They recommended further studies in the scope of machine learning algorithms.


Chapter 5

Experimental Setup and Results

In this thesis, we adopted three classification algorithms to develop a set of models for intrusion detection. MLP, C4.5, and SVM classifiers were trained and tested using the Waikato Environment for Knowledge Analysis (Weka) (Hall et al., 2009).

Weka is a collection of machine learning algorithms for data mining tasks. It is open source software that contains tools for data pre-processing, regression, classification, clustering, association rules and visualization (Hall et al., 2009).

NSL-KDD, the improved version of the KDDCUP'99 data set, was used to form the experimental data set. For all experiments we used the 10-fold cross validation test mode, which is preferred since it reduces the variance of the estimate (Witten et al., 2011). The experimental scenario is explained in detail in the following subsections.

5.1 Experimental Data Set

The Intrusion Detection Evaluation Program (IDEP), administered by the Lincoln Laboratory at the Massachusetts Institute of Technology, was funded by the United States Defense Advanced Research Projects Agency (DARPA) in 1998. The program's main objective was to build a data set that would help evaluate different intrusion detection systems (IDSs). The KDDCUP'99 data set resulted from seven weeks of training data and two weeks of testing data from this program (Sabhnani and Serpen, 2004).

5.1.1 KDDCUP’99 Data Set

KDDCUP'99 is the most widely used data set for ID research and is publicly available at (Lichman, 2013). It contains about 4,900,000 connection records, each consisting of 41 features.

There are four major categories of attacks in the KDDCUP’99 data set:

1. Probing: information gathering attacks.

2. Denial of Service (DoS): deny legitimate requests to a system.

3. User-to-Root (U2R): unauthorized access to local super-user or root.

4. Remote-to-Local (R2L): unauthorized local access from a remote machine.

A statistical analysis of this data set was carried out by Tavallaee et al. (Tavallaee et al., 2009). It revealed some important problems that greatly affect the performance of evaluated systems. For example, the data set contains a huge number of redundant records, and the difficulty level of the different records is not inversely proportional to the percentage of records in the original KDDCUP'99 data set. These deficiencies result in a very poor evaluation of the different proposed ID techniques.

Many machine learning and pattern classification algorithms were applied to the intrusion detection problem using the KDDCUP'99 data set and failed to identify most of the user-to-root and remote-to-local attacks. The authors in (Sabhnani and Serpen, 2004) presented the deficiencies and limitations of the KDDCUP'99 data set to argue that it should not be used to train pattern recognition or machine learning algorithms for misuse detection for these two attack categories, because their experiments showed that no trainable pattern classification or machine learning algorithm can reach an acceptable level of misuse detection performance on the KDD testing data subset if the classifier models are built using the KDD training data subset for these categories (Sabhnani and Serpen, 2004).

5.1.2 NSL-KDD

The NSL-KDD data set was suggested to solve some of the inherent problems of the KDDCUP'99 data set. The new data set (NSL-KDD) consists of selected records from the complete KDDCUP'99 data set and addresses these problems (Tavallaee et al., 2009). The following are some of the advantages of NSL-KDD over the original KDDCUP'99 data set:

• Redundant records in the training and testing sets were removed.

• The number of selected records from each difficulty level group is inversely proportional to the percentage of records.

• It now consists of a reasonable number of instances in the training and testing sets, so it is affordable to use the NSL-KDD data set for experiments.

The NSL-KDD data contains 125,973 records, each consisting of 41 features. The features, their descriptions and types are shown in Appendix A.1. The records fall into 23 classes, normal and 22 types of attack: neptune, warezclient, ipsweep, portsweep, teardrop, nmap, satan, smurf, pod, back, guesspasswd, ftpwrite, multihop, rootkit, bufferoverflow, imap, warezmaster, phf, land, loadmodule, spy and perl. These types represent the four main categories mentioned in Section 5.1.1.

5.1.3 Class Distribution

As mentioned in Section 5.1.1, there are four main attack categories. The number of attack records (i.e. the class distribution) differs widely between attack categories, as shown in Table 5.1. This distribution has an effect on classifier learning (Weiss and Provost, 2001). In this work, we therefore selected an equal number of records per class: we randomly selected 6000 records from the NSL-KDD data, covering five types of attack plus the normal type, with 1000 records for each type. Table 5.2 shows the types of data used and the number of samples for each type.
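A balanced sample of this kind can be drawn with a few lines of code. The sketch below is only one possible way to do it and is not the procedure actually used to build the experimental set: the function name, the `(id, label)` record layout, the fixed random seed and the synthetic records are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(records, label_of, classes, per_class, seed=0):
    """Draw `per_class` records at random (without replacement) from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    sample = []
    for cls in classes:
        # random.sample raises ValueError if a class has fewer than per_class records
        sample.extend(rng.sample(by_class[cls], per_class))
    rng.shuffle(sample)
    return sample


# Synthetic stand-in for the data: (record id, class label) pairs.
classes = ["normal", "ipsweep", "neptune", "nmap", "smurf", "satan"]
records = [(i, cls) for cls in classes for i in range(1500)]

selected = balanced_sample(records, label_of=lambda r: r[1], classes=classes, per_class=1000)
print(len(selected), Counter(label for _, label in selected))
```

With six classes and 1000 records each, the result has the 6000 records listed in Table 5.2. All six classes have more than 1000 records in Table 5.1, so sampling without replacement is possible.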


Table 5.1: Distribution of attack records per attack category of the NSL-KDD.

Attack Category   Attack Name       Number of Records   Percentage of total (%)
DoS               Back                            956
                  Land                             18
                  Neptune                       41214
                  Pod                             201
                  Smurf                          2646
                  Teardrop                        892
                  DoS total                     45927   36.46
Probe             Satan                          3633
                  Ipsweep                        3599
                  Nmap                           1493
                  Portsweep                      2931
                  Probe total                   11656   9.25
R2L               GuessPassword                    53
                  Ftp write                         8
                  Imap                             11
                  Phf                               4
                  Multihop                          7
                  Warezmaster                      20
                  Warezclient                     890
                  Spy                               2
                  R2L total                       995   0.79
U2R               Buffer overflow                  30
                  Loadmodule                        9
                  Rootkit                          10
                  Perl                              3
                  U2R total                        52   0.04
Normal                                          67343   53.46
Total                                          125973

Table 5.2: Experimental data

Attack type   No. of records
normal        1000
ipsweep       1000
neptune       1000
nmap          1000
smurf         1000
satan         1000
Sum           6000

5.2 Classification Models Setup

5.2.1 C4.5

C4.5 (J48 in Weka) is a very popular machine learning algorithm. It is a later variant of the ID3 algorithm. The output of this classification algorithm is an understandable tree. To keep the tree as small as possible, information gain is used while building the tree. Pruning, which is the process of reducing the size of the tree, was also used to obtain a smaller tree. Pruning also reduces the classifier complexity and improves the prediction accuracy (Witten et al., 2011). Without pruning we get a tree of 456 nodes and 400 leaves, with a classification accuracy of 99%. With pruning we get a tree of 229 nodes and 188 leaves, with a classification accuracy of 99.05%. A confidence factor of 0.25 was used for pruning (smaller values mean more pruning). The Weka setup for building the C4.5 model is shown in Figure 5.1. Using the data flow environment, the model setup goes through the following steps:

1. Load the data file by selecting the file loader component.

2. Specify the class attribute using the class assigner component.

3. Select the training and testing mode by choosing the cross validation fold maker.

4. Attach those components to the C4.5 classifier, called J48 in WEKA.

5. Assign the evaluation component, called the classifier performance evaluator.

6. Show the performance results by selecting the text viewer.

5.2.2 ANN (MLP)

MLP is a feed-forward artificial neural network. It consists of three or more layers: 1) an input layer, 2) one or more hidden layers, and 3) an output layer. For an MLP we need to find the best number of hidden layers and the best number of neurons in each hidden layer. The best efficiency was found when using one hidden layer whose number of nodes is the average of the numbers of input and output nodes (Witten et al., 2011). The MLP was trained with a learning rate of 0.3, a momentum of 0.2, 500 epochs, and a validation threshold of 20. Table 5.3 shows some of the MLP parameters and their meaning. The default Weka parameter values were used. The structure of the MLP was shown in Figure 3.4. The Weka setup for the MLP model is shown in Figure 5.2.

Figure 5.1: C4.5 classification model structure.

Table 5.3: MLP parameters and their meaning

Parameter              Value   Explanation
Learning rate          0.3     The amount the weights are updated
Momentum               0.2     Momentum applied to the weights during updating
No. of epochs          500     The number of epochs to train through
Validation threshold   20      Used to terminate validation testing

5.2.3 SVM

An SVM produces a linear decision boundary, but nonlinear and more complex boundaries can be obtained by replacing the dot product in the support vector formulation with a kernel function. We explored the Gaussian kernel (Radial Basis Function), the polynomial kernel, and the sigmoid kernel. The Radial Basis Function (RBF) was selected because it overcomes a drawback of SVM, namely the extensive time needed for model building (Bhavsar and Waghmare, 2013). The SVM with the RBF kernel function was trained with a cache size of 40.0, a cost parameter C of 1.0, and an Eps of 0.001. Table 5.4 shows some of the SVM parameters and their meaning. The default Weka parameter values were used. The Weka setup for the SVM model is shown in Figure 5.3.

Figure 5.2: Weka Setup for the MLP classification model.

Table 5.4: SVM parameters and their meaning

Parameter        Value   Explanation
Cache size       40      The cache size in MB; the size of the kernel cache has a strong impact on run times for larger problems
Cost parameter   1       The C parameter tells the SVM optimization how much you want to avoid misclassifying each training example
Eps              0.001   The tolerance of the termination criterion

5.3 10-fold Cross Validation

In 10-fold cross validation training and testing mode, the data is randomly divided into 10 parts in which each class is represented in approximately the same proportion as in the full data set. Each part is held out in turn and the learning scheme is trained on the remaining nine parts; then its error rate is calculated on the held-out part. Thus, the learning procedure is executed a total of 10 times on different training sets (each set has a lot in common with the others). Finally, the 10 error estimates are averaged to obtain an overall error estimate. Why 10? Extensive tests on numerous data sets, with different learning techniques, have shown that 10 is about the right number of folds to get the best estimate of error, and there is also some theoretical evidence that supports this (Witten et al., 2011).

Figure 5.3: Weka Setup for SVM classification model.

For all experiments in this thesis, 10-fold cross validation training and testing mode was used because it reduces the variance of the estimate (Witten et al., 2011).
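The procedure can be sketched as follows. This is a simplified stand-in for Weka's cross validation, written only to illustrate the idea: the function names, the `train`/`predict` callback interface and the majority-class learner used in the demonstration are assumptions for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of indices; every class is spread evenly over the folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(samples, labels, train, predict, k=10, seed=0):
    """Average error rate over k folds.

    train(X, y) returns a model; predict(model, x) returns a class label.
    """
    errors = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        wrong = sum(predict(model, samples[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k


# Demonstration with a trivial learner that always predicts the majority class.
samples = list(range(100))
labels = ["a"] * 60 + ["b"] * 40
majority_train = lambda X, y: Counter(y).most_common(1)[0][0]
majority_predict = lambda model, x: model

error = cross_validate(samples, labels, majority_train, majority_predict, k=10)
print(error)
```

Because the folds are stratified, every fold of the demonstration holds six records of class "a" and four of class "b", so the majority-class learner errs on four of the ten held-out records in each fold.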

5.4 Feature Selection

Feature selection was successfully used to enhance the process of modeling for input

output system (Papadakis et al., 2005). In many cases of modeling, various attributes

are gathered during data collection process although they might not be significant. The

more irrelevance data might increase the model complexity and increase the conver-

gence time of the best model structure (Witten et al., 2011).

Feature selection is defined as the process of selecting a subset of the originally defined features based on pre-defined evaluation criteria (Han et al., 2012; Hall, 1999). It is frequently used to reduce model dimensionality: it shrinks the feature domain and removes redundant features, which speeds up the learning/modeling process (Han et al., 2012; Hall, 1999).

The relevance of the 41 features to the attack types was studied in (Kayacik et al., 2005). The authors concluded that not all of the 41 features are needed to classify the types of attacks, and they recommended further studies based on machine learning algorithms. In (Ooi et al., 2013), three types of decision trees, ID3, C4.5, and best-first trees (BFS), were tested on the NSL-KDD network intrusion data set. Feature selection was performed using the Consistency Subset Evaluator (CSE). The analysis of the results concluded that C4.5 performs better than BFS and ID3 in terms of detection accuracy.

The main steps of the feature selection process can be summarized as follows (see Figure 5.4):

1. Generation procedure to generate the next candidate subset.

2. Evaluation function to evaluate the subset.

3. Stopping criterion to decide when to stop.

4. Validation procedure to check whether the subset is valid.

Different methods for attribute search and evaluation were analyzed in (Megha and Amrita, 2013). We selected the Best First and Genetic Search algorithms with the Correlation-based Feature Selection evaluator because, according to Aggarwal's study, their performance was better than that of the other methods (Megha and Amrita, 2013).

[Flowchart with boxes: Original Set, Subset Generation, Subset Evaluation, Stopping Criterion (Yes/No branches), Result Validation.]

Figure 5.4: Main steps of feature selection process (Megha and Amrita, 2013).


5.5 Search Space Complexity

Because of the benefits of feature selection mentioned in Section 5.4, it is a primary component of the classification models in this work. Regarding the size of the search space, for $n$ features there are $2^n$ subsets that can be formed and tested (Petra, 2012), i.e. $2^{41}$, which is equal to 2,199,023,255,552 combinations. In addition, before applying feature selection (FS) techniques, we do not know which features are the most relevant or how many of them there are. Without FS, it would therefore be necessary to test all $2^n$ possible feature subsets when training models (Petra, 2012). The following calculations show how much feature selection reduces our search domain. The original search space is equal to:

\[
\binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n}
\]

For example, selecting 7 features out of 41 will reduce the domain of search to

\[
\binom{41}{7} = 22{,}481{,}940
\]

so the reduction ratio is

\[
\frac{22{,}481{,}940}{2{,}199{,}023{,}255{,}552} \approx 1.022 \times 10^{-5}
\]
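The arithmetic above can be checked with a short Python snippet. It is only a verification sketch; the variable names are illustrative.

```python
from math import comb

n = 41
all_subsets = 2 ** n                                    # every subset of 41 features
non_empty = sum(comb(n, k) for k in range(1, n + 1))    # C(n,1) + ... + C(n,n)
seven_of_41 = comb(n, 7)                                # subsets of exactly 7 features
ratio = seven_of_41 / all_subsets

print(all_subsets)     # 2199023255552
print(non_empty)       # 2199023255551
print(seven_of_41)     # 22481940
print(f"{ratio:.3e}")  # 1.022e-05
```

Note that the sum of binomial coefficients from 1 to n counts only the non-empty subsets, so it is one less than $2^{41}$; the difference does not affect the ratio at this precision.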

5.5.1 Best First Search

Best first search strategy allows backtracking along the search path. It moves through

the search space by making local changes to the current feature subset. If the path

being explored begins to look less promising, best first search can back-track to a

more promising previous subset and continue the search from there. The best first search algorithm works as follows:

Algorithm 2: Best first search algorithm (Hall, 1999).
1: Begin with the OPEN list containing the start state, the CLOSED list empty,
2: and BEST ← start state.
3: Let s = arg max e(x) (get the state from OPEN with the highest evaluation).
4: Remove s from OPEN and add to CLOSED.
5: If e(s) ≥ e(BEST), then BEST ← s.
6: For each child t of s that is not in the OPEN or CLOSED list, evaluate and add to OPEN.
7: If BEST changed in the last set of expansions, goto 3.
8: Return BEST.

The features selected by the BFS algorithm are shown in Table 5.5.

Table 5.5: BFS Selected Features

No.  Feature name                  Type
3    service                       symbolic
5    src_bytes                     continuous
6    dst_bytes                     continuous
23   count                         continuous
30   diff_srv_rate                 continuous
37   dst_host_srv_diff_host_rate   continuous
38   dst_host_serror_rate          continuous
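The best first search of Algorithm 2 can be sketched in Python as below. This is a minimal illustration and not the Weka implementation: the forward-only child generation (adding one feature at a time), the `max_stale` counter used to interpret "BEST changed in the last set of expansions", the strict improvement test, and the toy evaluation function `toy_merit` are all assumptions made for the example.

```python
def best_first_search(n_features, evaluate, max_stale=5):
    """Search feature subsets, backtracking to the best unexpanded subset."""
    start = frozenset()
    open_list = {start: evaluate(start)}      # OPEN: subset -> evaluation
    closed = set()                            # CLOSED: already expanded subsets
    best, best_score = start, open_list[start]
    stale = 0                                 # expansions since BEST last improved
    while open_list and stale < max_stale:
        # Step 3: take the state in OPEN with the highest evaluation.
        s = max(open_list, key=open_list.get)
        s_score = open_list.pop(s)
        closed.add(s)                         # Step 4
        if s_score > best_score:              # Step 5 (strict, to guarantee termination)
            best, best_score = s, s_score
            stale = 0
        else:
            stale += 1
        # Step 6: children of s are s plus one feature not yet included.
        for f in range(n_features):
            if f not in s:
                t = s | {f}
                if t not in open_list and t not in closed:
                    open_list[t] = evaluate(t)
    return best, best_score

# Hypothetical merit: reward relevant features, penalize irrelevant ones.
RELEVANT = {2, 4, 5}

def toy_merit(subset):
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)

best_subset, best_value = best_first_search(8, toy_merit)
print(sorted(best_subset), best_value)
```

In the thesis the evaluation function is the Correlation-based Feature Selection merit computed by Weka; the toy merit only stands in for it so that the search itself can be followed.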

5.5.2 Genetic Search

Genetic Algorithms (GA) are search algorithms that adopt the principle of natural selection (Hall, 1999; Sharma et al., 2014). Using GA, robust and adaptable systems can be developed (Sharma et al., 2014; Kumar and Punia, 2013). A GA works on individuals called chromosomes. The initial population is a set of randomly created chromosomes, each of which represents a possible solution to the problem (Sazzadul Hoque et al., 2012; Sharma et al., 2014). The generated solutions evolve over time in an iterative process that aims to produce an optimal solution. In the feature selection problem, a solution is usually a fixed-length binary string representing a feature subset; each position in the string indicates the presence or absence of a particular feature (Hall, 1999). The initial subsets are selected randomly from the full feature set. Successive generations are produced by applying the genetic operators, crossover and mutation, to the currently selected subsets. The newly generated subsets are evaluated with a fitness function according to defined fitness criteria. Better subsets have a higher chance of being selected to form new subsets, so later generations potentially have higher quality. In general, the genetic search strategy works as follows:

Algorithm 3: Genetic search strategy (Hall, 1999).
1: Begin by randomly generating an initial population P.
2: Calculate e(x) for each member x ∈ P.
3: Define a probability distribution p over the members of P where p(x) ∝ e(x).
4: Select two population members x and y with respect to p.
5: Apply crossover to x and y to produce new population members x′ and y′.
6: Apply mutation to x′ and y′.
7: Insert x′ and y′ into P′ (the next generation).
8: If |P′| < |P|, goto 4.
9: Let P ← P′.
10: If there are more generations to process, goto 2.
11: Return x ∈ P for which e(x) is highest.

The features selected by the genetic search (GS) algorithm are shown in Table 5.6.

Table 5.6: GS Selected Features

No.  Feature name                  Type
2    protocol_type                 symbolic
3    service                       symbolic
5    src_bytes                     continuous
6    dst_bytes                     continuous
23   count                         continuous
24   srv_count                     continuous
25   serror_rate                   continuous
30   diff_srv_rate                 continuous
36   dst_host_same_src_port_rate   continuous
37   dst_host_srv_diff_host_rate   continuous
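The genetic search strategy of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the Weka search used in the thesis: the function names, the single-point crossover, the parameter values (`pop_size`, `generations`, `p_crossover`, `p_mutation`, `seed`) and the toy fitness function are all hypothetical.

```python
import random

def genetic_search(n_features, evaluate, pop_size=20, generations=30,
                   p_crossover=0.6, p_mutation=0.033, seed=1):
    """Evolve fixed-length binary strings; bit i = 1 means feature i is kept."""
    rng = random.Random(seed)
    # Step 1: random initial population P.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):                        # Step 10
        scores = [evaluate(x) for x in population]      # Step 2
        next_gen = []                                   # P' (the next generation)
        while len(next_gen) < pop_size:                 # Step 8
            # Steps 3-4: pick two parents with probability proportional to e(x).
            x, y = rng.choices(population, weights=scores, k=2)
            x, y = x[:], y[:]
            if rng.random() < p_crossover:              # Step 5: single-point crossover
                cut = rng.randrange(1, n_features)
                x, y = x[:cut] + y[cut:], y[:cut] + x[cut:]
            for child in (x, y):                        # Step 6: bit-flip mutation
                for i in range(n_features):
                    if rng.random() < p_mutation:
                        child[i] = 1 - child[i]
            next_gen.extend([x, y])                     # Step 7
        population = next_gen[:pop_size]                # Step 9
    return max(population, key=evaluate)                # Step 11

# Hypothetical fitness: count positions that agree with a known-good mask.
# It is always >= 1, so it can be used directly as a selection weight.
RELEVANT_BITS = {2, 4, 5}

def toy_fitness(chromosome):
    return 1 + sum(1 for i, bit in enumerate(chromosome)
                   if bit == (1 if i in RELEVANT_BITS else 0))

best_chromosome = genetic_search(8, toy_fitness)
print(best_chromosome, toy_fitness(best_chromosome))
```

Selection proportional to e(x) requires non-negative evaluations, which is why the toy fitness is offset by one.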

Figure 5.5 shows the block diagram of the proposed methodology:

• The prepared sample dataset illustrated in Table 5.2 was used for building the three models.

• The most relevant features were selected using the BFS and GA feature selection algorithms.

• The data with the selected features was used as input to the three types of classifiers: SVM, DT, and ANN. Each classifier was trained and tested in a separate experiment.

• The results of the classifiers were presented and analyzed using the evaluation criteria specified in Section 5.6.

Figure 5.5: Block diagram for proposed methodology.

5.6 Model Evaluation

In order to check the performance of the developed models, we used a set of performance evaluation functions: Correctly Classified Instances (CCI), Incorrectly Classified Instances (ICI), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Relative Absolute Error (RAE). These functions measure how closely the intrusion types predicted by the learned models match the actual intrusion types. They are computed as follows:

\[
\mathrm{CCI} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5.1}
\]

\[
\mathrm{ICI} = \frac{FP + FN}{TP + TN + FP + FN} \tag{5.2}
\]

where TP is the proportion of positive instances correctly classified as positives, TN the proportion of negative instances correctly classified as negatives, FP the proportion of negative instances incorrectly classified as positives, and FN the proportion of positive instances incorrectly classified as negatives. The confusion matrix shown in Table 5.7 is used to evaluate the performance of the classification system.

Table 5.7: Confusion matrix.

                     Predicted
                     Positive   Negative
Actual   Positive    TP         FN
         Negative    FP         TN

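\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |\hat{y}_i - y_i| \tag{5.3}
\]

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - y_i)^2} \tag{5.4}
\]

\[
\mathrm{RAE} = \frac{\sum_{i=1}^{n} |\hat{y}_i - y_i|}{\sum_{i=1}^{n} |\bar{y} - y_i|} \tag{5.5}
\]

In Equations 5.3, 5.4, and 5.5, $y$ is the actual class of the connection, $\hat{y}$ is the predicted type, and $\bar{y}$ is the mean of $y$ over the $n$ instances (Tim, 2015).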
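The evaluation functions of Equations 5.1 to 5.5 can be written directly in Python. The sketch below is illustrative only; the function names and the toy numbers are assumptions and do not come from the experiments.

```python
from math import sqrt

def cci(tp, tn, fp, fn):
    """Correctly Classified Instances ratio (Equation 5.1)."""
    return (tp + tn) / (tp + tn + fp + fn)

def ici(tp, tn, fp, fn):
    """Incorrectly Classified Instances ratio (Equation 5.2)."""
    return (fp + fn) / (tp + tn + fp + fn)

def mae(actual, predicted):
    """Mean Absolute Error (Equation 5.3)."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Square Error (Equation 5.4)."""
    return sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rae(actual, predicted):
    """Relative Absolute Error (Equation 5.5): error relative to predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for a, p in zip(actual, predicted))
    denominator = sum(abs(mean_actual - a) for a in actual)
    return numerator / denominator

# Toy values (assumption): counts from a two-class confusion matrix and
# class-membership scores for four instances.
print(cci(50, 40, 5, 5), ici(50, 40, 5, 5))
y_actual = [1, 0, 1, 1]
y_predicted = [0.9, 0.2, 0.8, 0.6]
print(mae(y_actual, y_predicted), rmse(y_actual, y_predicted), rae(y_actual, y_predicted))
```

By construction CCI and ICI always sum to 1, and an RAE below 1 means the model does better than always predicting the mean.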

5.7 Results

5.7.1 C4.5

The confusion matrix obtained for the C4.5 model is given in Table 5.8. This matrix is the result of training and testing the model in Weka. The average ratio of correctly classified instances is shown in the last row of the table; it is the average of the correctly classified instances over all the attack classes.

There were 992 correctly classified instances of the first type of attack (ipsweep), which is equal to 99.20%; 0.30% were classified as nmap, 0.30% as normal, and 0.20% as satan. These are called false negatives. Conversely, 0.70% of nmap, 0.60% of normal, and 0.10% of satan instances were classified as ipsweep; these are called false positives. Using the confusion matrix in Table 5.8 and Equation 5.1, the CCI ratio is calculated as follows:


Table 5.8: Confusion matrix for the C4.5 model

Actual \ Pred.   ipsweep %   neptune %   nmap %   normal %   satan %   smurf %
ipsweep          99.20       0.00        0.30     0.30       0.20      0.00
neptune          0.00        99.8        0.00     0.20       0.00      0.00
nmap             0.70        0.00        99.0     0.20       0.10      0.00
normal           0.60        0.10        0.60     97.40      1.30      0.00
satan            0.10        0.10        0.00     0.90       98.9      0.00
smurf            0.00        0.00        0.00     0.00       0.00      100.0

Average of correctly classified instances = 99.05 %

\[
\mathrm{CCI} = \frac{99.20 + 99.8 + 99.0 + 97.40 + 98.9 + 100.0}{99.20 + 99.8 + 99.0 + 97.40 + 98.9 + 100.0 + 5.7} = 99.05\%
\]
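The same calculation can be reproduced from the values in Table 5.8. The snippet below is a small verification sketch; the variable names are illustrative, and the matrix is copied from the table (rows are actual classes, columns are predicted classes, values are percentages).

```python
CLASSES = ["ipsweep", "neptune", "nmap", "normal", "satan", "smurf"]
MATRIX = [
    [99.20, 0.00, 0.30, 0.30, 0.20, 0.00],
    [0.00, 99.8, 0.00, 0.20, 0.00, 0.00],
    [0.70, 0.00, 99.0, 0.20, 0.10, 0.00],
    [0.60, 0.10, 0.60, 97.40, 1.30, 0.00],
    [0.10, 0.10, 0.00, 0.90, 98.9, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 100.0],
]

# Diagonal entries are the correctly classified percentages of each class.
diagonal = sum(MATRIX[i][i] for i in range(len(CLASSES)))
total = sum(sum(row) for row in MATRIX)
off_diagonal = total - diagonal
cci_c45 = diagonal / total

# Share of normal records classified as some attack type.
normal_row = MATRIX[CLASSES.index("normal")]
normal_flagged = sum(normal_row) - normal_row[CLASSES.index("normal")]

print(round(off_diagonal, 2), round(100 * cci_c45, 2), round(normal_flagged, 2))
```

The off-diagonal entries sum to 5.7, which is the term added in the denominator of the CCI calculation, and the share of normal records flagged as attacks is 2.6%.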

The highest accuracy, 100%, was achieved in detecting the smurf attack, which means that it was fully represented by the training data. The normal records were detected with the lowest accuracy, 97.40%, which corresponds to a false positive rate of 2.6%.

5.7.2 MLP

The confusion matrix obtained for the MLP model is given in Table 5.9.

Table 5.9: Confusion matrix for the MLP model



Here, the highest detection rate was 99.90%, for the neptune attack, and the lowest was 97.40%, for satan. The false positive rate is 1.50%.


5.7.3 SVM

The confusion matrix obtained for the SVM model is given in Table 5.10.

Table 5.10: Confusion matrix for the SVM model



The highest detection accuracy here is 99.30%, for the normal type; this means that SVM achieved the lowest false positive rate, 0.70%.

5.8 Results Analysis

- The performance of each of the three models built using C4.5, MLP, and SVM was tested before and after feature selection. The obtained results are shown in Table 5.11 and Figure 5.6.

- C4.5 achieved 99.05% accuracy with all 41 features and a model building time of 0.47 second, and 98.80% with the 10 features selected by GA. The accuracy decreased slightly, but the model building time dropped to 0.11 second (0.06 second with the 7 features selected by BFS), which is a considerable gain in time efficiency.

- The lowest false positive rate was achieved by SVM. This is because SVM works by maximizing the margin between the negative class and the core of the positive class.

- After applying feature selection, the model building time of all models dropped to less than 50% of its original value: with the BFS features it was about 13% for C4.5, 28% for SVM, and 45% for MLP (Table 5.11). This also means a reduction in the amount of computation, i.e. less computational complexity. This reduction is a result of the smaller number of selected features.

- The number of features selected using BFS was 7, and 10 features were selected by GA.

- Six of the seven features selected by BFS were also selected by GA (Tables 5.5 and 5.6): service, src_bytes, dst_bytes, count, diff_srv_rate, and dst_host_srv_diff_host_rate. Since these features were selected by both algorithms, they appear to be the most important ones for discovering the different types of attack.

- SVM performed better with the 7 features selected by BFS (93.80% versus 89.58% with all features, Table 5.11), which suggests that the classifier accuracy was negatively affected by the extra irrelevant features.

- Finally, the complexity of the models was reduced considerably: finding the class was a function of 41 features, and it dropped to a function of 7 features in the case of the BFS method. This reduced the search space to about $1 \times 10^{-5}$ of its original size.

Table 5.11: Performance evaluation based on C4.5, ANN and SVM models

Algorithm             CCI      ICI      MAE      RMSE     RAE      Time taken (s)
C4.5 (J48)            99.05%   0.95%    0.0039   0.0534   1.39%    0.47
C4.5+Best First       97.35%   2.65%    0.0122   0.0903   4.41%    0.06
C4.5+Genetic Search   98.80%   1.20%    0.005    0.0573   1.80%    0.11
MLP                   98.72%   1.28%    0.0061   0.0619   2.18%    485.68
MLP+Best First        93.05%   6.95%    0.0302   0.1299   10.86%   218.2
MLP+Genetic Search    94.77%   5.23%    0.0218   0.1151   7.83%    235.3
SVM                   89.58%   10.42%   0.0347   0.1863   12.50%   14.66
SVM+Best First        93.80%   6.20%    0.0207   0.1438   7.44%    4.08
SVM+Genetic Search    86.77%   13.23%   0.0441   0.21     15.88%   5.32
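The effect of feature selection on model building time can be read off Table 5.11 by dividing each reduced-feature time by the corresponding full-feature time. The snippet below is a small sketch of that arithmetic; the dictionary layout and function name are illustrative, and the numbers are copied from the table.

```python
# Build times in seconds from Table 5.11: (all features, Best First, Genetic Search).
BUILD_TIME = {
    "C4.5": (0.47, 0.06, 0.11),
    "MLP": (485.68, 218.2, 235.3),
    "SVM": (14.66, 4.08, 5.32),
}

def relative_times(build_time):
    """Return, per model, the reduced build times as fractions of the full-feature time."""
    return {name: (bfs / full, gs / full)
            for name, (full, bfs, gs) in build_time.items()}

ratios = relative_times(BUILD_TIME)
for name, (bfs_ratio, gs_ratio) in ratios.items():
    print(f"{name}: Best First {bfs_ratio:.1%}, Genetic Search {gs_ratio:.1%}")
```

Every ratio is below one half, so both feature subsets cut the build time of all three models by more than 50%.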

5.9 Summary

In this chapter we gave a detailed analysis of the KDDCUP99 and NSL-KDD data sets. The data features and the distribution of attack types were shown. The limitations of KDDCUP99 were discussed, and the advantages of NSL-KDD were explained. The setup of the experiments on the three classification algorithms, C4.5, ANN, and SVM, was shown. The feature selection methods, best first and genetic search, and the evaluation criteria were explained, and finally the results were presented and analyzed.


[Bar chart: Correctly Classified Instances for C4.5, MLP and SVM; x-axis groups: Original Data, BF Search, Genetic Search; y-axis: 0 to 100.]

Figure 5.6: Correctly Classified Instances for C4.5, MLP and SVM with the original data, selected features of BF, and selected features of GS.


Chapter 6

Conclusions and Future Work

In this research, we developed three models to solve the intrusion detection problem using the decision tree based C4.5 algorithm, the Multi-Layer Perceptron, and the Support Vector Machine. A number of attack types were classified using the three methods. To enhance the performance of the proposed models and speed up the detection process, a set of features was selected using the Best First Search and the Genetic Search methods. A comparison between the developed models before and after feature selection was provided. The developed models were capable of reducing the complexity while keeping acceptable detection accuracy. The decision tree based C4.5 algorithm achieved the highest classification accuracy among the classification techniques explored in this work, while the SVM achieved the lowest false positive rate. As future work, more research is needed on how to implement the designed models in a real network environment. Other data mining techniques could be explored, and new data could be collected that would be more useful for the two attack categories U2R and R2L.


Appendix A

Features of NSL-KDD

Table A.1: NSL-KDD Intrusion Detection Data set Features (Kayacik et al., 2005).

No.  Feature name                  Type         Description
1    duration                      continuous   Duration of the connection
2    protocol_type                 symbolic     Connection protocol (e.g. tcp, udp)
3    service                       symbolic     Destination service (e.g. telnet, ftp)
4    flag                          symbolic     Status flag of the connection
5    src_bytes                     continuous   Bytes sent from source to destination
6    dst_bytes                     continuous   Bytes sent from destination to source
7    land                          symbolic     1 if connection is from/to the same host/port; 0 otherwise
8    wrong_fragment                continuous   number of wrong fragments
9    urgent                        continuous   number of urgent packets
10   hot                           continuous   number of "hot" indicators
11   num_failed_logins             continuous   number of failed logins
12   logged_in                     symbolic     1 if successfully logged in; 0 otherwise
13   num_compromised               continuous   number of "compromised" conditions
14   root_shell                    continuous   1 if root shell is obtained; 0 otherwise
15   su_attempted                  continuous   1 if "su root" command attempted; 0 otherwise
16   num_root                      continuous   number of "root" accesses
17   num_file_creations            continuous   number of file creation operations
18   num_shells                    continuous   number of shell prompts
19   num_access_files              continuous   number of operations on access control files
20   num_outbound_cmds             continuous   number of outbound commands in an ftp session
21   is_host_login                 symbolic     1 if the login belongs to the "hot" list; 0 otherwise
22   is_guest_login                symbolic     1 if the login is a guest login; 0 otherwise
23   count                         continuous   number of connections to the same host as the current connection in the past two seconds
24   srv_count                     continuous   number of connections to the same service as the current connection in the past two seconds
25   serror_rate                   continuous   % of connections that have "SYN" errors
26   srv_serror_rate               continuous   % of connections that have "SYN" errors
27   rerror_rate                   continuous   % of connections that have "REJ" errors
28   srv_rerror_rate               continuous   % of connections that have "REJ" errors
29   same_srv_rate                 continuous   % of connections to the same service
30   diff_srv_rate                 continuous   % of connections to different services
31   srv_diff_host_rate            continuous   % of connections to different hosts
32   dst_host_count                continuous   count of connections having the same destination host
33   dst_host_srv_count            continuous   count of connections having the same destination host and using the same service
34   dst_host_same_srv_rate        continuous   % of connections having the same destination host and using the same service
35   dst_host_diff_srv_rate        continuous   % of different services on the current host
36   dst_host_same_src_port_rate   continuous   % of connections to the current host having the same src port
37   dst_host_srv_diff_host_rate   continuous   % of connections to the same service coming from different hosts
38   dst_host_serror_rate          continuous   % of connections to the current host that have an S0 error
39   dst_host_srv_serror_rate      continuous   % of connections to the current host and specified service that have an S0 error
40   dst_host_rerror_rate          continuous   % of connections to the current host that have an RST error
41   dst_host_srv_rerror_rate      continuous   % of connections to the current host and specified service that have an RST error
42   Label                         symbolic     normal/abnormal


Bibliography

Al-Hiary, H., A. Sheta, and A. Ayesh (2008). Identification of a chemical process

reactor using soft computing techniques. In Proceedings of the 2008 International

Conference on Fuzzy Systems (FUZZ2008) within the 2008 IEEE World Congress

on Computational Intelligence (WCCI2008), Hong Kong, 1-6 June, pp. 845–653.

Barman, D. K. and G. Khataniar (2012). Design of intrusion detection system based

on artificial neural network and application of rough set. International Journal of

Computer Science and Communication Networks, 548–552.

Bhavsar, Y. B. and K. C. Waghmare (2013). Intrusion detection system using data

mining technique: Support vector machine. International Journal of Emerging

Technology and Advanced Engineering 3(3), 581–586.

Boser, B. E., I. M. Guyon, and V. N. Vapnik (1992). A training algorithm for optimal

margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational

Learning Theory, pp. 144–152. ACM.

Brause, R. W. (2001). Medical analysis and diagnosis by neural networks. In Proceedings of the Second International Symposium on Medical Data Analysis, ISMDA ’01, London, UK, pp. 1–13. Springer-Verlag.

Breiman, L., J. Friedman, C. J. Stone, and R. Olshen (1984). Classification and Regression Trees. The Wadsworth and Brooks-Cole statistics-probability series. Taylor and Francis.


Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data

Mining and Knowledge Discovery 2(2).

Cannady, J. (1998). Artificial neural networks for misuse detection. In National

Information Systems Security Conference, pp. 443–456.

Chaturvedi, S., R. N. Titre, and N. Sondhiya (2014). Review of handwritten pattern

recognition of digits and special characters using feed forward neural network and

Izhikevich neural model. In Proceedings of the 2014 International Conference on

Electronic Systems, Signal Processing and Computing Technologies, Washington,

DC, USA, pp. 425–428. IEEE.

Chen, R.-C., K.-F. Cheng, Y.-H. Chen, and C.-F. Hsieh (2009, April). Using rough set

and support vector machine for network intrusion detection system. In Intelligent

Information and Database Systems, 2009. ACIIDS 2009. First Asian Conference

on, pp. 465–470.

Cortes, C. and V. Vapnik (1995, September). Support-vector networks. Machine

Learning 20(3), 273–297.

Cristianini, N. and J. Shawe-Taylor (2000). An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods. New York, NY, USA: Cambridge University Press.

Das, N. and T. Sarkar (2014, September). Survey on host and network based intrusion detection system. International Journal of Advanced Networking and Applications 6(2), 2266–2269.

Du Jardin, P. (2010, June). Predicting bankruptcy using neural networks and other

classification methods: The influence of variable selection techniques on model accuracy. Neurocomput. 73(10-12), 2047–2060.

Farid, D. M., N. Harbi, E. Bahri, M. Z. Rahman, and C. M. Rahman (2010, March).

Attacks classification in adaptive intrusion detection using decision tree. In International Conference on Computer Science (ICCS’10), Rio De Janeiro, Brazil.


Firewalls (2015). Firewall definition from pc magazine encyclopedia. Retrieved from

http://www.pcmag.com/encyclopedia/term/43218/firewall; accessed June 18, 2015.

Gadbois, P. (2011, 10). Trainsignal’s comptia security course. https://www.youtube.com/watch?v=O2Gz-v8WswQ. Accessed July 2015.

Giray, S. and A. Polat (2013, Dec). Evaluation and comparison of classification techniques for network intrusion detection. In Data Mining Workshops (ICDMW), 2013

IEEE 13th International Conference on, pp. 335–342.

Hall, M. (1999). Correlation-based Feature Selection for Machine Learning. Ph. D.

thesis, University of Waikato.

Hall, M., E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten (2009,

November). The weka data mining software: An update. Special Interest Group on

Knowledge Discovery and Data Mining (SIGKDD) 11(1), 10–18.

Han, J., M. Kamber, and J. Pei (2012). Data Mining: Concepts and Techniques (3rd

ed.). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.

Hu, W., Y. Liao, and R. Vemuri (2003). Robust anomaly detection using support

vector machines. In Proceedings of the International Conference on Machine

Learning. Morgan Kaufmann Publishers Inc.

Jessica (2007, April). Intrusions Detection Systems (HIDS vs. NIDS).

Retrieved from http://nforcingsecurity.blogspot.com/2007/04/intrusions-detection-systems-hids-vs.html; accessed 14-June-2015.

Jha, J. and L. Ragha (2013, June). Intrusion detection system using support vector

machine. IJAIS Proceedings on International Conference and workshop on Ad-

vanced Computing 2013 ICWAC(3), 25–30. Published by Foundation of Computer

Science, New York, USA.

Jiang, J., R. Li, T. Zheng, F. Su, and H. Li (2011). A new intrusion detection system

using class and sample weighted c-support vector machine. In Proceedings of the

2011 Third International Conference on Communications and Mobile Computing,

CMC ’11, Washington, DC, USA, pp. 51–54. IEEE Computer Society.

Kargupta, H., J. Han, P. S. Yu, R. Motwani, and V. Kumar (2008). Next Generation of Data Mining (1 ed.). Chapman & Hall/CRC.

Kayacik, H. G., A. N. Zincir-Heywood, and M. I. Heywood (2005). Selecting features

for intrusion detection: A feature relevance analysis on kdd 99 intrusion detection

datasets. In Proceedings of the Third Annual Conference on Privacy, Security and

Trust.

Kessel, P. v. and K. Allan (2014, 10). Get ahead of cybercrime.

Khan, L., M. Awad, and B. Thuraisingham (2007). A new intrusion detection system using support vector machines and hierarchical clustering. The VLDB Journal 16(4), 507–521.

Krol, D. and B. Szlachetko (2010, April). Automatic image and speech recognition

based on neural network. Journal of Information Technology Research (JITR) 3(2),

1–17.

Kruegel, C., F. Valeur, and G. Vigna (2005). Intrusion Detection and Correlation:

Challenges and Solutions. Springer Science + Business Media, Inc.

Kumar, K. and R. Punia (2013). Improving the performance of ids using genetic

algorithm. International Journal of Computer Science and Communication 4(2).

Liao, Y. (2005). Machine Learning in Intrusion Detection. Ph. D. thesis, Davis, CA,

USA.

Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed July 2015.

Lokesak, B. (2008). A comparison between signature based and

anomaly based intrusion detection systems. Retrieved from

http://www.iup.edu/WorkArea/DownloadAsset.aspx?id=81109; June 8, 2015.


Maimon, O. and L. Rokach (Eds.) (2010). Data Mining and Knowledge Discovery

Handbook, 2nd ed. Springer.

Megha, A. and Amrita (2013). Performance analysis of different feature selection

methods in intrusion detection. International Journal of Scientific and Technology

Research 2(6).

Mohammed, S., S. Marwa, E.-b. Mohammed, and S. Imane (2007). Artificial neural

networks architecture for intrusion detection systems and classification of attacks.

In Faculty of Computers and Information Cairo University.

Moradi, M. and M. Zulkernine (2004). A neural network based system for intrusion

detection and classification of attacks. Retrieved June 14, 2015.

Muhammad-Imran, H., A. Bin-Abdullah, M. Hussain, S. Palaniappan, and I. Ah-

mad (2008). Intrusions detection based on optimum features subset and efficient

dataset selection. International Journal of Engineering and Innovative Technology

(IJEIT) 2(6).

Mukkamala, S., D. Xu, and A. H. Sung (2006). Intrusion detection based on behavior

mining and machine learning techniques. In Proceedings of the 19th International

Conference on Advances in Applied Artificial Intelligence: Industrial, Engineering

and Other Applications of Applied Intelligent Systems, IEA/AIE’06, pp. 619–628.

Springer-Verlag.

Mulay, S. A., P. Devale, and G. Garje (2010, 6). Intrusion detection system using

support vector machine and decision tree. International Journal of Computer Applications 3(3), 40–43. Published By Foundation of Computer Science.

Nadiammai, G. and M. Hemalatha (2014). Effective approach toward intrusion detection system using data mining techniques. Egyptian Informatics Journal 15(1), 37–50.

Ng, A. (2014, Autumn). Cs229 lecture notes.


Norgaard, M., O. Ravn, Poulsen, and L. K. Hansen (2000). Neural Networks for Modelling and Control of Dynamic Systems. Springer, London.

Ooi, S. Y., Y. M. Leong, M. F. Lim, H. K. Tiew, and Y. H. Pang (2013). Network

intrusion data analysis via consistency subset evaluator with ID3, C4.5 and best-first trees. IJCSNS 13(2), 7.

Ou, G. and Y. L. Murphey (2007, January). Multi-class pattern classification using

neural networks. Pattern Recogn. 40(1), 4–18.

Panda, M., A. Abraham, S. Das, and M. R. Patra (2011, October). Network intrusion

detection system: A machine learning approach. Int. Dec. Tech. 5(4), 347–356.

Papadakis, S. E., P. Tzionas, V. G. Kaburlasos, and J. B. Theocharis (2005). A genetic based approach to the type i structure identification problem. Informatica,

Lithuanian Academy of Sciences 16(3), 365–382.

Pathan, A.-S. K. (2014). The State of the Art in Intrusion Prevention and Detection.

CRC press.

Petra, P. (Ed.) (2012). Machine Learning and Data Mining in Pattern Recognition.

Pfleeger, C. P. and S. L. Pfleeger (2006). Security in Computing (4th Edition). Upper

Saddle River, NJ, USA: Prentice Hall PTR.

Pradhan, A. (2012). Support vector machines - a survey. International Journal of

Emerging Technology and Advanced Engineering 2(8).

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning.

Sabhnani, M. and G. Serpen (2004, September). Why machine learning algorithms

fail in misuse detection on KDD intrusion detection data set. Intell. Data Anal. 8(4),

403–415.

Sahilpreet, S. and B. Meenakshi (2013). Improvement of intrusion detection system

in data mining using neural network. International Journal of Advanced Research

in Computer Science and Software Engineering.


Sazzadul Hoque, M., M. Abdul Mukit, and M. Bikas (2012). An implementation

of intrusion detection system using genetic algorithm. International Journal of

Network Security & Its Applications 4(2).

Scarfone, K. and P. Mell (2007). Guide to intrusion detection and prevention systems

(idps).

Sen, S. and J. A. Clark (2011, October). Evolutionary computation techniques for

intrusion detection in mobile ad hoc networks. Comput. Netw. 55(15), 3441–3457.

Sharma, S., S. Kumar, and M. Kaur (2014). Recent trend in intrusion detection using

fuzzy-genetic algorithm. International Journal of Advanced Research in Computer

and Communication Engineering 3(5).

Singh, D., M. Dutta, and S. H. Singh (2009). Neural network based handwritten

hindi character recognition system. In Proceedings of the 2nd Bangalore Annual

Compute Conference, New York, NY, USA. ACM.

Sivatha Sindhu, S. S., S. Geetha, and A. Kannan (2012). Decision tree based light

weight intrusion detection using a wrapper approach. Expert Syst. Appl. 39(1),

129–141.

Stallings, W. (2010). Cryptography and Network Security: Principles and Practice

(5th ed.). Upper Saddle River, NJ, USA: Prentice Hall Press.

Sujatha, P. K., C. S. Priya, and A. Kannan (2012). Network intrusion detection system

using genetic network programming with support vector machine. In Proceedings

of the International Conference on Advances in Computing, Communications and

Informatics, New York, NY, USA, pp. 645–649. ACM.

Summers, R. C. (2010). Secure computing: Threats and safe-guards. McGraw Hill,

New York.

Tavallaee, M., E. Bagheri, W. Lu, and A. A. Ghorbani (2009). A detailed analysis of the kdd cup 99 data set. In Proceedings of the Second IEEE International

Conference on Computational Intelligence for Security and Defense Applications,

CISDA’09, Piscataway, NJ, USA, pp. 53–58. IEEE Press.

Tim (2015, 1). How to interpret error measures in weka output? http://stats.stackexchange.com/questions/131267/how-to-interpret-error-measures-in-weka-output. Accessed July 2015.

Tsai, C.-F., Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin (2009, December). Intrusion detection

by machine learning: A review. Expert Systems Applications 36(10), 11994–12000.

Vapnik, V. (1982). Estimation of Dependences Based on Empirical Data: Springer

Series in Statistics (Springer Series in Statistics). Springer-Verlag New York, Inc.

Weiss, G. and F. Provost (2001). The effect of class distribution on classifier learning:

An empirical study. Technical report.

Witten, I. H., E. Frank, and M. A. Hall (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd ed.). Morgan Kaufmann Publishers Inc.

Yao, J., S. Zhao, and L. Fan (2006). An enhanced support vector machine model for

intrusion detection. In Proceedings of the First International Conference on Rough

Sets and Knowledge Technology, pp. 538–543. Springer-Verlag.

Zhang, G. P. (2000, November). Neural networks for classification: A survey. Trans.

Sys. Man Cyber Part C 30(4), 451–462.

