
[IEEE 2009 International Conference on Artificial Intelligence and Computational Intelligence - Shanghai, China (2009.11.7-2009.11.8)] 2009 International Conference on Artificial Intelligence


An Improved Perceptron Tree Learning Model Based Intrusion Detection Approach

Qinzhen Xu, Zhimao Bai, Luxi Yang

School of Information Science and Engineering, Southeast University, Nanjing, 210096, China


Abstract—This paper develops an improved perceptron tree (PT) learning model based intrusion detection approach. The binary tree structure of a PT enables the model to divide the intrusion detection problem into sub-problems and solve them with decreasing complexity at different tree levels. The expert neural networks (ENNs) embedded in the internal nodes can be simplified by limiting the number of inputs and hidden neurons. The potential advantage of a PT is that the trained learning model is actually a “gray box”, since each embedded simplified ENN can easily be interpreted as explicit rules. However, the whole structure of a PT is likely to be highly complex, i.e., the trained PT may be composed of a large number of internal nodes. In this case, the disjunctive description of the intrusion detection rules extracted from such a PT is too complex to understand, and the generalization ability of the detection approach may be degraded as well. For this reason, the structure of the trained PT needs to be pruned. The experimental results demonstrate that the proposed approach achieves competitive detection accuracy with a more compact learning model structure.

Keywords- perceptron tree; decision tree; intrusion detection; tree pruning

I. INTRODUCTION

With the rapid expansion of connectivity between computers, security has become an increasingly important issue for computer systems. Intrusion detection systems (IDSs), which aim to reduce the insecurity of a computer system by dynamically monitoring various features and parameters of the system in order to detect intrusions, have thus become increasingly crucial over the past 20 years. Numerous researchers have studied intrusion detection in depth and proposed many interesting methodologies. From the perspective of machine learning, IDSs commonly fall into two types: those based on symbolic learning models and those based on non-symbolic (or sub-symbolic) learning models.

The first type of approach is usually considered understandable, because a reasoning procedure can be provided for each decision step or the learning results are a set of rules. Detection techniques based on symbolic learning models include decision tree (DT) based IDSs and RIPPER based IDSs. Lee et al. presented a DT based IDS that took pre-processed raw packet data as the input attributes for training the DT [1]. Xiang et al. proposed multiple-level tree classifier based IDSs to decrease the false alarm rate [2], [3]. In Helmer et al.’s work, feature vectors were generated from normal and abnormal traces, and RIPPER was then trained on these vectors to generate a rule set for intrusion detection [4]. These approaches can detect intrusions with high efficiency. However, most of them are not well suited to online learning, i.e., each time new attack patterns are observed, the trained IDS must be revised substantially or redesigned from scratch.

The second type of approach, such as neural network (NN) based IDSs, is good at learning in changing environments, because these systems can easily be revised by retraining their free parameters. Work applying NN techniques to intrusion detection includes: five types of NN based IDSs studied by Beghdad, i.e., the multilayer perceptron, generalized feed-forward, radial basis function, self-organizing feature map, and principal component analysis NNs [5]; a hierarchical Kohonenen net based anomaly detection approach [6]; a genetic algorithm NN based network intrusion detection method [7]; an integrated IDS based on multiple NNs composed of principal component NNs, growing neural gas networks, and principal component self-organizing map networks [8]; and a recurrent NN based detection approach [9]. In these approaches, the trained learning models are stored as weights or other numerical sequences that cannot easily be understood; hence this type of approach is regarded as a “black box”.

In many situations, a learning model is needed that combines the advantages of both symbolic and non-symbolic models. For example, in intrusion detection, the learner must be revised, or its knowledge refined, using data observed each day or even each minute. At the same time, the system must give understandable reasons for its decisions so that the reliability of the detection results can be confirmed.

A direct way to solve this problem is to learn knowledge with a non-symbolic model (e.g., a neural network) and to interpret the learned knowledge with a symbolic model. However, interpreting a trained neural network is NP-complete [10]. For this reason, in our previous study we introduced the PT (i.e., neural network tree) based intrusion detection approach [11]. A PT is a DT with an ENN embedded in each internal node. Generally speaking, PTs are more powerful than traditional DTs because the ENNs can extract more complex and more discriminative features for making decisions. Further, a PT can be interpreted easily if the number of inputs and hidden neurons of each ENN is restricted.

2009 International Conference on Artificial Intelligence and Computational Intelligence

978-0-7695-3816-7/09 $26.00 © 2009 IEEE

DOI 10.1109/AICI.2009.176

However, the whole structure of a PT is likely to be highly complex, i.e., a trained PT may consist of a large number of simple internal nodes. The disjunctive description of the rules extracted from such a PT is too complex to understand, and the generalization ability of the detection approach may be degraded as well. This paper therefore proposes an improved PT learning model based intrusion detection approach in which the trained PT is pruned.

The rest of this paper is organized as follows. Section II presents the improved PT based intrusion detection approach. Section III presents intrusion detection experimental results that verify the proposed approach. Section IV gives the conclusions.

II. THE IMPROVED PT BASED INTRUSION DETECTION APPROACH

Research has indicated that a PT can combine the advantages of both symbolic and non-symbolic models to tackle difficult problems through a “gray-box” learning process [11], [14]. Further research has focused on hybrid tree-structured or multi-level-structured learning approaches with more competitive pattern recognition performance. These approaches, introduced by various authors under different names, are hybrid decision trees in which NNs or SVMs are embedded either in internal nodes or in leaf nodes [12], [13].

A. The Perceptron Tree

In our previous work, the PT is a hybrid model whose overall structure is a DT and whose internal nodes are ENNs, with the continuous input features represented by a small number of critical points [11]. Fig. 1 shows an example of the PT structure.
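To make the structure concrete, here is a minimal Python sketch of a PT as a binary tree whose internal nodes each hold a small ENN. It assumes details the text does not fix: one hidden layer of tanh units, a single sigmoid output, and routing a sample to the left child when that output is below 0.5. The names `ENN`, `PTNode`, and `classify` are hypothetical, and the critical-point encoding of continuous inputs from [11] is omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ENN:
    """Hypothetical small expert network: tanh hidden layer, one sigmoid output."""

    feature_idx: list[int]       # indices of the few input features this node reads
    w_hidden: list[list[float]]  # one row per hidden neuron; last entry is the bias
    w_out: list[float]           # one weight per hidden neuron, then the output bias

    def forward(self, x: Sequence[float]) -> float:
        inputs = [x[i] for i in self.feature_idx]
        hidden = [
            math.tanh(sum(w * v for w, v in zip(row[:-1], inputs)) + row[-1])
            for row in self.w_hidden
        ]
        z = sum(w * h for w, h in zip(self.w_out[:-1], hidden)) + self.w_out[-1]
        return 1.0 / (1.0 + math.exp(-z))


@dataclass
class PTNode:
    enn: Optional[ENN] = None    # set on internal nodes
    label: Optional[int] = None  # set on leaves
    left: Optional[PTNode] = None
    right: Optional[PTNode] = None


def classify(node: PTNode, x: Sequence[float]) -> int:
    """Route x from the root: ENN output below 0.5 goes left, otherwise right."""
    while node.enn is not None:
        node = node.left if node.enn.forward(x) < 0.5 else node.right
    return node.label


# Usage: a hypothetical one-split tree whose root ENN reads only feature 0.
root = PTNode(
    enn=ENN(feature_idx=[0], w_hidden=[[4.0, -2.0]], w_out=[3.0, 0.0]),
    left=PTNode(label=0),   # e.g. normal traffic
    right=PTNode(label=1),  # e.g. attack
)
print(classify(root, [0.0]), classify(root, [1.0]))  # prints "0 1"
```

Because each ENN reads only a handful of features, a single internal node stays small enough to be rewritten as an explicit rule, which is the interpretability argument made above.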

To keep a trained PT interpretable, the number of input features and the number of hidden neurons of each ENN should be kept small. Besides these, the parameters to be optimized include the input-to-hidden and hidden-to-output weights of each ENN.

The training process of a PT is actually a multi-objective optimization problem. Since no a priori knowledge is available to suggest an appropriate feature subset and a suitable number of hidden neurons for each internal node, a genetic algorithm is a practical approach [14]. The information gain ratio criterion [15] is used to grow the tree structure.

The entropy of the training set S composed of k classes of patterns Cj, j=1, …, k, is defined as

$$\mathrm{info}(S) = -\sum_{j=1}^{k} \frac{|C_j|}{|S|} \times \log_2 \frac{|C_j|}{|S|}, \qquad (1)$$

where |Cj| is the number of examples belonging to class Cj, and |S| is the number of examples in S.

Figure 1. An example of the PT structure

If the training set is partitioned in accordance with the n outcomes, Ti, i = 1, …, n, of some given test X, the expected information is defined as

$$\mathrm{info}_X(S) = \sum_{i=1}^{n} \frac{|T_i|}{|S|} \times \mathrm{info}(T_i), \qquad (2)$$

where |Ti| is the number of examples belonging to the ith outcome Ti.

The potential information is defined as

$$\mathrm{split\_info}_X(S) = -\sum_{i=1}^{n} \frac{|T_i|}{|S|} \times \log_2 \frac{|T_i|}{|S|}, \qquad (3)$$

The information gain ratio is accordingly defined as

$$\mathrm{gain\_ratio}_X(S) = \frac{\mathrm{info}(S) - \mathrm{info}_X(S)}{\mathrm{split\_info}_X(S)}. \qquad (4)$$
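Equations (1)-(4) translate directly into code. The Python sketch below assumes class labels are hashable values and represents a test X by the partition it induces (a list of label lists, one per outcome Ti). Returning 0 when the split information is zero is an assumption made to avoid division by zero; the text does not say how that case is handled.

```python
import math
from collections import Counter
from typing import Hashable, Sequence

Partition = Sequence[Sequence[Hashable]]  # the subsets T_1..T_n induced by a test X


def info(labels: Sequence[Hashable]) -> float:
    """Eq. (1): entropy of a labelled set S, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_x(partition: Partition) -> float:
    """Eq. (2): expected information after test X splits S into T_1..T_n."""
    total = sum(len(t) for t in partition)
    return sum(len(t) / total * info(t) for t in partition if t)


def split_info_x(partition: Partition) -> float:
    """Eq. (3): potential information generated by the split itself."""
    total = sum(len(t) for t in partition)
    return -sum(len(t) / total * math.log2(len(t) / total) for t in partition if t)


def gain_ratio_x(partition: Partition) -> float:
    """Eq. (4): information gain ratio of test X (0 if the split is trivial)."""
    s = [label for t in partition for label in t]
    si = split_info_x(partition)
    return (info(s) - info_x(partition)) / si if si > 0 else 0.0


# Usage: a test that separates normal and attack records perfectly.
perfect = [["normal", "normal"], ["attack", "attack"]]
print(gain_ratio_x(perfect))  # prints 1.0
```

For the perfect split, info(S) = 1 bit, infoX(S) = 0, and split_infoX(S) = 1, so the gain ratio is 1.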

The fitness function is defined as

$$F(Enn) = \big(f_1(Enn),\ f_2(Enn)\big), \qquad (5)$$

where Enn is the optimized expert neural network, f1(Enn) is the information gain ratio, and f2(Enn) is the number of hidden nodes.

An Enn is chosen as the best individual if it achieves the highest information gain ratio with the lowest number of hidden neurons. The search drives F(Enn) toward Pareto-optimal solutions, with f1(Enn) taking priority in the Pareto ranking, i.e., the performance of Enn_i is considered better than that of Enn_j if both of the following inequalities,

$$f_1(Enn_i) > f_1(Enn_j) \qquad (6)$$

and

$$f_2(Enn_i) \le f_2(Enn_j), \qquad (7)$$

are satisfied. The feature indices, the number of hidden neurons, and the weights are encoded as a chromosome. The optimization process for a node of a PT is as follows.

Step 1. Generate a random population of n initial chromosomes.

Step 2. Decode the chromosomes into ENNs.

Step 3. Compute the information gain ratio of each ENN.

Step 4. Evaluate the fitness of the ENNs according to the Pareto ranking of information gain ratio and number of hidden neurons.

Step 5. Keep the best individual. If the given fitness threshold or the maximum number of iterations is reached, stop the evolutionary computation and go to Step 6; otherwise, go to Step 7.

Step 6. Decode the best individual from the chromosome, i.e., recover the trained ENN.

Step 7. Apply crossover and mutation to the better individuals at the given selection and mutation rates. Go to Step 2.
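Steps 1-7 can be sketched as a small genetic algorithm. This illustrative Python version rests on several assumptions not given in the text: the ENN output is thresholded at 0.5 to split the node's data in two, uniform gene-wise crossover and Gaussian mutation are used, and individuals are ranked by how many others beat them (ties broken by gain ratio, then hidden-neuron count). `MAX_INPUTS`, `MAX_HIDDEN`, the elite fraction `keep`, and the fitness threshold `target` are hypothetical parameters; the defaults for `pop_size` and `generations` mirror the settings in Section III. The comparison `better` prefers a higher gain ratio with no more hidden neurons, following the stated goal of the highest gain ratio with the lowest number of hidden neurons.

```python
import copy
import math
import random
from collections import Counter

MAX_INPUTS, MAX_HIDDEN = 3, 3  # hypothetical per-ENN limits


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def gain_ratio(left, right):
    """Gain ratio, eqs. (1)-(4), of a two-way split of the node's labels."""
    total = len(left) + len(right)
    parts = [p for p in (left, right) if p]
    split = -sum(len(p) / total * math.log2(len(p) / total) for p in parts)
    if split == 0:
        return 0.0
    gain = entropy(left + right) - sum(len(p) / total * entropy(p) for p in parts)
    return gain / split


def random_chromosome(n_features):
    """Step 1: genes are feature indices, hidden-neuron count, and weights."""
    return {
        "feats": random.sample(range(n_features), MAX_INPUTS),
        "hidden": random.randint(1, MAX_HIDDEN),
        "w1": [[random.gauss(0, 1) for _ in range(MAX_INPUTS + 1)] for _ in range(MAX_HIDDEN)],
        "w2": [random.gauss(0, 1) for _ in range(MAX_HIDDEN + 1)],
    }


def enn_output(ch, x):
    """Step 2: decode the chromosome as an ENN and evaluate it on one sample."""
    inp = [x[i] for i in ch["feats"]] + [1.0]  # trailing 1.0 feeds the hidden biases
    hid = [math.tanh(sum(w * v for w, v in zip(row, inp))) for row in ch["w1"][: ch["hidden"]]]
    z = sum(w * h for w, h in zip(ch["w2"], hid)) + ch["w2"][-1]  # last weight: output bias
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def fitness(ch, X, y):
    """Steps 3-4: F(Enn) = (gain ratio of the induced split, number of hidden neurons)."""
    goes_right = [enn_output(ch, x) >= 0.5 for x in X]
    left = [lab for r, lab in zip(goes_right, y) if not r]
    right = [lab for r, lab in zip(goes_right, y) if r]
    return gain_ratio(left, right), ch["hidden"]


def better(fa, fb):
    """Higher gain ratio and no more hidden neurons."""
    return fa[0] > fb[0] and fa[1] <= fb[1]


def crossover(a, b):
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    return copy.deepcopy({k: (a if random.random() < 0.5 else b)[k] for k in a})


def mutate(ch, n_features, rate):
    """Perturb genes with probability `rate`; duplicate feature indices are tolerated."""
    if random.random() < rate:
        ch["feats"][random.randrange(MAX_INPUTS)] = random.randrange(n_features)
    if random.random() < rate:
        ch["hidden"] = random.randint(1, MAX_HIDDEN)
    for row in ch["w1"] + [ch["w2"]]:
        for j in range(len(row)):
            if random.random() < rate:
                row[j] += random.gauss(0, 0.5)
    return ch


def optimize_node(X, y, pop_size=100, generations=500, keep=0.2, rate=0.1, target=None):
    n_features = len(X[0])
    pop = [random_chromosome(n_features) for _ in range(pop_size)]  # Step 1
    for _ in range(generations):
        fits = [fitness(ch, X, y) for ch in pop]  # Steps 2-4
        beaten_by = [sum(better(g, f) for g in fits) for f in fits]
        order = sorted(range(pop_size), key=lambda i: (beaten_by[i], -fits[i][0], fits[i][1]))
        pop, fits = [pop[i] for i in order], [fits[i] for i in order]
        if target is not None and fits[0][0] >= target:  # Step 5: threshold reached
            break
        parents = pop[: max(2, int(keep * pop_size))]  # elite individuals survive unchanged
        children = [mutate(crossover(*random.sample(parents, 2)), n_features, rate)
                    for _ in range(pop_size - len(parents))]  # Step 7
        pop = parents + children
    return pop[0], fits[0]  # Step 6: best chromosome; enn_output(best, x) runs its ENN
```

Keeping the elite unchanged guarantees the best individual found so far is never lost, which is one common way to realize "keep the best individual" in Step 5.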

A PT is grown by the same recursive process as a DT.
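The recursive growth can be sketched in Python as follows. Because a GA-optimized ENN would make the example long, `stump_split` is a hypothetical stand-in that picks the single-feature threshold with the highest gain ratio; `grow` accepts any function that returns a routing predicate (or None). The stopping rules (pure node, fewer than `min_samples` examples, or `max_depth` reached) are assumptions, as is storing each node's majority training label for later pruning.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def stump_split(X, y):
    """Stand-in for the GA-optimised ENN: best single-feature threshold by gain ratio."""
    best, best_score = None, 0.0
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[f] < t]
            right = [lab for x, lab in zip(X, y) if x[f] >= t]
            if not left or not right:
                continue
            p = len(left) / len(y)
            split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
            gain = entropy(y) - p * entropy(left) - (1 - p) * entropy(right)
            if gain / split_info > best_score:
                best, best_score = (f, t), gain / split_info
    if best is None:
        return None
    f, t = best
    return lambda x: x[f] >= t  # True routes the sample to the right child


def grow(X, y, find_split=stump_split, min_samples=2, max_depth=10, depth=0):
    """Recursively grow a binary tree; every node keeps its majority training label."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or len(y) < min_samples or depth >= max_depth:
        return {"label": majority}
    go_right = find_split(X, y)
    mask = [go_right(x) for x in X] if go_right else []
    if not mask or all(mask) or not any(mask):  # no useful split: make a leaf
        return {"label": majority}
    Xl = [x for x, m in zip(X, mask) if not m]
    yl = [lab for lab, m in zip(y, mask) if not m]
    Xr = [x for x, m in zip(X, mask) if m]
    yr = [lab for lab, m in zip(y, mask) if m]
    return {
        "split": go_right,
        "majority": majority,
        "left": grow(Xl, yl, find_split, min_samples, max_depth, depth + 1),
        "right": grow(Xr, yr, find_split, min_samples, max_depth, depth + 1),
    }


def predict(node, x):
    while "label" not in node:
        node = node["right"] if node["split"](x) else node["left"]
    return node["label"]
```

For example, `grow([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` splits once at feature 0 >= 2.0 and returns two pure leaves.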

B. The Pruned Perceptron Tree

Each trained ENN can be simplified by limiting its number of inputs and hidden nodes. However, the whole structure of a PT is still likely to be highly complex, i.e., the trained PT may consist of a large number of internal nodes. The disjunctive description of the rules extracted from such a PT is too complex to understand. Further, a learned model of high complexity may easily overfit the training data, i.e., the generalization ability of the detection approach may be degraded as well. This section proposes an improved PT learning model based intrusion detection approach in which the trained PT is pruned.

Two approaches are commonly used to prune DTs and avoid overfitting [16]. The first is pre-pruning, i.e., stopping the growth of a DT when a split is not statistically significant or too few examples fall into a split. However, it is always hard to determine the right time to stop splitting. The second is post-pruning a fully grown DT. Two of the most typical post-pruning methods are rule post-pruning and reduced-error pruning [15].

In rule post-pruning, a tree is converted into an equivalent set of rules, which are then pruned independently without degrading recognition precision. It is probably the most frequently used DT pruning method. However, it is impractical for a PT, because interpreting a highly complex PT into rules is computationally expensive.

In reduced-error pruning, a decision tree is simplified by discarding one or more sub-trees and replacing them with leaves: the tree is built on the training set, and each sub-tree is evaluated on a separate validation set. Sub-trees whose removal does not reduce accuracy on the validation set are greedily removed.

Given a PT trained as described in Section II-A, the reduced-error pruning process is as follows.

Input: a trained PT and a validation set. Initially, the current node is the root of the PT.

Step 1. Evaluate the PT on the validation set and record its accuracy as Accvld.

Step 2. Remove the sub-PT rooted at the current internal node, i.e., make the current node a leaf node labeled with the most common class of the training examples associated with this node.

Step 3. Evaluate the pruned PT on the validation set and record its accuracy as Prn_Accvld.

Step 4. If Accvld < Prn_Accvld, remove the sub-PT and keep the current node as a labeled leaf. Otherwise, go to Step 5.

Step 5. Keep the current sub-PT unpruned. If the left child of the current node is a leaf node, terminate the pruning process. Otherwise, take the left child as the current node and repeat Steps 1 to 4.

Step 6. If the right child of the current node is a leaf node, terminate the pruning process. Otherwise, take the right child as the current node and repeat Steps 1 to 4.

In this way, the trained PT is pruned and its redundant sub-trees are eliminated, which may also improve the generalization ability of the detection approach.
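One way to read Steps 1-6 is as a top-down recursion that tries to turn each internal node into a leaf and, if that does not help, descends into its left and then right child. The Python sketch below follows that reading on a dict-based tree whose internal nodes store a routing predicate `split` and their majority training label `majority` (hypothetical field names); leaf children are simply skipped rather than ending the whole process. The default follows Step 4 and prunes only when validation accuracy strictly increases; `prune_on_tie=True` gives the textbook rule of pruning whenever accuracy does not decrease [16].

```python
def predict(node, x):
    """Route x down the tree; internal nodes send x right when node["split"](x) is True."""
    while "label" not in node:
        node = node["right"] if node["split"](x) else node["left"]
    return node["label"]


def accuracy(tree, X, y):
    return sum(predict(tree, x) == lab for x, lab in zip(X, y)) / len(y)


def count_internal(node):
    """Number of internal nodes (Nnode in Table I)."""
    if "label" in node:
        return 0
    return 1 + count_internal(node["left"]) + count_internal(node["right"])


def reduced_error_prune(tree, X_val, y_val, node=None, prune_on_tie=False):
    """Prune `tree` in place; `node` is the current node (initially the root)."""
    node = tree if node is None else node
    if "label" in node:  # leaves have nothing to prune
        return
    acc = accuracy(tree, X_val, y_val)  # Step 1
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]  # Step 2: turn the current node into a leaf
    pruned_acc = accuracy(tree, X_val, y_val)  # Step 3
    if pruned_acc > acc or (prune_on_tie and pruned_acc == acc):
        return  # Step 4: keep the labelled leaf
    node.clear()
    node.update(saved)  # Step 5: restore the sub-PT
    for child in ("left", "right"):  # Steps 5-6: visit left, then right
        reduced_error_prune(tree, X_val, y_val, node[child], prune_on_tie)


# Usage: the right sub-tree overfits; pruning it lifts validation accuracy 0.75 -> 1.0.
tree = {
    "split": lambda x: x[0] >= 2, "majority": 0,
    "left": {"label": 0},
    "right": {"split": lambda x: x[0] >= 3, "majority": 1,
              "left": {"label": 1}, "right": {"label": 0}},
}
X_val, y_val = [[0], [1], [2], [3]], [0, 0, 1, 1]
reduced_error_prune(tree, X_val, y_val)
print(count_internal(tree), accuracy(tree, X_val, y_val))  # prints "1 1.0"
```

Modifying nodes in place means the parent's reference automatically sees the new leaf, so no parent pointers are needed.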

A PT trained on the high-dimensional KDD Cup 99 intrusion datasets is especially likely to be highly complex, which may degrade the generalization ability of the detection approach. For this reason, this paper presents an improved PT learning model based intrusion detection approach to achieve better detection performance.

III. EXPERIMENTAL RESULTS

This section tests the performance of the proposed approach on the corrected dataset of the well-known KDD 99 intrusion detection datasets. A total of 5010 instances are randomly sampled from the corrected dataset: 1/4 of them are used for training, a separate 1/2 for validation, and the remaining instances for testing. The number of input features of each ENN is limited to 3. The population size and the number of generations of the genetic algorithm are set to 100 and 500, respectively. The reported results are averages over 30 runs.
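The sampling described above can be sketched as follows. The text does not say how 5010/4 = 1252.5 is rounded or whether sampling is stratified by class; this sketch assumes a plain shuffle with the training share rounded down, and `split_indices` is a hypothetical helper. Calling it with 30 different seeds would mirror the 30 averaged runs.

```python
import random


def split_indices(n, seed=None):
    """Shuffle n instance indices; 1/4 for training, 1/2 for validation, the rest for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train, n_val = n // 4, n // 2
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train, val, test = split_indices(5010, seed=1)
print(len(train), len(val), len(test))  # prints "1252 2505 1253"
```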

Table I compares the pruned PT based intrusion detection approach with the other detection approaches, where Nnode, Acctst, and AccFA denote the number of internal nodes, the detection accuracy on the test data, and the false alarm rate, respectively.

Table I suggests that, compared with the unpruned PT, the pruned PT based intrusion detection approach achieves slightly higher test accuracy (96.77% vs. 96.71%) and a comparable false alarm rate (2.73% vs. 2.71%) with fewer internal nodes (17.12 vs. 24.87 on average), i.e., the structure of the improved PT is substantially simplified without degrading the high test detection accuracy.

The detection results of the proposed approach are also compared with those of an SVM based approach and an RBF neural network based approach [5] to verify its performance. Both PT based approaches obtain higher detection accuracy and lower false alarm rates than the SVM (95.54%, 3.83%) and RBF (90.11%, 9.88%) approaches.

A trained PT largely captures the information in the features that are critical to intrusion detection. Fig. 2 and Fig. 3 show the normalized selection frequency of each feature in the unpruned and pruned PT, respectively. How often a feature is selected by the ENNs partly indicates its influence on the detection performance of the learned model. The differences between the distributions in Fig. 2 and Fig. 3 are caused by the pruning process: selections made by ENNs in pruned sub-trees are no longer counted. Hence, Fig. 3 may better approximate the significance of each feature for the intrusion detection PT.
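A feature's normalized selection frequency can be computed by counting how often it is chosen as an ENN input across a tree's internal nodes. The Python sketch below assumes normalization by the total number of selections (the text does not state the normalization) and takes one list of selected feature indices per internal node, so pruning a sub-tree simply removes its nodes' lists from the count. `selection_frequency` is a hypothetical helper and the example trees are made up; for the KDD 99 data, `n_features` would be 41.

```python
from collections import Counter


def selection_frequency(node_features, n_features):
    """Normalised frequency with which each feature is selected by a tree's ENNs.

    node_features: one list of selected feature indices per internal node.
    """
    counts = Counter(f for feats in node_features for f in feats)
    total = sum(counts.values())
    return [counts[f] / total if total else 0.0 for f in range(n_features)]


# Made-up example: pruning removes the third internal node and its selections.
unpruned = [[0, 2, 5], [0, 1, 2], [0, 3, 4]]
pruned = unpruned[:2]
freq_unpruned = selection_frequency(unpruned, 6)
freq_pruned = selection_frequency(pruned, 6)
```

In this toy case features 3 and 4 drop to zero after pruning, which is the kind of shift between the unpruned and pruned distributions described above.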

TABLE I. THE COMPARED ACCURACY OF INTRUSION DETECTION APPROACHES.

Approach     Nnode    Acctst (%)    AccFA (%)
PT           24.87    96.71         2.71
Pruned PT    17.12    96.77         2.73
SVM          --       95.54         3.83
RBF          --       90.11         9.88

Figure 2. The normalized selection frequency of the features in the unpruned PT

Figure 3. The normalized selection frequency of the features in the pruned PT

IV. CONCLUSION

The complexity of each ENN of a PT can be controlled by limiting the number of inputs and hidden nodes. However, the whole structure of a PT is still likely to be highly complex, with a large number of internal nodes. The disjunctive description of the intrusion detection rules extracted from such a PT may be too complex to understand, and the generalization ability of the detection approach may be degraded as well.

This paper presents a pruned PT based intrusion detection approach in which a trained PT is pruned by the reduced-error pruning method. The experimental results suggest that the proposed approach is effective in constructing a compact learning model with competitive detection accuracy. Further, the simplified PT structure should be more comprehensible than the unpruned one when both are extracted into explicit detection rules.

ACKNOWLEDGEMENTS

This work was supported by the Project of National Natural Science Foundation of China (60702029) and the Project of Bringing in Talents Foundation of Southeast University (4004001041).

REFERENCES

[1] Joong-Hee Lee, Jong-Hyouk Lee, Seon-Gyoung Sohn, Jong-Ho Ryu, Tai-Myoung Chung, “Effective Value of Decision Tree with KDD 99 Intrusion Detection Datasets for Intrusion Detection System,” 10th International Conference on Advanced Communication Technology, vol. 2, pp. 1170-1175, 2008.

[2] C. Xiang, M. Y. Chong, H. L. Zhu, “Design of multiple-level tree classifiers for intrusion detection system,” IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 872-877, December 2004.

[3] Cheng Xiang, Png Chin Yong, Lim Swee Meng, “Design of multiple-level hybrid classifier for intrusion detection system using Bayesian clustering and decision trees,” Pattern Recognition Letters, vol. 29, no. 7, pp. 918-924, May 2008.

[4] Guy Helmer, Johnny Wong, Subhasri Madaka, “Anomalous intrusion detection system for hostile Java applets,” Journal of Systems and Software, vol. 55, no. 3, pp. 273-286, 2001.

[5] Rachid Beghdad, “Critical study of neural networks in detecting intrusions,” Computers & Security, vol. 27, no. 5-6, pp. 168-175, October 2008.

[6] Suseela T. Sarasamma, Qiuming A. Zhu, Julie Huff, “Hierarchical Kohonenen Net for Anomaly Detection in Network Security,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 35, no. 2, pp. 302-312, 2005.

[7] Jingwen Tian, Meijuan Gao, “Network intrusion detection method based on high speed and precise genetic algorithm neural network,” International Conference on Networks Security, Wireless Communications and Trusted Computing, vol. 2, pp. 619-622, 25-26 April 2009.

[8] Guisong Liu, Xiaobin Wang, “An integrated intrusion detection system by using multiple neural networks,” IEEE Conference on Cybernetics and Intelligent Systems, pp. 22-27, 21-24 Sept. 2008.

[9] N. Chowdhury, M. A. Kashem, “A comparative analysis of Feed-forward neural network & Recurrent Neural network to detect intrusion,” International Conference on Electrical and Computer Engineering (ICECE 08), pp. 488-492, 20-22 Dec. 2008.

[10] Hiroshi Tsukimoto, “Extracting Rules from trained neural networks,” IEEE Transactions on Neural Networks, vol. 11, no. 2, pp. 377-389, 2000.

[11] Qinzhen Xu, Wenjiang Pei, Luxi Yang, Qiangfu Zhao, “An Intrusion Detection Approach Based on Understandable Neural Network trees,” International Journal of Computer Science and Network Security, vol. 6, no. 11, pp. 229-234, 2006.

[12] Qinzhen Xu, Pinzheng Zhang, Wenjiang Pei, Luxi Yang, Zhenya He, “An Automatic Facial Expression Recognition Approach Based on Confusion-crossed Support Vector Machine Tree,” IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. I, pp. 625-628, 2007.

[13] Z. H. Zhou, Z. Q. Chen, “Hybrid decision tree,” Knowledge-Based Systems, vol. 15, pp. 515-518, 2002.

[14] Q. F. Zhao, “Evolutionary design of neural network tree -- integration of decision tree, neural network and GA,” in Proc. IEEE Congress on Evolutionary Computation, Seoul, pp. 240-244, 2001.

[15] J. R. Quinlan, C4.5: Programs for Machine Learning, San Mateo: Morgan Kaufmann Publishers, pp. 17-25, 1993.

[16] T. Mitchell, “Decision Tree Learning,” in T. Mitchell, Machine Learning, The McGraw-Hill Companies, Inc., pp. 52-78, 1997.