
Page 1

K-Means+ID3 A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods

Authors: Shekhar R. Gaddam, Vir V. Phoha, Senior Member, IEEE, and Kiran S. Balagani

Reporter: Tze Ho-Lin

2007/7/4

TKDE, 2007

Page 2

Outline

Motivation
Objectives
Methodology: K-Means+ID3
Experiments
Conclusion
Personal Comments

Page 3

Motivation

Existing anomaly detection system (ADS) studies have two drawbacks:
1) they evaluate anomaly detection methods on measurements drawn from only one application domain;
2) they build anomaly detection methods with single machine learning techniques, such as artificial neural networks or pattern matching.

Meanwhile, recent advances in machine learning show that fusing, selecting, and cascading multiple machine learning methods yields better performance than individual methods.

Page 4

Objectives

We present “K-Means+ID3”, a method to cascade the k-Means clustering and ID3 decision tree learning methods for classifying anomalous and normal activities in a computer network, an active electronic circuit, and a mechanical mass-beam system.

[Figure: K-Means clustering cascaded with an ID3 decision tree]

Page 5

Methodology-K-Means+ID3

1. Training
   1. Partition the training space into k disjoint clusters C1, C2, …, Ck with k-Means.
   2. Train an ID3 decision tree on the instances in each k-Means cluster.
2. Testing
   1. Candidate Selection phase
   2. Candidate Combination phase
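The two training steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: k-Means is Lloyd's algorithm with randomly chosen initial centroids, and the tree is an ID3-style learner that picks splits by information gain but, unlike classic ID3 (which branches on categorical attributes), uses binary thresholds on numeric features. The per-cluster `score` (fraction of anomalous training instances) is a hypothetical stand-in for the k-Means anomaly score used at test time, and all function names are illustrative.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def entropy(y):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(y)
    return -sum(c / n * math.log2(c / n) for c in Counter(y).values())


def build_tree(X, y, depth=0, max_depth=5):
    """ID3-style tree: greedy information-gain splits on numeric thresholds."""
    if len(set(y)) == 1 or depth == max_depth:
        return Counter(y).most_common(1)[0][0]  # leaf: majority label
    base, best = entropy(y), None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [yi for xi, yi in zip(X, y) if xi[f] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[f] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    if best is None:  # identical feature vectors with mixed labels: no split
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    lsplit = [(xi, yi) for xi, yi in zip(X, y) if xi[f] <= t]
    rsplit = [(xi, yi) for xi, yi in zip(X, y) if xi[f] > t]
    return (f, t,
            build_tree([a for a, _ in lsplit], [b for _, b in lsplit], depth + 1, max_depth),
            build_tree([a for a, _ in rsplit], [b for _, b in rsplit], depth + 1, max_depth))


def predict_tree(tree, x):
    """Walk (feature, threshold, left, right) nodes down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def train_kmeans_id3(X, y, k):
    """Training phase: k-Means partition, then one tree per cluster."""
    centroids, labels = kmeans(X, k)
    model = []
    for j in range(k):
        Xj = [x for x, lab in zip(X, labels) if lab == j]
        yj = [t for t, lab in zip(y, labels) if lab == j]
        model.append({
            "centroid": centroids[j],
            # Hypothetical soft k-Means score: share of anomalies (label 1).
            "score": sum(yj) / len(yj) if yj else 0.0,
            "tree": build_tree(Xj, yj) if yj else 0,  # empty cluster: normal
        })
    return model


X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
y = [0, 0, 1, 0, 1, 1]  # 0 = normal, 1 = anomalous
model = train_kmeans_id3(X, y, k=2)
```

Each entry of `model` pairs a cluster centroid with the tree trained only on that cluster's instances, which is the state the testing phase works from.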

Page 6

Methodology-Candidate Selection phase
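The body of this slide did not survive extraction. Assuming candidate selection keeps the f clusters whose centroids are nearest (in Euclidean distance) to the test vector, the step reduces to a sort by distance. The parameter `f`, the dictionary layout, and the name `select_candidates` are hypothetical.

```python
import math


def select_candidates(z, clusters, f):
    """Return the f clusters nearest to test vector z, nearest first.

    Each cluster is a dict with a "centroid" key (hypothetical layout).
    """
    ranked = sorted(clusters, key=lambda c: math.dist(z, c["centroid"]))
    return ranked[:f]


clusters = [
    {"name": "c1", "centroid": (0.0, 0.0)},
    {"name": "c2", "centroid": (3.0, 4.0)},
    {"name": "c3", "centroid": (1.0, 1.0)},
]
picked = select_candidates((0.2, 0.1), clusters, f=2)
print([c["name"] for c in picked])  # ['c1', 'c3']
```

Keeping the candidates ordered from nearest to farthest matters if the combination rules that follow give priority to nearer clusters.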

Page 7

Methodology-Candidate Combination phase

1. Harden the anomaly scores of the K-Means method by using the Threshold Rule.
2. Nearest-Consensus Rule
3. Nearest-Neighbor Rule (ID3)

In their experiments, the threshold is set to 0.5.
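A minimal sketch of one reading of the three combination rules, assuming each candidate is a (soft k-Means anomaly score, ID3 decision) pair ordered from nearest to farthest cluster: scores above the threshold (0.5, as in the experiments) harden to anomalous (1); the nearest candidate on which both methods agree decides; if none agree, the nearest candidate's ID3 decision is used. Treating a score exactly at the threshold as normal, and the name `combine`, are assumptions.

```python
def combine(candidates, threshold=0.5):
    """candidates: (km_score, id3_decision) pairs, nearest cluster first.

    Decisions are 0 (normal) or 1 (anomalous).
    """
    # Threshold rule: harden each soft k-Means score into a 0/1 decision.
    hardened = [(1 if score > threshold else 0, id3) for score, id3 in candidates]
    # Nearest-consensus rule: nearest candidate where both methods agree.
    for km, id3 in hardened:
        if km == id3:
            return km
    # Nearest-neighbor rule: fall back to the nearest candidate's ID3 decision.
    return hardened[0][1]


print(combine([(0.8, 0), (0.3, 0)]))  # 0: consensus at the second candidate
print(combine([(0.9, 0), (0.2, 1)]))  # 0: no consensus, nearest ID3 decides
```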

Page 8

Experiments

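Performance measures (a = true positives, b = false negatives, c = false positives, d = true negatives):
Detection accuracy or true positive rate (TPR) = a/(a+b)
False positive rate (FPR) = c/(c+d)
Precision = a/(a+c)
Total accuracy (or accuracy) = (a+d)/(a+b+c+d)
F-measure = 2a/(2a+b+c)
Receiver operating characteristic (ROC) curves and areas under ROC curves (AUCs)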
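The letters in these formulas are consistent with a = true positives, b = false negatives, c = false positives, d = true negatives, with anomalies as the positive class. The sketch below computes the measures from those four counts, plus AUC through its pairwise form (the probability that a random anomaly is scored above a random normal instance, ties counted as one half). The function names and example counts are hypothetical.

```python
def metrics(a, b, c, d):
    """a = TP, b = FN, c = FP, d = TN (anomaly is the positive class)."""
    return {
        "TPR": a / (a + b),
        "FPR": c / (c + d),
        "precision": a / (a + c),
        "accuracy": (a + d) / (a + b + c + d),
        "F-measure": 2 * a / (2 * a + b + c),
    }


def auc(scores_pos, scores_neg):
    """Pairwise AUC: share of (anomaly, normal) pairs ranked correctly."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


m = metrics(a=90, b=10, c=5, d=95)
print(m["TPR"], m["FPR"], m["accuracy"])  # 0.9 0.05 0.925
print(auc([0.9, 0.8, 0.4], [0.3, 0.5]))   # 5 of 6 pairs ranked correctly
```

Note that 2a/(2a+b+c) equals the harmonic mean of precision and TPR, which is why the F-measure rewards methods that balance the two.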

Page 9

Experiments-Data Sets

Network Anomaly Data (NAD)
Duffing Equation Data (DED)
Mechanical Systems Data (MSD)

Page 10

Conclusion

The K-Means+ID3 method outperforms the individual k-Means and ID3 methods on all six performance measures on the NAD-1998 data.

The K-Means+ID3 method achieves a high detection accuracy (99.12%) and AUC (0.96) on the NAD-1999 data.

The K-Means+ID3 method shows better FPR and Precision than k-Means and ID3 on the NAD-2000 data.

On the NAD, the FPR, Precision, and F-measure of K-Means+ID3 are higher than those of k-Means and lower than those of ID3.

The K-Means+ID3 method achieves the highest Precision and F-measure on the MSD.

Page 11

Personal Comments

Application: Anomaly Detection System

Advantage: It performs better than the individual k-Means and ID3 methods on most of the reported measures.

Disadvantage: Parameter selection problem (e.g., choosing the number of clusters k)

Page 12

NAD-1998

Page 13

NAD-1999

Page 14

NAD-2000

Page 15

DED

Page 16

MSD