1 K-Means+ID3 A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and...

K-Means+ID3: A Novel Method for Supervised Anomaly Detection by Cascading K-Means Clustering and ID3 Decision Tree Learning Methods

Authors: Shekhar R. Gaddam, Vir V. Phoha, Senior Member, IEEE, and Kiran S. Balagani

Reporter: Tze Ho-Lin

2007/7/4

TKDE, 2007

Outline

Motivation
Objectives
Methodology: K-Means+ID3
Experiments
Conclusion
Personal Comments

Motivation

Prior studies on anomaly detection systems (ADSs) have two drawbacks:
1) they evaluate anomaly detection methods only on measurements drawn from a single application domain;
2) they build anomaly detection methods with a single machine learning technique, such as artificial neural networks or pattern matching.

However, recent advances in machine learning show that fusing, selecting, and cascading multiple machine learning methods yields better performance than individual methods.

Objectives

We present "K-Means+ID3", a method that cascades k-Means clustering and ID3 decision tree learning to classify anomalous and normal activities in a computer network, an active electronic circuit, and a mechanical mass-beam system.

[Figure: K-Means clustering; ID3 decision tree]

Methodology-K-Means+ID3

1. Training
   1. Partition the training space into k disjoint clusters C1, C2, …, Ck with K-Means clustering.
   2. Train an ID3 decision tree on the instances in each K-Means cluster.
2. Testing
   1. Candidate Selection phase
   2. Candidate Combination phase
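The two training steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that features are already discretized into small integer values (so the same vectors serve both Euclidean k-Means and categorical ID3 splits), that labels are 1 for anomaly and 0 for normal, and that k-Means uses Lloyd's iterations with random initial centroids (the initialization is an assumption). The names `kmeans`, `id3`, `predict`, and `train_kmeans_id3` are hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's k-Means: partition points into k disjoint clusters.
    Returns (centroids, assignment), where assignment[i] is point i's cluster."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        assignment = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, c in zip(points, assignment) if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, attrs):
    """ID3: split on the attribute with maximal information gain.
    A leaf is a class label; an internal node is (attr, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[a] == v]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Walk the tree; an unseen attribute value falls back to the node's majority."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree


def train_kmeans_id3(X, y, k, seed=0):
    """Step 1: k-Means partition; step 2: one ID3 tree per cluster."""
    centroids, assignment = kmeans(X, k, seed=seed)
    all_attrs = list(range(len(X[0])))
    trees = []
    for j in range(k):
        idx = [i for i, c in enumerate(assignment) if c == j]
        if idx:
            trees.append(id3([X[i] for i in idx], [y[i] for i in idx], all_attrs))
        else:
            trees.append(None)  # empty cluster: no training instances, no tree
    return centroids, trees, assignment
```

Calling `train_kmeans_id3(X, y, k)` returns the centroids, one tree per cluster (or `None` for an empty cluster), and each training instance's cluster index. Discretizing first is what lets a single feature vector feed both learners; with continuous features, a threshold-splitting tree such as C4.5 would be the usual alternative.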

Methodology-Candidate Selection phase
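The slide gives no details for the Candidate Selection phase. A minimal sketch, assuming that the candidates for a test instance are the `f` clusters whose centroids are nearest to it in Euclidean distance (`f` and `select_candidates` are hypothetical names used only for illustration):

```python
import math


def select_candidates(z, centroids, f):
    """Return the indices of the f clusters nearest to test instance z,
    ordered from nearest to farthest by Euclidean distance to the centroid."""
    order = sorted(range(len(centroids)), key=lambda j: math.dist(z, centroids[j]))
    return order[:f]


centroids = [(0, 0), (10, 10), (5, 5)]
print(select_candidates((1, 1), centroids, f=2))  # [0, 2]
```

Keeping the candidates ordered by distance matters for the later combination rules, which prefer the nearest cluster.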

Methodology-Candidate Combination phase

1. Harden the anomaly scores of the K-Means method by using the Threshold Rule.
2. Nearest-Consensus Rule
3. Nearest-Neighbor Rule (ID3)

In the authors' experiments, the threshold is set to 0.5.
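The Candidate Combination phase can be sketched as follows. Apart from the 0.5 threshold, every detail here is an assumption, since the slide lists only the rule names: the k-Means anomaly score of a cluster is taken to be the fraction of its training instances labelled anomalous; a score strictly above the threshold is hardened to "anomaly"; the Nearest-Consensus Rule returns the decision of the nearest candidate cluster on which k-Means and ID3 agree; and the Nearest-Neighbor Rule falls back to the ID3 decision of the nearest candidate when none agree. All function names are hypothetical.

```python
def kmeans_anomaly_score(cluster_labels):
    """Hypothetical soft score: share of a cluster's training labels that are 1 (anomaly)."""
    return sum(cluster_labels) / len(cluster_labels)


def threshold_rule(score, tau=0.5):
    """Harden a soft anomaly score: 1 (anomaly) if score > tau, else 0 (normal)."""
    return 1 if score > tau else 0


def combine_candidates(candidates):
    """candidates: (kmeans_decision, id3_decision) pairs, nearest cluster first;
    decisions are 1 (anomaly) or 0 (normal)."""
    # Nearest-Consensus Rule: the nearest candidate on which both methods agree.
    for kmeans_decision, id3_decision in candidates:
        if kmeans_decision == id3_decision:
            return kmeans_decision
    # Nearest-Neighbor Rule: no consensus, so use ID3's decision for the nearest cluster.
    return candidates[0][1]


# Two candidate clusters, nearest first: (training labels of the cluster, ID3 decision).
nearest = [([1, 1, 0, 1], 0), ([0, 0, 0, 1], 0)]
pairs = [(threshold_rule(kmeans_anomaly_score(labels)), tree_decision)
         for labels, tree_decision in nearest]
print(pairs, combine_candidates(pairs))  # [(1, 0), (0, 0)] 0
```

In this example the nearest cluster is split (k-Means says anomaly, ID3 says normal), so the consensus reached at the second cluster decides the outcome.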

Experiments

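Detection accuracy, or true positive rate (TPR): a/(a+b)
False positive rate (FPR): c/(c+d)
Precision: a/(a+c)
Total accuracy (or accuracy): (a+d)/(a+b+c+d)
F-measure: 2a/(2a+b+c)
Receiver operating characteristic (ROC) curves and areas under ROC curves (AUCs)

Here a, b, c, and d count true positives (anomalies detected), false negatives (anomalies missed), false positives (normal instances flagged as anomalous), and true negatives (normal instances accepted), respectively.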
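The measures above can be computed from the four confusion-matrix counts. This sketch assumes the reading implied by the formulas: a = true positives, b = false negatives, c = false positives, d = true negatives. It computes AUC with the rank-based (Mann-Whitney) formulation rather than by integrating a plotted ROC curve. The function names and the example counts are hypothetical and purely illustrative.

```python
def detection_metrics(a, b, c, d):
    """Five of the six measures from confusion counts a=TP, b=FN, c=FP, d=TN."""
    return {
        "TPR": a / (a + b),                     # detection accuracy
        "FPR": c / (c + d),
        "precision": a / (a + c),
        "accuracy": (a + d) / (a + b + c + d),  # total accuracy
        "F-measure": 2 * a / (2 * a + b + c),
    }


def auc(scores, labels):
    """Area under the ROC curve: probability that a random anomaly (label 1)
    scores higher than a random normal instance (label 0); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(detection_metrics(a=90, b=10, c=5, d=95))
print(auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # 0.75
```

The rank-based AUC equals the area under the empirical ROC curve obtained by sweeping the threshold over all observed scores.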

Experiments-Data Sets

Network Anomaly Data (NAD)
Duffing Equation Data (DED)
Mechanical Systems Data (MSD)

Conclusion

The K-Means+ID3 method outperforms the individual k-Means and ID3 methods on all six performance measures over the NAD-1998 data sets.

The K-Means+ID3 method achieves a very high detection accuracy (99.12%) and AUC (0.96) over the NAD-1999 data sets.

The K-Means+ID3 method shows better FPR and precision than k-Means and ID3 over the NAD-2000 data sets.

The FPR, precision, and F-measure of K-Means+ID3 are higher than those of k-Means and lower than those of ID3 over the NAD.

The K-Means+ID3 method has the highest precision and F-measure values over the MSD.

Personal Comments

Application: Anomaly detection systems

Advantage: It performs better than the individual methods on most of the reported measures.

Disadvantage: Parameter selection problem (e.g., choosing the number of clusters k).

NAD-1998

NAD-1999

NAD-2000
