Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Ferhat Ozgur Catak ([email protected])
TUBITAK - BILGEM - Cyber Security Institute, Kocaeli, Turkey

Nov. 10, 2015
22nd International Conference on Neural Information Processing (ICONIP 2015)
Ferhat Ozgur Catak ([email protected]) | TUBITAK - BILGEM - Cyber Security Institute, Kocaeli, Turkey | Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM
Table of Contents

1 Introduction: Noise; Contributions
2 Preliminaries: One-Class SVM; AdaBoost; Data Partitioning Strategies
3 Proposed Approach: Basic Idea; Analysis of the proposed algorithm
4 Experiments: Experimental Setup; Effect of Noise Removing on Input Matrix; Simulation Results
5 Conclusions
Noise in Machine Learning I

Definition
Noise: random error or variance in a measured variable.

- Noise is "irrelevant or meaningless data".
- Real-world data are affected by several components; the presence of noise is a key factor.
- Real-world data are noisy, mainly because:
  - With Big Data technologies, we store every type of data: messages, images, tweets, etc.
  - 80% of the information generated today is unstructured [1].
- Noisy data unnecessarily increases the amount of storage space required and can also adversely affect the results of any machine learning algorithm.
Noise in Machine Learning II

The performance of a classifier heavily depends on:

- the quality of the training data
- the robustness of the classifier itself against noise

Figure: The performance of a classifier: data quality and corruptions feed into the learning algorithm, which determines the maximum classification accuracy achievable.
[1] Forbes/Tech, http://www.forbes.com/sites/ciocentral/2013/02/04/big-data-big-hype/
Contributions

Our aim:
- Partition the entire data set into sub-data sets to reduce the training time.
- Detect and remove noise that results from an imperfect data collection process.
  - In the literature, One-Class SVM is a solution for creating a representational model of a data set.
- Improve the classification accuracy using an ensemble method (AdaBoost).
  - For many classification algorithms, AdaBoost results in dramatic improvements in performance.
One-Class SVM I

- The SVM method finds classification models using the maximum-margin separating hyperplane.
- Scholkopf et al. proposed a training methodology that handles only one class, called "one-class" classification.

Basically:
- The method finds soft boundaries of the data set.
- The model then determines whether a new instance belongs to this data set or not.
Figure: One-class SVM. The decision boundary separates the training instances (label +1) from the origin; the origin, (0, 0), is the single instance with label −1.
One-Class SVM II

- Input instances: S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \{1, \dots, K\}\}_{i=1}^{m}
- Kernel function: \phi : X \to H

\min_{w, \xi, \rho} \left( \frac{1}{2}\|w\|^2 + \frac{1}{mC}\sum_{i} \xi_i - \rho \right)

subject to

(w \cdot \phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0,\ \forall i = 1, \dots, m    (1)
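The noise-removal step described by Eq. (1) can be sketched with an off-the-shelf one-class SVM. This is a minimal illustration, assuming scikit-learn; the synthetic data and the `nu` value are illustrative stand-ins, with `nu` playing the role of the regularization constant that bounds the fraction of instances flagged as outliers.

```python
# Sketch: split a data chunk into a "clean" part (inside the learned
# boundary) and a "noisy" part (outside it) with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 2))        # bulk of the data
X[:20] += rng.normal(8, 1, size=(20, 2))   # shifted instances acting as noise

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X)
labels = ocsvm.predict(X)                  # +1 = inside boundary, -1 = outside

X_clean = X[labels == 1]                   # kept for training
X_noisy = X[labels == -1]                  # removed as noise
print(len(X_clean), len(X_noisy))
```

With `nu=0.1`, roughly a tenth of the chunk ends up in the removed part, which mirrors the noise-removal percentages explored later in the experiments.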
AdaBoost

- AdaBoost (adaptive boosting) is an ensemble learning algorithm that can be used for classification or regression.
- AdaBoost improves the accuracy performance of learning algorithms.
- Given a space of feature vectors X and two possible class labels, y \in \{-1, +1\}, AdaBoost's goal is to learn a strong classifier H(x) as a weighted ensemble of weak classifiers h_t(x) predicting the label of any instance x \in X.
- AdaBoost constructs the strong classifier as a linear combination

  f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)

  of simple weak classifiers h_t(x), where
  - h_t(x): weak classifier/hypothesis
  - H(x) = sign(f(x)): strong or final classifier/hypothesis
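The weighted combination above can be made concrete with a toy implementation of discrete AdaBoost over one-dimensional threshold stumps. This is an illustrative sketch, not the paper's implementation; the stump representation and the data are made up for the example.

```python
# Toy discrete AdaBoost: H(x) = sign(sum_t alpha_t * h_t(x)).
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] >= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=10):
    w = np.full(len(y), 1.0 / len(y))       # instance weights
    model = []
    for _ in range(T):
        j, thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)               # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight
        pred = pol * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified instances
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    f = sum(a * p * np.where(X[:, j] >= t, 1, -1) for a, j, t, p in model)
    return np.sign(f)

X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(X, y, T=5)
print(predict(model, X))   # recovers y on this separable toy set
```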
Data Partitioning Strategies

Data Partitioning
- In exploratory data analysis, partitioning large-scale data sets is a general need.
- Some data sets are computationally infeasible to process as a whole.

Reasons for data partitioning:
- Diversity: obtaining uncorrelated base classifiers [2][3]
- Complexity: reducing the input complexity of large-scale data sets [4]
- Specificity: building classifier models for a specific part of the input instances [5]
[2] Kuncheva, Ludmila I. "Using diversity measures for generating error-correcting output codes in classifier ensembles." Pattern Recognition Letters 26.1 (2005): 83-90.
[3] Dara, Rozita, Masoud Makrehchi, and Mohamed S. Kamel. "Filter-based data partitioning for training multiple classifier systems." IEEE Transactions on Knowledge and Data Engineering 22.4 (2010): 508-522.
[4] Chawla, Nitesh V., et al. "Distributed learning with bagging-like performance." Pattern Recognition Letters 24.1 (2003): 455-471.
[5] Woods, Kevin, Kevin Bowyer, and W. Philip Kegelmeyer Jr. "Combination of multiple classifiers using local accuracy estimates." Proceedings CVPR '96, 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1996.
Basic Idea

Main objective:
- Partition the input data set into sub-data sets (X_m, Y_m).
- Pre-processing: noise removal is applied to each individual sub-data set.
- Create local classifier ensembles for each sub-data chunk.
- Classifier combination: a weighted voting method combines the ensemble classifiers.
Figure: Overview of the proposed approach. The input data set S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y \in \{1, \dots, K\}\}_{i=1}^{m} is partitioned into sub-data sets X_1, X_2, \dots, X_m; noise removal maps each X_i to X_{i,clean}; an ensemble model H(X_{i,clean}) \to Y is built for each clean chunk; finally, the ensemble models are combined.
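The pipeline in the figure can be sketched end to end: partition, clean each chunk with a one-class SVM, train an AdaBoost ensemble per clean chunk, and weight each ensemble by its accuracy for the final vote. This is a minimal sketch assuming scikit-learn; the data set, chunk count, and hyperparameters are illustrative, and the per-chunk training accuracy stands in for the ensemble weight beta_m.

```python
# Sketch of the proposed pipeline: partition -> clean -> ensemble -> combine.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_chunks = 3
models = []   # list of (beta_m, H_m) pairs
for Xc, yc in zip(np.array_split(X_tr, n_chunks), np.array_split(y_tr, n_chunks)):
    # Noise removal: keep only instances inside the one-class boundary.
    mask = OneClassSVM(nu=0.1, gamma="scale").fit(Xc).predict(Xc) == 1
    H_m = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xc[mask], yc[mask])
    beta_m = H_m.score(Xc[mask], yc[mask])   # ensemble weight (its accuracy)
    models.append((beta_m, H_m))

# Weighted vote over the per-chunk ensembles.
votes = sum(b * H.predict_proba(X_te) for b, H in models)
y_hat = np.argmax(votes, axis=1)
print("combined accuracy:", np.mean(y_hat == y_te))
```

In practice beta_m would be estimated on held-out data rather than on the training chunk itself, but the combination logic is the same.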
Analysis of the proposed algorithm

- Ensemble methods of neural networks achieve better accuracy performance on unseen examples [6].

Proposed method:
- For each sub-data set, there is a set of classifier functions (an ensemble classifier), H^{(m)}:

  H^{(m)}(x) = \arg\max_k \sum_{t=1}^{M} \alpha_t h_t(x)    (2)

- The ensemble classifiers are combined into one single classification model, H(x):

  H(x) = \arg\max_k \sum_{i=1}^{m} \beta_i H^{(i)}(x)    (3)

  where \beta_i is the accuracy of ensemble classifier i.

[6] Krogh, Anders, and Jesper Vedelsby. "Neural network ensembles, cross validation, and
active learning." Advances in Neural Information Processing Systems 7 (1995): 231-238.
Experimental Setup

- Our approach is applied to five different data sets to verify its effectiveness and efficiency.

Table: Description of the testing data sets used in the experiments.

Data set        #Train  #Test    #Classes  #Attributes
cod-rna         59,535  157,413   2          8
ijcnn1          49,990   91,701   2         22
letter          15,000    5,000  26         16
shuttle         43,500   14,500   7          9
SensIT Vehicle  78,823   19,705   3        100
Effect of Noise Removing on Input Matrix I

Gini impurity
Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.

- Gini impurity is used to measure the quality of the noise-removal procedure.
- Gini-based approaches deal appropriately with the class diversity of a data set.
- The Gini impurity of the class distribution of a variable y = \{y_1, \dots, y_m\} is

  g = 1 - \sum_{k} p_k^2    (4)

where p_k is the probability of class k in data set D.
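Eq. (4) is a one-liner in code. A minimal sketch (the label vectors are illustrative):

```python
# Gini impurity of a label vector: g = 1 - sum_k p_k^2.
import numpy as np

def gini_impurity(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()          # class probabilities p_k
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([1, 1, 1, 1]))    # 0.0 (a pure set)
print(gini_impurity([1, 2, 1, 2]))    # 0.5 (two perfectly balanced classes)
```

A low value on the cleaned part and a high value on the removed part is exactly the pattern the figure below shows.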
Effect of Noise Removing on Input Matrix II
Figure: The impact of one-class SVM on performance in terms of Gini impurity, plotted against the noise-removal percentage (0 to 0.6) for the cod-rna, SensIT Vehicle, ijcnn1, letter, and shuttle data sets. (a) Cleaned part. (b) Removed part.
- The Gini impurity value decreases as noisy instances are cleaned from the data, and increases on the separated noisy data.
Effect of Noise Removing on Input Matrix III

- The aim is to minimize the Gini impurity on the clean data set X_clean while maximizing it on the noisy data set X_noisy:

  Split Percentage = \arg\min_p \frac{Gini(X_{clean}, p)}{Gini(X_{noisy}, p)}    (5)

Table: The best noise-removal percentage for each data set.

Data set        Percentage  Gini(X_clean)  Gini(X_noisy)  Gini(X_clean)/Gini(X_noisy)
cod-rna         0.55        0.331344656    0.56831775     0.583027112
SensIT Vehicle  0.60        0.461155751    0.568847982    0.810683638
ijcnn1          0.60        0.167652825    0.179397223    0.934534117
letter          0.60        0.9039108      0.926430562    0.975691905
shuttle         0.30        0.214142833    0.506938229    0.422423918
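Eq. (5) amounts to a sweep over candidate removal percentages. The sketch below, assuming scikit-learn, ranks instances by the one-class SVM decision score and removes the bottom fraction p; the data set and parameter values are illustrative stand-ins, not the paper's setup.

```python
# Sweep removal percentages p and pick the one minimizing
# Gini(X_clean, p) / Gini(X_noisy, p), per Eq. (5).
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_classification

def gini(y):
    _, c = np.unique(y, return_counts=True)
    p = c / c.sum()
    return 1.0 - np.sum(p ** 2)

X, y = make_classification(n_samples=500, n_features=8, flip_y=0.2, random_state=0)
score = OneClassSVM(nu=0.5, gamma="scale").fit(X).decision_function(X)
order = np.argsort(score)              # most outlier-like instances first

best_p, best_ratio = None, np.inf
for p in np.arange(0.1, 0.65, 0.05):
    k = int(p * len(y))
    y_noisy, y_clean = y[order[:k]], y[order[k:]]
    g_noisy = gini(y_noisy)
    if g_noisy == 0:
        continue                       # avoid division by zero on degenerate splits
    ratio = gini(y_clean) / g_noisy
    if ratio < best_ratio:
        best_p, best_ratio = p, ratio
print("best removal percentage:", best_p)
```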
Simulation Results

Table: Classification performance on the example data sets with and without One-Class SVM noise removal, for the proposed learning algorithm.

                Extra Trees         K-NN                SVM
Data set        All      Clean      All      Clean      All      Clean
cod-rna         0.75929  0.78652    0.91955  0.93513    0.88553  0.89806
ijcnn1          0.69758  0.72175    0.75533  0.77833    0.82989  0.84326
letter          0.91255  0.90853    0.90594  0.90318    0.92121  0.9199
shuttle         0.47801  0.44792    0.20262  0.26974    0.62301  0.63939
SensIT Vehicle  0.9602   0.90558    0.87904  0.88266    0.99232  0.99174
Conclusions

Classification performance
- Classification performance is improved by
  - removing the noisy instances using one-class SVM, and
  - finding the best noise-removal ratio with the Gini impurity value.

Experiments
- Experimental results show that
  - the input complexity of the classification algorithms is reduced remarkably, and
  - the accuracy increases when the noise-removal process is applied.
Thank You

Ferhat Ozgur Catak