Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Ferhat Ozgur Catak ([email protected])
TUBITAK - BILGEM - Cyber Security Institute, Kocaeli, Turkey

Nov. 10, 2015
22nd International Conference on Neural Information Processing (ICONIP 2015)
Ferhat Ozgur Catak ([email protected]) | TUBITAK - BILGEM - Cyber Security Institute, Kocaeli, Turkey | Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM
Table of Contents

1 Introduction: Noise; Contributions
2 Preliminaries: One-Class SVM; AdaBoost; Data Partitioning Strategies
3 Proposed Approach: Basic Idea; Analysis of the proposed algorithm
4 Experiments: Experimental Setup; Effect of Noise Removing on Input Matrix; Simulation Results
5 Conclusions
Noise in Machine Learning I

Definition
Noise: random error or variance in a measured variable.

- Noise is "irrelevant or meaningless data".
- Real-world data are affected by several components; the presence of noise is a key factor.
- Real-world data are noisy, mainly because:
  - With Big Data technologies, we store every type of data: messages, images, tweets, etc.
  - 80% of the information generated today is unstructured [1].
- Noisy data unnecessarily increases the amount of storage space required and can also adversely affect the results of any machine learning algorithm.
Noise in Machine Learning II

The performance of a classifier heavily depends on:

- the quality of the training data
- the robustness of the classifier itself against noise

Figure: The performance of a classifier: data quality and corruptions feed into the learning algorithm, which determines the maximum classification accuracy achievable.
[1] Forbes/Tech, http://www.forbes.com/sites/ciocentral/2013/02/04/big-data-big-hype/
Contributions

Our aim:
- Partition the entire data set into sub-data sets to reduce the training time.
- Detect and remove noise that results from an imperfect data collection process.
  - In the literature, One-Class SVM is a solution for creating a representational model of a data set.
- Improve the classification accuracy using an ensemble method (AdaBoost).
  - For many classification algorithms, AdaBoost results in dramatic improvements in performance.
One-Class SVM I

- The SVM method finds classification models using the maximum-margin separating hyperplane.
- Scholkopf et al. proposed a training methodology that handles only one class, called "one-class" classification.

Basically:
- The method finds soft boundaries of the data set.
- The model then determines whether a new instance belongs to this data set or not.
Figure: One-class SVM. The decision boundary separates the training instances (label +1) from the origin; the origin, (0, 0), is the single instance with label −1.
One-Class SVM II

- Input instances: S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \{1, \dots, K\}\}_{i=1}^{m}
- Kernel function: \phi : X \to H

\min_{w, \xi, \rho} \left( \frac{1}{2}\|w\|^2 + \frac{1}{mC}\sum_{i} \xi_i - \rho \right)

subject to

(w \cdot \phi(x_i)) \ge \rho - \xi_i, \quad \xi_i \ge 0,\ \forall i = 1, \dots, m    (1)
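The noise-removal step described by Eq. (1) can be sketched with an off-the-shelf one-class SVM. This is a minimal illustration, assuming scikit-learn; the synthetic data and the `nu` value are illustrative stand-ins, with `nu` playing the role of the regularization constant that bounds the fraction of instances flagged as outliers.

```python
# Sketch: split a data chunk into a "clean" part (inside the learned
# boundary) and a "noisy" part (outside it) with a one-class SVM.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(200, 2))        # bulk of the data
X[:20] += rng.normal(8, 1, size=(20, 2))   # shifted instances acting as noise

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X)
labels = ocsvm.predict(X)                  # +1 = inside boundary, -1 = outside

X_clean = X[labels == 1]                   # kept for training
X_noisy = X[labels == -1]                  # removed as noise
print(len(X_clean), len(X_noisy))
```

With `nu=0.1`, roughly a tenth of the chunk ends up in the removed part, which mirrors the noise-removal percentages explored later in the experiments.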
AdaBoost

- AdaBoost (adaptive boosting) is an ensemble learning algorithm that can be used for classification or regression.
- AdaBoost improves the accuracy performance of learning algorithms.
- Given a space of feature vectors X and two possible class labels, y \in \{-1, +1\}, AdaBoost's goal is to learn a strong classifier H(x) as a weighted ensemble of weak classifiers h_t(x) predicting the label of any instance x \in X.
- AdaBoost constructs the strong classifier as a linear combination

  f(x) = \sum_{t=1}^{T} \alpha_t h_t(x)

  of simple weak classifiers h_t(x), where
  - h_t(x): weak classifier/hypothesis
  - H(x) = sign(f(x)): strong or final classifier/hypothesis
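The weighted combination above can be made concrete with a toy implementation of discrete AdaBoost over one-dimensional threshold stumps. This is an illustrative sketch, not the paper's implementation; the stump representation and the data are made up for the example.

```python
# Toy discrete AdaBoost: H(x) = sign(sum_t alpha_t * h_t(x)).
import numpy as np

def fit_stump(X, y, w):
    """Pick the (feature, threshold, polarity) stump with lowest weighted error."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = pol * np.where(X[:, j] >= thr, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=10):
    w = np.full(len(y), 1.0 / len(y))       # instance weights
    model = []
    for _ in range(T):
        j, thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)               # guard against log(1/0)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight
        pred = pol * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)      # up-weight misclassified instances
        w /= w.sum()
        model.append((alpha, j, thr, pol))
    return model

def predict(model, X):
    f = sum(a * p * np.where(X[:, j] >= t, 1, -1) for a, j, t, p in model)
    return np.sign(f)

X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(X, y, T=5)
print(predict(model, X))   # recovers y on this separable toy set
```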
Data Partitioning Strategies

Data Partitioning
- In exploratory data analysis, partitioning large-scale data sets is a general need.
- Some data sets are computationally infeasible to process as a whole.

Reasons for data partitioning:
- Diversity: obtaining uncorrelated base classifiers [2][3]
- Complexity: reducing the input complexity of large-scale data sets [4]
- Specificity: building classifier models for a specific part of the input instances [5]
[2] Kuncheva, Ludmila I. "Using diversity measures for generating error-correcting output codes in classifier ensembles." Pattern Recognition Letters 26.1 (2005): 83-90.
[3] Dara, Rozita, Masoud Makrehchi, and Mohamed S. Kamel. "Filter-based data partitioning for training multiple classifier systems." IEEE Transactions on Knowledge and Data Engineering 22.4 (2010): 508-522.
[4] Chawla, Nitesh V., et al. "Distributed learning with bagging-like performance." Pattern Recognition Letters 24.1 (2003): 455-471.
[5] Woods, Kevin, Kevin Bowyer, and W. Philip Kegelmeyer Jr. "Combination of multiple classifiers using local accuracy estimates." Proceedings CVPR '96, 1996 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1996.
Basic Idea

Main objective:
- Partition the input data set into sub-data sets (X_m, Y_m).
- Pre-processing: noise removal is applied to each individual sub-data set.
- Create local classifier ensembles for each sub-data chunk.
- Classifier combination: a weighted voting method combines the ensemble classifiers.
Figure: Overview of the proposed approach. The input data set S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y \in \{1, \dots, K\}\}_{i=1}^{m} is partitioned into sub-data sets X_1, X_2, \dots, X_m; noise removal maps each X_i to X_{i,clean}; an ensemble model H(X_{i,clean}) \to Y is built for each clean chunk; finally, the ensemble models are combined.
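The pipeline in the figure can be sketched end to end: partition, clean each chunk with a one-class SVM, train an AdaBoost ensemble per clean chunk, and weight each ensemble by its accuracy for the final vote. This is a minimal sketch assuming scikit-learn; the data set, chunk count, and hyperparameters are illustrative, and the per-chunk training accuracy stands in for the ensemble weight beta_m.

```python
# Sketch of the proposed pipeline: partition -> clean -> ensemble -> combine.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_chunks = 3
models = []   # list of (beta_m, H_m) pairs
for Xc, yc in zip(np.array_split(X_tr, n_chunks), np.array_split(y_tr, n_chunks)):
    # Noise removal: keep only instances inside the one-class boundary.
    mask = OneClassSVM(nu=0.1, gamma="scale").fit(Xc).predict(Xc) == 1
    H_m = AdaBoostClassifier(n_estimators=50, random_state=0).fit(Xc[mask], yc[mask])
    beta_m = H_m.score(Xc[mask], yc[mask])   # ensemble weight (its accuracy)
    models.append((beta_m, H_m))

# Weighted vote over the per-chunk ensembles.
votes = sum(b * H.predict_proba(X_te) for b, H in models)
y_hat = np.argmax(votes, axis=1)
print("combined accuracy:", np.mean(y_hat == y_te))
```

In practice beta_m would be estimated on held-out data rather than on the training chunk itself, but the combination logic is the same.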
Analysis of the proposed algorithm

- Ensemble methods of neural networks achieve better accuracy performance on unseen examples [6].

Proposed method:
- For each sub-data set, there is a set of classifier functions (an ensemble classifier), H^{(m)}:

  H^{(m)}(x) = \arg\max_k \sum_{t=1}^{M} \alpha_t h_t(x)    (2)

- The ensemble classifiers are combined into one single classification model, H(x):

  H(x) = \arg\max_k \sum_{i=1}^{m} \beta_i H^{(i)}(x)    (3)

  where \beta_i is the accuracy of ensemble classifier i.

[6] Krogh, Anders, and Jesper Vedelsby. "Neural network ensembles, cross validation, and
active learning." Advances in Neural Information Processing Systems 7 (1995): 231-238.
Experimental Setup

- Our approach is applied to five different data sets to verify its effectiveness and efficiency.

Table: Description of the testing data sets used in the experiments.

Data set        #Train  #Test    #Classes  #Attributes
cod-rna         59,535  157,413   2          8
ijcnn1          49,990   91,701   2         22
letter          15,000    5,000  26         16
shuttle         43,500   14,500   7          9
SensIT Vehicle  78,823   19,705   3        100
Effect of Noise Removing on Input Matrix I

Gini impurity
Gini impurity measures how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset.

- Gini impurity is used to measure the quality of the noise-removal procedure.
- Gini-based approaches deal appropriately with the class diversity of a data set.
- The Gini impurity of the class distribution of a variable y = \{y_1, \dots, y_m\} is

  g = 1 - \sum_{k} p_k^2    (4)

where p_k is the probability of class k in data set D.
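Eq. (4) is a one-liner in code. A minimal sketch (the label vectors are illustrative):

```python
# Gini impurity of a label vector: g = 1 - sum_k p_k^2.
import numpy as np

def gini_impurity(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()          # class probabilities p_k
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([1, 1, 1, 1]))    # 0.0 (a pure set)
print(gini_impurity([1, 2, 1, 2]))    # 0.5 (two perfectly balanced classes)
```

A low value on the cleaned part and a high value on the removed part is exactly the pattern the figure below shows.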
Effect of Noise Removing on Input Matrix II
Figure: The impact of one-class SVM on performance in terms of Gini impurity, plotted against the noise-removal percentage (0 to 0.6) for the cod-rna, SensIT Vehicle, ijcnn1, letter, and shuttle data sets. (a) Cleaned part. (b) Removed part.
- The Gini impurity value decreases as noisy instances are cleaned from the data, and increases on the separated noisy data.
Effect of Noise Removing on Input Matrix III

- The aim is to minimize the Gini impurity on the clean data set X_clean while maximizing it on the noisy data set X_noisy:

  Split Percentage = \arg\min_p \frac{Gini(X_{clean}, p)}{Gini(X_{noisy}, p)}    (5)

Table: The best noise-removal percentage for each data set.

Data set        Percentage  Gini(X_clean)  Gini(X_noisy)  Gini(X_clean)/Gini(X_noisy)
cod-rna         0.55        0.331344656    0.56831775     0.583027112
SensIT Vehicle  0.60        0.461155751    0.568847982    0.810683638
ijcnn1          0.60        0.167652825    0.179397223    0.934534117
letter          0.60        0.9039108      0.926430562    0.975691905
shuttle         0.30        0.214142833    0.506938229    0.422423918
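Eq. (5) amounts to a sweep over candidate removal percentages. The sketch below, assuming scikit-learn, ranks instances by the one-class SVM decision score and removes the bottom fraction p; the data set and parameter values are illustrative stand-ins, not the paper's setup.

```python
# Sweep removal percentages p and pick the one minimizing
# Gini(X_clean, p) / Gini(X_noisy, p), per Eq. (5).
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.datasets import make_classification

def gini(y):
    _, c = np.unique(y, return_counts=True)
    p = c / c.sum()
    return 1.0 - np.sum(p ** 2)

X, y = make_classification(n_samples=500, n_features=8, flip_y=0.2, random_state=0)
score = OneClassSVM(nu=0.5, gamma="scale").fit(X).decision_function(X)
order = np.argsort(score)              # most outlier-like instances first

best_p, best_ratio = None, np.inf
for p in np.arange(0.1, 0.65, 0.05):
    k = int(p * len(y))
    y_noisy, y_clean = y[order[:k]], y[order[k:]]
    g_noisy = gini(y_noisy)
    if g_noisy == 0:
        continue                       # avoid division by zero on degenerate splits
    ratio = gini(y_clean) / g_noisy
    if ratio < best_ratio:
        best_p, best_ratio = p, ratio
print("best removal percentage:", best_p)
```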
Simulation Results

Table: Classification performance on the example data sets with and without One-Class SVM noise removal, for the proposed learning algorithm.

                Extra Trees         K-NN                SVM
Data set        All      Clean      All      Clean      All      Clean
cod-rna         0.75929  0.78652    0.91955  0.93513    0.88553  0.89806
ijcnn1          0.69758  0.72175    0.75533  0.77833    0.82989  0.84326
letter          0.91255  0.90853    0.90594  0.90318    0.92121  0.9199
shuttle         0.47801  0.44792    0.20262  0.26974    0.62301  0.63939
SensIT Vehicle  0.9602   0.90558    0.87904  0.88266    0.99232  0.99174
Conclusions

Classification performance
- Classification performance is improved by
  - removing the noisy instances using one-class SVM, and
  - finding the best noise-removal ratio with the Gini impurity value.

Experiments
- Experimental results show that
  - the input complexity of the classification algorithms is reduced remarkably, and
  - the accuracy increases when the noise-removal process is applied.
Thank You

Ferhat Ozgur Catak