
    Cluster Oriented Ensemble Classifier: Impact of Multi-cluster

    Characterisation on Ensemble Classifier Learning

    B. Verma and A. Rahman

    Centre of Intelligent and Networked Systems

    School of Computing Sciences, CQUniversity

Rockhampton, Queensland 4702, Australia

    Email: [email protected], [email protected]

    Abstract

    This paper presents a novel cluster oriented ensemble classifier. The proposed ensemble

    classifier is based on original concepts such as learning of cluster boundaries by the base

    classifiers and mapping of cluster confidences to class decision using a fusion classifier. The

    categorised data set is characterised into multiple clusters and fed to a number of distinctive

    base classifiers. The base classifiers learn cluster boundaries and produce cluster confidence

vectors. A second-level fusion classifier combines the cluster confidences and maps them to class

    decisions. The proposed ensemble classifier modifies the learning domain for the base

    classifiers and facilitates efficient learning. The proposed approach is evaluated on

benchmark data sets from the UCI machine learning repository to identify the impact of multi-cluster boundaries on classifier learning and classification accuracy. The experimental results and a two-tailed sign test demonstrate the superiority of the proposed cluster oriented ensemble

    classifier over existing ensemble classifiers published in the literature.

    Keywords: Ensemble classifier, clustering, classification, fusion of classifiers


    1. Introduction

    An ensemble classifier is conventionally constructed from a set of base classifiers that

    separately learn the class boundaries over the patterns in a training set. The decision of an

    ensemble classifier on a test pattern is produced by fusing the individual decisions of the base

    classifiers. Ensemble classifiers are also known as multiple classifier systems, committee of

    classifiers and mixture of experts [1]. An ensemble classifier produces more accurate

    classification than its individual counterparts provided the base classifier errors are

    uncorrelated [3].

    Contemporary ensemble generation techniques train the base classifiers on different subsets

    of the training data in order to make their errors uncorrelated. The different algorithms

including bagging [4] and boosting [7] vary in terms of generating the training subsets for

    base classifier training. The decisions of the base classifiers are fused into a single decision

    by using either majority voting on discrete decisions [1] or algebraic combiners [15] on

    continuous valued confidence measures. Although the contemporary ensemble classifiers

(detailed in Section 2) are capable of making the base classifier errors uncorrelated, they fail

    to establish any mechanism to improve the learning domain of the individual base classifiers.

    To clarify this concern let us consider a real world data set with overlapping patterns from

    different classes. The learning of class boundaries between overlapping class patterns in such

    cases is a difficult problem. Excessive training of the base classifiers will lead to accurate

learning of the decision boundary but result in overfitting, thus misclassifying instances of test data. On the other hand, learning generalized boundaries will avoid overfitting but at the cost of always misclassifying some overlapping patterns. This problem of learning the class

    boundaries of overlapping patterns remains inherent in all the base classifiers and is


    propagated to the decision fusion stage as well even though the base classifier errors are

    uncorrelated.

    We opt to bring in clustering at this point. Clustering is the process of partitioning a data set

into multiple groups where each group contains data points that are very close in Euclidean space. The clusters have well-defined and easy-to-learn boundaries. Let's assume that the

    patterns are labelled with their cluster number. Now if the base classifiers are trained on the

modified data set they will learn the cluster boundaries. As the clusters have well-defined, easy-to-learn boundaries, the base classifiers can learn them with high accuracy. Clusters can

    contain overlapping patterns from multiple classes. A fusion classifier can be trained to

    predict the class of a pattern from the predicted cluster. The proposed cluster oriented

    ensemble classifier is based on the above philosophy.

    With the aim to achieve better learning and improved accuracy of the ensemble classifier, in

    this paper we propose an ensemble classifier approach that clusters classified data into

    multiple clusters, learns the decision boundaries between the clusters using a set of base

classifiers, and combines the cluster decisions produced by the base classifiers into a class decision by a fusion classifier. Learning cluster boundaries leads to superior performance of

    the base classifiers. The fusion classifier maps the clustering pattern produced by the base

classifiers into a class decision. Altogether the ensemble of base and fusion classifiers aims at better learning, leading to higher classification accuracy, as evidenced by the experimental

    results.

While achieving the above-mentioned aim, the research presented in this paper seeks to answer four major research questions. The first research question is to

    investigate the performance of different clustering approaches namely heterogeneous

    clustering (i.e. clustering all the patterns from different classes) and homogeneous clustering


    (i.e. clustering patterns within a class). The second research question is to investigate whether

    the ensemble classifier outperforms the base classifiers significantly. The third research

question is to find out the impact of the fusion classifier. The final research question is to find the

    standing of the proposed ensemble classifier with respect to other ensemble classifiers on

    benchmark data sets.

    This paper is organized as follows. Section 2 presents the literature review. The proposed

    ensemble classifier is discussed in Section 3 and the methodology is presented in Section 4.

    Section 5 describes the experimental setup used for evaluating the proposed approach.

    Section 6 presents the results and comparative analysis. Finally, Section 7 concludes the

    paper.

    2. Literature Review

    The major concentration of ensemble classifier research [1][2] is on (i) generation of base

    classifiers for achieving diversity among them, and (ii) methods for fusing the decision of the

    base classifiers. Two classifiers are diverse if they make different errors on different instances.

    The ultimate objective of diversity is to make the base classifiers as unique as possible with

    respect to misclassified instances. We present a review of the contemporary ensemble

    classifiers related to the proposed approach in this section.

    Bagging [4][6] is a sampling based ensemble classifier generation approach that was

    introduced by Breiman. Bagging generates the multiple base classifiers by training them on

data subsets randomly drawn (with replacement) from the entire training set. The decisions of

    the base classifiers are combined into the final decision by majority voting. The sampling

    procedure of bagging creates the various training subsets by bootstrap sampling which results

    in the diversity among the base classifiers. Bagging is suitable for small data sets. For large

    data sets however the sampling scheme based on the bootstrap with replicates of the training


    set is infeasible. Moreover, the randomness introduced by the sampling process in bagging

    cannot guarantee the performance of the overall ensemble classifier. A number of variations

    to bagging are observed in the literature to improve its performance and the list includes

    random forests [5], ordered aggregation [11], adaptive generation and aggregation approach

    [14], and fuzzy bagging [13].
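As a concrete illustration of the sampling scheme described above, the sketch below implements a generic bagging loop with scikit-learn decision trees as base learners and majority voting for fusion; it is only a minimal illustration of the bootstrap idea, not the exact configuration used in [4]-[6], and the estimator choice, parameters and function names are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit_predict(X_train, y_train, X_test, n_estimators=15, seed=0):
    """Train base classifiers on bootstrap samples and fuse them by majority vote.
    Assumes NumPy arrays and integer class labels."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)          # bootstrap sample, drawn with replacement
        clf = DecisionTreeClassifier(random_state=seed).fit(X_train[idx], y_train[idx])
        votes.append(clf.predict(X_test))
    votes = np.array(votes)                        # shape: (n_estimators, n_test)
    # majority voting over the discrete base classifier decisions
    return np.array([np.bincount(col).argmax() for col in votes.T])
```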

Schapire proposed a method called boosting [7][8] that creates data subsets for base classifier

    training by re-sampling the training data, however, by providing the most informative

    training data for each consecutive classifier. In boosting each of the training instances is

    assigned a weight that determines how well the instance was classified in the previous

    iteration. The subset of the training data that is badly classified (i.e. instances with higher

weights) is included in the training set for the next iteration. This way boosting pays more

    attention to instances that are hard to classify. Although boosting identifies difficult to

    classify instances it does not provide any mechanism to improve the learning of base

    classifiers on these instances. The problem of base classifier learning that is raised by

    overlapping patterns still remains (as mentioned in the previous section), and leads to poor

    base classifier performance. A number of variants of boosting can be observed in the

    literature including boosting recombined weak classifiers [12], weighted instance selection

    [10], Learn++ [20] and its variant Learn++.NC [21].
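To make the reweighting idea concrete, the sketch below shows an AdaBoost-style weight update; it is one common variant of the scheme sketched above, not necessarily the exact update of [7][8], and the function name is illustrative.

```python
import numpy as np

def adaboost_reweight(weights, y_true, y_pred):
    """One AdaBoost-style round: misclassified instances receive larger weights so
    that the next training subset concentrates on hard-to-classify patterns."""
    miss = (y_true != y_pred).astype(float)
    err = np.sum(weights * miss) / np.sum(weights)        # weighted error rate
    alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))   # classifier confidence
    new_weights = weights * np.exp(alpha * np.where(miss == 1.0, 1.0, -1.0))
    return new_weights / new_weights.sum(), alpha
```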

Random subspace [9] is an ensemble creation method that uses feature subsets to create the

    different data subsets to train the base classifiers. Maclin and Shavlik proposed a neural

    ensemble [22] where a number of new approaches are presented to initialise the network

    weights in order to achieve diversity and generalization. Pujol and Masip presented a binary

    discriminative learning technique [23] based on the approximation of the non-linear decision

boundary by a piece-wise linear smooth additive model. Chaudhuri et al. presented a hybrid

    ensemble model [24] that combines the strengths of parametric and nonparametric


classifiers. In recent times there have been some works relating to cluster ensembles that aim to obtain improved clustering of the data set by combining multiple partitionings of the data set [25]. Note that the focus of an ensemble classifier is to obtain improved classification accuracy, which is significantly different from cluster ensembles that aim to achieve improved clustering

    accuracy.

    The other key aspect of ensemble classifier is the fusion of base classifier outputs into class

    decisions. The mapping can be done on discrete class decisions or continuous class

    confidence values produced by the base classifiers. The commonly used fusion methods [1]

    for combining class labels are majority voting, weighted majority voting, behaviour

    knowledge space, and Borda count. The commonly used fusion methods for combining

    continuous outputs are algebraic combiners [15] including mean rule, weighted average,

    trimmed mean, min/max/median rule, product rule, and generalized mean. A number of other

    fusion rules include decision template [16], pair-wise fusion matrix [17], adaptive fusion

    method [18], and nonBayesian probabilistic fusion [19]. Note that all these approaches are

    designed to fuse the class decisions from the base classifiers into a single class decision.

    Summarizing, the contemporary ensemble classifier generation methods are able to produce

diversity among the base classifiers by making their errors uncorrelated. They however do not

    provide any mechanism to improve the learning process of the individual base classifiers on

    difficult to classify overlapping patterns. The proposed ensemble classifier aims to address

    this issue by creating multiple boundaries through data clustering, training the base classifiers

    on easy to learn cluster boundaries and handling the cluster to class mapping process by a

    fusion classifier. The overall philosophy of the proposed approach is presented in the

    following section.


    3. The Proposed Ensemble Classifier

    3.1 Motivation

    The decision boundaries in real world data sets are not simple. This is primarily because of

    overlapping patterns from different classes in the data set. As a result the learning of decision

    boundaries in such data sets leads to either overfitting or poor generalization. In both cases it

    causes classification errors. The situation is explained in Figure 1. The data set in Figure 1(a)

    contains overlapping patterns from two classes. Accurate learning from the training data by a

    generic classifier will result in class boundaries in Figure 1(b) leading to overfitting and thus

    misclassification of test data. An alternate solution to the problem can be achieved by

    reducing penalties for misclassification during training. In this case the generic classifier will

    learn simple decision boundaries (Figure 1(c)) but will cause misclassification of training as

    well as test data.

    This is the point where we would like to introduce multiple decision boundaries for each

    class through clustering. Clustering is the process of grouping similar patterns. Clustering the

    data set in Figure 1(a) with overlapping patterns will result in smaller groups of patterns as in

    Figure 1(d). Note that the cluster boundaries (Figure 1(e)) are simple and easy to learn. A

generic classifier, if trained, now learns simple cluster boundaries that neither cause overfitting nor extreme generalization. Cluster-to-class mapping can be done by a fusion classifier. The underlying theoretical model and the methodologies of the proposed ensemble classifier are based on the above fact and are presented below.


Figure 1: Impact of clustering on an example data set consisting of two classes (each panel shows Class 1 and Class 2 patterns together with a Class 2 test case). (a) The original data set with overlapping patterns, (b) Overfitting caused by accurate learning of the decision boundaries, (c) Generalized decision boundary with overlapping patterns of class two considered as part of class one, (d) Clustered data set, and (e) Decision boundaries learned on the clustered data set.

    3.2 Ensemble Classifier Model

Let the ensemble classifier be composed of a set of $N_{bc}$ base classifiers $b_1, b_2, \ldots, b_{N_{bc}}$ and a fusion classifier $f$. Given a pattern $\mathbf{x}$, the ensemble classifier $E$ can be defined to achieve the following mapping:

$$E(\mathbf{x}) = [t_1, t_2, \ldots, t_{N_c}] \qquad (1)$$

where $t_1, \ldots, t_{N_c}$ are class confidence values for the $N_c$ classes. The base and fusion classifiers combine to achieve the above mapping.

Assuming that the data set is partitioned into $K$ clusters, each pattern belongs to a cluster. The base classifier $b_i$ is set to map the input pattern to a set of cluster confidence measures $w_{i1}, \ldots, w_{iK}$ as

$$b_i(\mathbf{x}) = [w_{i1}, w_{i2}, \ldots, w_{iK}]. \qquad (2)$$


The training set of a base classifier is made of pairs $(\mathbf{x}, [w_1, \ldots, w_K])$ where $\mathbf{x}$ represents the input and $[w_1, \ldots, w_K]$ represents the target. Given that $\mathbf{x}$ belongs to cluster $k$, the target cluster confidence vector is set as

$$w_j = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{otherwise.} \end{cases} \qquad (3)$$

The base classifier parameters $\theta_{b_i}$ are tuned to optimization such that

$$\theta_{b_i} = \arg\min_{\theta} \sum_{(\mathbf{x}, [w_1, \ldots, w_K])} e\big(b_i(\mathbf{x}), [w_1, \ldots, w_K]\big) \qquad (4)$$

where $e$ is the error function. Let $b_i(\mathbf{x}) = [\hat{w}_1, \ldots, \hat{w}_K]$. The error function for the base classifier is defined as

$$e = \sum_{j=1}^{K} |w_j - \hat{w}_j|. \qquad (5)$$

Given the cluster confidence vectors produced by the base classifiers, the fusion classifier performs the following mapping

$$f\big([w_{11}, \ldots, w_{1K}], \ldots, [w_{N_{bc}1}, \ldots, w_{N_{bc}K}]\big) = [t_1, \ldots, t_{N_c}] \qquad (6)$$

where $w_{i1}, \ldots, w_{iK}$ are the cluster confidence measures produced by base classifier $b_i$ and $t_1, \ldots, t_{N_c}$ are class confidence values. The training set for the fusion classifier is composed of pairs $(\mathbf{w}, [t_1, \ldots, t_{N_c}])$ where $\mathbf{w}$ is the combined cluster confidence vector and $[t_1, \ldots, t_{N_c}]$ is the target class confidence vector. A cluster can contain patterns from multiple classes and in that case a unique mapping is not possible by the fusion classifier. Depending on the number of classes $N_c$, each class deserves a share of the cluster. There are thus a total of $N_c$ outputs/targets of the fusion classifier, each representing a class, and each target receives a weight during training according to the proportion of its patterns in the cluster.


Let the cluster confidence vectors produced by the base classifiers in (6) correspond to cluster $k$ that contains $n_{kj}$ patterns of class $j$ where $1 \le j \le N_c$. The target class confidence for the $j$th class is set as

$$t_j = \frac{n_{kj}}{\sum_{j=1}^{N_c} n_{kj}}. \qquad (7)$$

The parameters $\theta_f$ for the fusion classifier are optimized such that

$$\theta_f = \arg\min_{\theta} \sum_{(\mathbf{w}, [t_1, \ldots, t_{N_c}])} e\big(f(\mathbf{w}), [t_1, \ldots, t_{N_c}]\big) \qquad (8)$$

where $e$ is the error function. Assuming $f(\mathbf{w}) = [\hat{t}_1, \ldots, \hat{t}_{N_c}]$, the error function is defined as

$$e = \sum_{j=1}^{N_c} |t_j - \hat{t}_j|. \qquad (9)$$

Using (2) and (6), the ensemble classifier mapping in (1) can be enumerated as:

$$E(\mathbf{x}) = f\big(b_1(\mathbf{x}), \ldots, b_{N_{bc}}(\mathbf{x})\big) = f\big([w_{11}, \ldots, w_{1K}], \ldots, [w_{N_{bc}1}, \ldots, w_{N_{bc}K}]\big) = [t_1, \ldots, t_{N_c}]. \qquad (10)$$

    The proposed ensemble classifier is based on the above model and corresponding architecture

    is presented in Figure 2.

Figure 2: Architecture of the proposed ensemble classifier. The input $\mathbf{x}$ is fed to the base classifiers $1, 2, \ldots, N_{bc}$, each producing a cluster confidence vector $w_{i1}, \ldots, w_{iK}$; the fusion classifier combines these vectors into the class confidence vector $t_1, \ldots, t_{N_c}$.

The objective of the proposed Cluster Oriented Ensemble Classifier (COEC) is to improve


    the learning process as well as the overall prediction accuracy by partitioning the data set,

learning cluster boundaries by the base classifiers and mapping the base classifiers' outputs to a

    class confidence vector using a fusion classifier. The novelty of the proposed method lies in:

    (i) Partitioning classified data into multiple clusters for achieving better separation.

(ii) Use of base classifiers in an ensemble to learn cluster boundaries.

    (iii) Fusion of cluster confidence values produced by the base classifiers into class

    confidence values by a fusion classifier.

    3.3 Clustering in COEC

    The learning of the base and fusion classifiers in COEC depends on multiple class boundaries

    produced by clustering. The outcome of the clustering algorithm depends on the similarity

measure between the patterns and we have used the Euclidean distance, which computes the geometric distance between two patterns $\mathbf{x}_i = \langle x_{i1}, x_{i2}, \ldots, x_{in} \rangle$ and $\mathbf{x}_j = \langle x_{j1}, x_{j2}, \ldots, x_{jn} \rangle$ in $n$-dimensional hyperspace. We performed two types of clustering

    in COEC:

(i) Heterogeneous clustering to partition all the patterns in the training set independent of any

    knowledge of the class of the patterns.

    (ii) Homogeneous clustering for partitioning the patterns belonging to a single class only.

Patterns belonging to each class are partitioned separately. The characteristics and outcomes of the two types of clustering are significantly different and influence the accuracy of COEC, as evidenced by the experimental results.

Assuming a set of $K$ clusters $\{C_1, C_2, \ldots, C_K\}$ and the associated cluster centres $\{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_K\}$, the clustering algorithm aims to minimize an objective function


$$J = \sum_{k=1}^{K} \sum_{\mathbf{x} \in C_k} d(\mathbf{x}, \boldsymbol{\mu}_k) \qquad (11)$$

for the patterns in the corresponding training set. Considering an augmented training set defined as $T' = \{(\mathbf{x}_1, y'_1), (\mathbf{x}_2, y'_2), \ldots, (\mathbf{x}_P, y'_P)\}$ where $y'_p \in \{C_1, C_2, \ldots, C_K\}$, a generic classifier learns the decision boundaries between the clusters and produces a cluster confidence vector $w_1, \ldots, w_K$. The fusion classifier maps the cluster confidence vector to the class confidence vector $[t_1, \ldots, t_{N_c}]$.

The performance of the fusion classifier depends on the content of the cluster. If all the

    patterns in the cluster belong to the same class the mapping is unique. We refer to these

clusters as atomic clusters. Non-atomic clusters are composed of patterns from different

    classes. The target vector of the fusion classifier for these clusters is set according to the

    proportion of patterns from different classes during training as mentioned in (7).
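To make the target-setting rule of (7) concrete, the following minimal sketch computes, for each cluster, the class-proportion vector used as the fusion classifier target; a cluster is atomic when exactly one entry equals 1. It assumes NumPy, integer class labels, and illustrative function and variable names not taken from the paper.

```python
import numpy as np

def fusion_targets(cluster_ids, class_labels, n_classes):
    """Map each cluster id to its class-proportion vector (eq. (7))."""
    targets = {}
    for k in np.unique(cluster_ids):
        members = class_labels[cluster_ids == k]
        proportions = np.bincount(members, minlength=n_classes) / len(members)
        targets[k] = proportions          # e.g. [0.7, 0.3] for a non-atomic cluster
    return targets
```

For an atomic cluster the resulting vector is a one-hot class indicator, so the cluster-to-class mapping is unique, as noted above.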

    4. Learning and Prediction Methodology of COEC

    The overall learning and prediction methodology of COEC is presented in Figure 3 and

    Figure 4. The learning process is depicted in Figure 3 where the training data is first clustered

    and the base classifiers then learn the mapping from patterns to clusters. The cluster

    confidence values produced by the different base classifiers are then merged to form the

    inputs for the fusion classifier and the targets are set to the original class values for learning

    the cluster to class map. During prediction (Figure 4), the base classifiers produce cluster

    confidence vectors for a test pattern. These vectors are merged to form the input for the

    fusion classifier that produces the class confidence vector.

    The different steps of learning and prediction of the ensemble of classifiers are detailed in the

    following sections.


    Figure 3: Training process for COEC

    Figure 4: Test process for COEC


    4.1 Homogeneous/Heterogeneous Clustering

    The learning process starts by partitioning the training data into multiple clusters. Given the

training data set $[T] = [X]\,[Y]$, where $[X]$ holds the input patterns ($1 \le p \le P$ patterns, each with $1 \le i \le n$ attributes) and $[Y]$ holds the class labels, the purpose of the clustering algorithm is to partition the training data set into a number of clusters. The output of the clustering algorithm is the modified data set $[T'] = [X]\,[Y']$. Given the training data set, the clustering algorithm is presented in Figure 5. At the completion of clustering each row of $[X]$ is augmented with a cluster id, producing $[T'] = [X]\,[Y']$.

    The output of the clustering algorithm depends on the input argument type. We have used two

    types of clustering in COEC (i) Homogeneous clustering: Clustering is performed

    separately on the patterns belonging to the same class, and (ii) Heterogeneous clustering:

    Clustering is performed on the entire data set. We have reported our findings on both of these

    clustering approaches in Section 6.

    Figure 5: Homogeneous/Heterogeneous Clustering algorithm for partitioning classified data

    into multiple clusters.
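The two partitioning modes described above and in Figure 5 can be sketched with k-means from scikit-learn as follows; the function names and parameters are illustrative, not the authors' MATLAB implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def heterogeneous_clustering(X, n_clusters, seed=0):
    """Cluster the entire training set, ignoring the class labels."""
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)

def homogeneous_clustering(X, y, clusters_per_class, seed=0):
    """Cluster each class separately; ids are offset so that every
    (class, within-class cluster) pair receives a unique cluster id."""
    cluster_ids = np.empty(len(X), dtype=int)
    for offset, c in enumerate(np.unique(y)):
        mask = (y == c)
        km = KMeans(n_clusters=clusters_per_class, n_init=10, random_state=seed)
        cluster_ids[mask] = km.fit_predict(X[mask]) + offset * clusters_per_class
    return cluster_ids
```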


    4.2 Base Classifier Training

A set of base classifiers is trained with $[T'] = [X]\,[Y']$ as produced by the clustering algorithm. The input to each base classifier is set to $[X]$. The target for each base classifier is set to $[W]$ such that

$$w_{pk} = \begin{cases} 1 & \text{if pattern } \mathbf{x}_p \text{ belongs to cluster } k \\ 0 & \text{otherwise,} \end{cases} \qquad (12)$$

where $1 \le k \le K$. The aim of training the base classifiers with the target cluster matrix is that during prediction the base classifiers produce cluster confidence values for a pattern. The training parameters for each base classifier are optimized to fit the training data. The training algorithm for a generic classifier is presented in Figure 6. At the completion of training, a model is obtained for each base classifier $b_i$, where $1 \le i \le N_{bc}$, and $[X]$ is presented to each of the base classifiers, producing a set of cluster confidence matrices for the training patterns, where $1 \le i \le N_{bc}$ and $1 \le k \le K$.

    Figure 6: Base classifier training algorithm.
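A minimal sketch of this step, assuming the scikit-learn k-NN, neural network and SVM implementations stand in for the base classifiers used by the authors: each base classifier is fitted on the cluster ids rather than the class labels, so its predicted probabilities play the role of the cluster confidence vector of (2). Hyper-parameters and names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def train_base_classifiers(X, cluster_ids):
    """Fit each base classifier to predict cluster membership (not class)."""
    base_classifiers = [
        KNeighborsClassifier(n_neighbors=5),
        MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        SVC(kernel="rbf", probability=True, random_state=0),
    ]
    for clf in base_classifiers:
        clf.fit(X, cluster_ids)
    return base_classifiers

def cluster_confidences(base_classifiers, X):
    """Concatenate the cluster confidence vectors of all base classifiers."""
    return np.hstack([clf.predict_proba(X) for clf in base_classifiers])
```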


    4.3 Fusion Classifier Training

    The confidence matrices produced by the base classifiers are combined to form the input to

the fusion classifier, where $1 \le i \le N_{bc}$ and $1 \le k \le K$. The target matrix for the fusion classifier is composed of class confidence vectors that are set according to the proportion of class instances within the cluster. The parameters for the fusion classifier are optimized to fit the above input-output pattern produced by the training examples. At the completion of training a model for the ensemble classifier is obtained. The training algorithm for the fusion classifier is presented in Figure 7.

    Figure 7: Fusion classifier training algorithm.
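A minimal sketch of the fusion step, assuming a scikit-learn multi-output MLP regressor stands in for the fusion neural network, and that the concatenated cluster confidences and the per-cluster class-proportion targets of (7) have already been computed; names and hyper-parameters are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_fusion_classifier(confidences, cluster_ids, cluster_targets):
    """confidences    : (n_patterns, n_base_classifiers * K) fusion inputs
    cluster_ids       : cluster id of each training pattern
    cluster_targets   : dict mapping cluster id -> class-proportion vector (eq. (7))."""
    T = np.vstack([cluster_targets[k] for k in cluster_ids])   # soft class targets
    fusion = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    fusion.fit(confidences, T)
    return fusion
```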

    4.4 Prediction

The test pattern $\mathbf{x} = \langle x_1, x_2, \ldots, x_n \rangle$ is presented to each of the base classifiers. Each base classifier $b_i$ produces different confidence values $\langle w_{i1}, \ldots, w_{iK} \rangle$ that indicate the possibility of the pattern belonging to the different clusters. The cluster confidence vectors produced by the different base classifiers are combined to produce $\langle w_{11}, \ldots, w_{1K}, \ldots, w_{N_{bc}1}, \ldots, w_{N_{bc}K} \rangle$ that forms the input to the fusion classifier. At the output the fusion classifier produces the class confidence values $\langle t_1, \ldots, t_{N_c} \rangle$ that indicate the possibility of the example belonging to the different classes. The ensemble classifier

    prediction algorithm is presented in Figure 8.


    Figure 8: Ensemble classifier prediction algorithm.
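Under the same illustrative assumptions as the training sketches above, the prediction step of Figure 8 reduces to concatenating the base classifiers' cluster confidence vectors and taking the arg-max of the fusion output:

```python
import numpy as np

def coec_predict(base_classifiers, fusion, X_test):
    """Return the index of the most confident class for each test pattern."""
    # cluster confidence vectors from every base classifier, concatenated
    confidences = np.hstack([clf.predict_proba(X_test) for clf in base_classifiers])
    class_confidences = fusion.predict(confidences)   # one row per test pattern
    return np.argmax(class_confidences, axis=1)
```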

    5. Experimental Setup

    We have conducted a number of experiments on benchmark data sets from UCI machine

    learning repository [27] to verify the strength of COEC and investigate the research questions

    mentioned in Section 1. We have used the same data sets as used in recently published

    research [10][12][17] so that the results can be easily compared. A summary of the data sets

is presented in Table 1. The Wine data set has well-defined training and test sets, so, as directed by the description of the data set [27], we have used it as it is. We have used 10-fold cross validation for reporting the classification results for all the other data sets.

    Table 1: Data sets used in the experiments.

Dataset                      # instances   # attributes   # classes
Breast Cancer (Wisconsin)    699           10             2
Sonar                        208           60             2
Iris                         150           4              3
Ionosphere                   351           34             2
Thyroid (New)                215           5              3
Vehicle                      946           18             4
Liver                        345           7              2
Diabetes                     768           8              2
Wine                         178           13             3
Satellite                    6435          36             6
Segment                      2310          19             7

We used the k-means clustering algorithm [26] for partitioning the data sets. Two types of

    clustering were performed: (i) heterogeneous clustering: conventional clustering of the entire

    data set into k clusters where a cluster can contain examples of more than one class. The


    target for the fusion classifier is set as per the proportions of the class examples within each

cluster; (ii) homogeneous clustering: examples of a single class are partitioned into k clusters.

    The target of the fusion classifier is set to the class for which the clustering is performed. We

have reported the impact of both types of clustering on ensemble classifier accuracy and analysed which is superior.

We have investigated the proposed ensemble classifier by incorporating three well-known and distinct classifiers, namely k-Nearest Neighbour (k-NN), Neural Network (NN), and

    Support Vector Machine (SVM) as the base classifiers. A Neural Network is used as the

    fusion classifier. The neural networks for small data sets are trained using a single hidden

layer and tan-sigmoid activation functions for the neurons. The Levenberg-Marquardt

    backpropagation method is used for learning of the weights in these cases. Larger data sets

    are however learned with log sigmoid activation function and gradient descent training

    function. We have used the radial basis kernel for SVM and the libsvm library [28] in all the

experiments. The different parameters for the classifiers (e.g. k in the k-NN classifier, sigma in the RBF kernel of SVM, and epochs, RMS error goal and learning rate in the neural network) were hand-tuned for different data sets. The classification accuracies of bagging, boosting and

    random subspace on the data sets in Table 1 are obtained from [17] and WEKA [31]. All the

    experiments were conducted on MATLAB 7.5.0.
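The experiments in this paper were run in MATLAB; purely as a hedged illustration of the 10-fold protocol described above, an equivalent evaluation loop could be written with scikit-learn as follows. Stratification is an assumption, as are all names and parameters in the sketch; it does not reproduce the authors' setup.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validated_accuracy(X, y, fit_predict, n_splits=10, seed=0):
    """Mean and standard deviation of accuracy over the folds.
    `fit_predict(X_tr, y_tr, X_te)` must return predicted labels for X_te."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        scores.append(np.mean(y_pred == y[test_idx]))
    return np.mean(scores), np.std(scores)
```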

    6. Results and Discussion

    6.1 Heterogeneous and Homogeneous Clustering

    6.1.1 Heterogeneous clustering

Given a set of training examples the heterogeneous clustering partitions the entire data set. In

    a data set where examples of different classes are well separated in Euclidian space,


    heterogeneous clustering will produce partitions each containing examples from one class

only. We use the term atomic cluster to refer to a partition containing examples from a single class. Most of the real world data sets however contain overlapping examples from different classes. It is thus likely to observe mostly non-atomic clusters (clusters containing examples

    from multiple classes) when the data set is partitioned using heterogeneous clustering where

the number of clusters equals the number of classes. Figure 9 represents a set of co-occurrence matrices that are obtained from different data sets by counting the number of instances of each class belonging to a particular cluster when the data sets are partitioned into k clusters using k-means clustering with k = # of classes.
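The co-occurrence counts of Figure 9 can be reproduced with a simple tally; the sketch below assumes k-means from scikit-learn and illustrative names, and is not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def cooccurrence_matrix(X, y, n_clusters, seed=0):
    """Rows are clusters, columns are classes; entry (k, c) counts the patterns
    of class c that fall into cluster k."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(X)
    classes = np.unique(y)
    matrix = np.zeros((n_clusters, len(classes)), dtype=int)
    for k, c in zip(cluster_ids, y):
        matrix[k, np.searchsorted(classes, c)] += 1
    return matrix
```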

Ionosphere
Cluster   Class 1   Class 2
1         61        82
2         141       31

Sonar
Cluster   Class 1   Class 2
1         30        32
2         57        68

Iris
Cluster   Class 1   Class 2   Class 3
1         0         42        45
2         21        3         0
3         24        0         0

Wine
Cluster   Class 1   Class 2   Class 3
1         0         32        0
2         30        1         0
3         0         3         24

Figure 9: Cluster-class co-occurrence matrices when heterogeneous clustering is performed on the data sets using the k-means clustering algorithm with k = # of classes.

Note from Figure 9 that in Ionosphere and Sonar data sets each cluster contains examples

    from multiple classes. This implies overlapping data points in these data sets. Nearly atomic

    and atomic clusters are obtained for the Iris data set at the second and third clusters

    respectively. The first cluster however contains overlapping examples from class 2 and class

    3. Clustering these data sets into higher number of partitions will lead to higher number of

    atomic or nearly atomic clusters leading to better learning of the ensemble classifier. The

    clusters produced for the Wine data set are however either atomic or nearly atomic. It is easier

    to produce the cluster to class mapping for these clusters by the fusion classifier in COEC.

    Clustering further is unlikely to provide any benefit for the ensemble classifier learning for

such data sets. Figure 10 represents the co-occurrence matrices when the Ionosphere, Sonar and Iris data sets are partitioned into a higher number of clusters.


It can be observed from Figure 10 that a higher number of clusters improves the learning scenario for all the data sets. Six out of ten clusters in the Ionosphere data set are atomic and two clusters are near atomic. Four clusters are atomic and three clusters are near atomic for the Sonar data set. All the clusters are either atomic or near atomic for the Iris data set. These results imply that a higher number of clusters in heterogeneous clustering produces significant numbers of atomic and near atomic clusters, and it becomes easier for the fusion classifier in COEC to produce the cluster-to-class map, leading to better classification accuracy.

Ionosphere
Cluster   Class 1   Class 2
1         0         10
2         35        0
3         0         7
4         108       26
5         22        2
6         20        38
7         0         15
8         17        1
9         0         8
10        0         6

Sonar
Cluster   Class 1   Class 2
1         17        26
2         11        1
3         0         11
4         18        22
5         17        5
6         0         9
7         4         7
8         0         8
9         20        5
10        0         6

Iris
Cluster   Class 1   Class 2   Class 3
1         0         24        2
2         0         0         10
3         0         20        2
4         12        0         0
5         0         1         16
6         9         0         0
7         0         0         15
8         9         0         0
9         15        0         0

Figure 10: Cluster-class co-occurrence matrices when heterogeneous clustering is performed on the data sets using the k-means clustering algorithm with a higher number of clusters.

Figure 11 presents the classification accuracies on the data sets in Table 1 at different numbers of clusters using heterogeneous clustering in COEC. The best classification accuracies are obtained for all the data sets when the number of clusters is greater than the number of classes. As the clusters have well-defined boundaries the base classifiers learn cluster boundaries easily. A higher number of clusters produces mostly atomic and near-atomic clusters for data sets like Iris, Ionosphere and Sonar (Figure 10). As a result the fusion classifier learns the cluster-to-class maps with high accuracy, resulting in better classification performance of the COEC. Data sets like Wine have class patterns that are already well separated (Figure 9) and further

    clustering does not significantly improve the classification performance of the COEC.


Figure 11: Heterogeneous clustering in COEC at different numbers of clusters on the test cases of the data sets in Table 1: (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment.

    6.1.2 Homogeneous clustering

Homogeneous clustering partitions the examples belonging to a single class only and ignores the instances of other classes. Consider the partitioning of the data sets in Figure 9 using homogeneous clustering. The resultant cluster-class co-occurrence matrices are represented in Figure 12 considering two clusters for each class. The total number of clusters equals the number of classes times the number of clusters per class. Note that all the clusters are atomic

    in nature.


Ionosphere
Cluster   Class 1   Class 2
1         59        0
2         143       0
3         0         82
4         0         31

Sonar
Cluster   Class 1   Class 2
1         51        0
2         36        0
3         0         61
4         0         39

Iris
Cluster   Class 1   Class 2   Class 3
1         21        0         0
2         24        0         0
3         0         23        0
4         0         22        0
5         0         0         20
6         0         0         25

Wine
Cluster   Class 1   Class 2   Class 3
1         11        0         0
2         19        0         0
3         0         18        0
4         0         18        0
5         0         0         15
6         0         0         9

Figure 12: Cluster-class co-occurrence matrices when homogeneous clustering is performed on the data sets using the k-means clustering algorithm with two clusters for each class.

Figure 13 represents the classification performance of COEC at different numbers of clusters on the data sets in Table 1 using homogeneous clustering. Here n clusters imply a total of n × number_of_classes clusters in the data set. For example, the Vehicle data set has four classes, so the four clusters in Figure 13 mean 4 × 4 = 16 clusters in the data set. Too many clusters in small data sets imply a small number of patterns in each cluster, which leads to poor learning of the fusion classifier in COEC. This explains the fall of accuracy at higher numbers of clusters for the majority of the data sets in Figure 13.

    6.1.3 Comparison

    Homogeneous clustering can be beneficial over heterogeneous clustering for overlapping

    patterns. For clarification, consider an artificial data set in Figure 14. The data set contains

    overlapping patterns from multiple classes. Heterogeneous clustering is likely to produce the

partitions presented in Figure 14(b) where a large cluster is non-atomic. Even with a higher number of partitions the situation is unlikely to change, or the produced clusters will be random with each being non-atomic. The partitions produced by homogeneous clustering under an identical situation are presented in Figure 14(c). Note that all the clusters are atomic in

    nature. The groups within each cluster are well separated geometrically for the data set. As


data is clustered class-wise, the cluster-to-class mapping becomes easier for the fusion

    classifier. COEC thus performs better using homogeneous clustering.

Figure 13: Homogeneous clustering in COEC at different numbers of clusters on the test cases of the data sets in Table 1: (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment.

Figure 14: Clustering of an artificial data set with overlapping data points (Class 1 and Class 2) using homogeneous and heterogeneous clustering: (a) data set, (b) heterogeneous clustering, (c) homogeneous clustering.


    To verify the above observation we have conducted a set of classification experiments on the

    data sets in Table 1 using both homogeneous and heterogeneous clustering with COEC. The

10-fold cross validation results on the test sets are presented in Table 2. It can be observed

    that homogeneous clustering performs 14.38% better than heterogeneous clustering on an

    average with COEC. These real world data sets contain significantly overlapping patterns and

    the performance of homogeneous clustering is better than that of heterogeneous clustering as

evidenced by Table 2. To validate this claim, we define the null and alternative hypotheses

    as follows:

    Null Hypothesis: Homogeneous clustering is equivalent to heterogeneous clustering for

    classifying data using COEC.

    Alternative Hypothesis: Homogeneous clustering is significantly better than heterogeneous

    clustering for classifying data using COEC.

Note that the Null Hypothesis is rejected at the 0.05 significance level by a two-tailed sign test

    [29][30] from the comparative classification performances of heterogeneous and

    homogeneous clustering in Table 2.
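The two-tailed sign test used here (and in the later comparisons) reduces to a binomial test on the number of data sets for which one method wins, with ties discarded; a minimal self-contained sketch follows, with illustrative names.

```python
from math import comb

def two_tailed_sign_test(scores_a, scores_b):
    """Two-tailed sign test p-value for paired accuracy lists, ignoring ties."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b                      # ties are dropped
    k = max(wins_a, wins_b)
    # probability of a split at least this lopsided under the fair-coin null
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)
```

For the comparison in Table 2, homogeneous clustering wins on all eleven data sets, which gives p = 2(0.5)^11 ≈ 0.001 < 0.05, consistent with the rejection of the null hypothesis stated above.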

    Table 2: Classification performance comparison of COEC at homogeneous clustering and

heterogeneous clustering on the test cases of the data sets in Table 1 using 10-fold cross validation. The sign test on the results implies that homogeneous clustering is significantly

    better than heterogeneous clustering with COEC.

Data Set        Heterogeneous clustering   Homogeneous clustering
Breast Cancer   97.59 ± 1.29               97.72 ± 2.23
Sonar           67.29 ± 9.90               84.44 ± 7.60
Iris            95.33 ± 5.49               96.00 ± 3.44
Ionosphere      86.55 ± 7.03               89.09 ± 5.49
Thyroid         86.06 ± 9.55               94.89 ± 6.20
Vehicle         52.15 ± 4.45               71.77 ± 2.99
Liver           57.67 ± 6.28               63.33 ± 9.05
Diabetes        64.8 ± 3.50                71.08 ± 5.65
Wine            98.10 ± 0.00               99.05 ± 0.00
Satellite       76.08 ± 4.97               89.19 ± 1.22
Segment         66.93 ± 5.07               95.97 ± 1.08


    Note that the performance of COEC with clustering depends on the number of clusters. The

    main objective of this paper is to observe the influence of clustering on classification

accuracy. We have adopted a step-wise search method by changing the number of clusters

    within a limited range and observing its influence on classification accuracy. The actual

    number of clusters is a function of the number of patterns in the data set and it is thus

    required that a wider range of number of clusters be considered for finding the optimal

    number of clusters at which the classification accuracy is maximum. Further research is

    required for finding the optimal number of clusters.

    6.2 Impact of Clusters on Diversity

    In order to ascertain the impact of clusters on diversity we have computed the errors made by

    the base classifiers as we change the number of clusters in COEC. Figure 15 represents the

errors made by k-NN, NN and SVM base classifiers as the number of clusters changes. Note that the base classifier errors at each number of clusters are different for all the data sets. This is possible only if the base classifiers make different errors on identical patterns. This implies that the errors made by the base classifiers are not correlated, which in turn refers to the diversity

    among the base classifiers.


Figure 15: Change in errors made by the base classifiers as the number of clusters changes in COEC, for (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment. The errors are normalized within a range of zero to one.

    6.3 Comparative Performance Analysis of COEC and Base Classifiers

    Table 3 represents a comparative analysis of the classification performance of COEC and the

    corresponding base classifiers. Note that different base classifiers achieve different accuracies

    on the data sets. This indicates the fact that the errors made by the base classifiers are

different and diversity among the base classifiers is achieved in COEC. On average COEC performs 3.92% better than k-NN, 7.26% better than NN and 7.39% better than SVM as the

base classifiers. The fusion classifier combines the decisions from the base classifiers to find

    the best possible verdict and this can be attributed to the better performance of COEC. In


    order to validate the claims we define the null and alternative hypothesis for each classifier

pair in Table 4. Note that the null hypothesis is rejected at the 0.05 significance level by a two-tailed sign test for each classifier pair in Table 4, implying that COEC performs significantly

    better than the corresponding base classifiers.

    Table 3: Classification performance comparison between COEC and the

    corresponding base classifiers.

Data Set        k-NN    NN      SVM     COEC
Breast Cancer   96.78   95.09   92.02   97.72
Sonar           81.49   73.09   55.04   84.44
Iris            94      93.33   93.33   96.00
Ionosphere      80.66   77.69   84.66   89.09
Thyroid         85.17   91.33   93.83   94.89
Vehicle         68.71   65.95   68.31   71.77
Liver           61.08   62.58   61.75   63.33
Diabetes        70.29   61.27   70.64   71.08
Wine            97.14   93.50   96.07   99.05
Satellite       87.45   83.23   88.89   89.19
Segment         94.76   94.98   95.28   95.97

    Table 4: Significance test for comparing the classification performance of COEC

and the corresponding base classifiers using the sign test.

Classifier pair: COEC vs. k-NN
Null Hypothesis: COEC is equivalent to the base k-NN classifier
Alternative Hypothesis: COEC is significantly better than the base k-NN classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and k-NN in Table 3

Classifier pair: COEC vs. NN
Null Hypothesis: COEC is equivalent to the base NN classifier
Alternative Hypothesis: COEC is significantly better than the base NN classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and NN in Table 3

Classifier pair: COEC vs. SVM
Null Hypothesis: COEC is equivalent to the base SVM classifier
Alternative Hypothesis: COEC is significantly better than the base SVM classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and SVM in Table 3

    We also conducted a classification experiment of the entire data set with the base classifiers

    only without any clustering. The classification results are presented in Table 5. COEC

performs 3.62% better than k-NN, 5.33% better than NN and 6.51% better than SVM

    classifiers. This implies that clustering has significant impact on the learning of the ensemble

    classifier (Section 6.1) leading to overall better performance. We justify this claim by

    defining the null and alternative hypothesis in Table 6 for each pair of classifiers. Note that

the null hypothesis is rejected at the 0.05 significance level using a two-tailed sign test for each


    classifier pair. This implies that clustering significantly impacts the learning in COEC and

    improves the classification performance.

    Table 5: Classification performance comparison between COEC and individual

    classifiers with no clustering.

Data Set        k-NN    NN      SVM     COEC
Breast Cancer   96.78   95.09   92.02   97.72
Sonar           80.53   70.89   53.29   84.44
Iris            95.33   96      94.67   96.00
Ionosphere      82.80   82.04   87.23   89.09
Thyroid         88.5    87.11   93.83   94.89
Vehicle         69.39   72.26   70.15   71.77
Liver           59.08   61.67   66.92   63.33
Diabetes        68.9    68.62   71.74   71.08
Wine            97.14   93.50   96.07   99.05
Satellite       87.45   83.23   88.89   89.19
Segment         95.24   95.46   93.33   95.97

    Table 6: Significance test for comparing the pairwise classification performance

between COEC and the individual classifiers (without clustering) using the sign test.

Classifier pair: COEC vs. k-NN
Null Hypothesis: COEC is equivalent to the k-NN classifier
Alternative Hypothesis: COEC is significantly better than the k-NN classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and k-NN in Table 5

Classifier pair: COEC vs. NN
Null Hypothesis: COEC is equivalent to the NN classifier
Alternative Hypothesis: COEC is significantly better than the NN classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and NN in Table 5

Classifier pair: COEC vs. SVM
Null Hypothesis: COEC is equivalent to the SVM classifier
Alternative Hypothesis: COEC is significantly better than the SVM classifier
Sign-Test: Null Hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and SVM in Table 5

    6.4 Comparative Performance Analysis of Classifier Fusion and Algebraic Fusion

    Conventional algebraic fusion methods fuse the class confidence values produced by the base

    classifiers to produce the class confidence values of the ensemble classifier. In COEC the

    base classifiers produce cluster confidence values. If conventional algebraic methods (e.g.

    mean of confidence values) are used in COEC the cluster confidence values will be produced

    for the ensemble classifier. The clustertoclass mapping can then be obtained using majority

    voting. The class having maximum number of patterns in the cluster will win the vote. This

    process is not suitable for strong nonatomic clusters as it undermines the class patterns


significantly present in the cluster but not in the majority. This will thus impact the overall

    classification accuracy. A fusion classifier will perform better under this circumstance. The

    targets of the classifier are set according to proportions of class patterns and trained

    accordingly. The fusion classifier thus gives importance to all the classes in a cluster

    according to their proportion whereas the majority voting undermines that.
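For contrast with the fusion classifier, the algebraic alternative described above can be sketched as follows: the base classifiers' cluster confidences are averaged, the winning cluster is selected, and the cluster is mapped to its majority class. Names and data layout are illustrative assumptions.

```python
import numpy as np

def algebraic_fusion_predict(per_classifier_confidences, cluster_majority_class):
    """per_classifier_confidences : list of (n_test, K) arrays, one per base classifier.
    cluster_majority_class        : array mapping each cluster id to its most frequent class."""
    mean_confidence = np.mean(per_classifier_confidences, axis=0)   # (n_test, K)
    winning_cluster = np.argmax(mean_confidence, axis=1)            # mean rule over clusters
    return np.asarray(cluster_majority_class)[winning_cluster]      # majority-vote class map
```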

    Table 7 provides a comparative classification performance of fusion classifier and algebraic

fusion (mean confidence for the cluster and majority voting for the class) when used with COEC. Overall the fusion classifier performs 1.08% better than algebraic fusion. This implies that the use of a fusion classifier significantly improves the performance of COEC compared to

algebraic fusion. To justify this claim we define the following null and alternative hypotheses:

Null Hypothesis: The fusion classifier approach is equivalent to the algebraic fusion approach when used with COEC.

Alternative Hypothesis: The fusion classifier approach is significantly better than the algebraic fusion approach when used with COEC.

Note that the null hypothesis is rejected at the 0.05 significance level by a two-tailed sign test from

    the comparative classification performances presented in Table 7.

    Table 7: Classification performance comparison between algebraic fusion and classifier

    fusion in COEC.

Data Set        Algebraic fusion   Classifier fusion
Breast Cancer   97.61              97.72
Sonar           83.89              84.44
Iris            95.33              96.00
Ionosphere      84.00              89.09
Thyroid         94.28              94.89
Vehicle         70.86              71.77
Liver           62.83              63.33
Diabetes        70.46              71.08
Wine            97.14              99.05
Satellite       89.89              89.19
Segment         96.36              95.97


    6.5 Comparative Performance Analysis of COEC and Classical Ensemble Classifiers

    In order to position COEC relative to existing methods, we have classified the data sets using classical ensemble classifiers, namely bagging, boosting, and the random subspace method. Figure 16 summarises the classification accuracies obtained using COEC and the other ensemble classifiers. On average, COEC performs 6.05% better than bagging, 8.20% better than boosting and 9.08% better than the random subspace method. As mentioned in Section 2, the classical methods aim to achieve diversity but do not provide any mechanism to improve the learning performance of the base classifiers. In COEC this issue is addressed by first allowing the base classifiers to learn cluster boundaries; because clusters have well-defined boundaries, they are easier for the base classifiers to learn. The fusion classifier then performs the cluster-to-class mapping and, as observed in the previous section, it performs better than conventional fusion methods. This combination of cluster-boundary learning and fusion-classifier mapping leads to the better performance of COEC. We justify this claim by conducting a sign test, as presented in Table 8. Note that the null hypothesis is rejected in all cases at either the 0.05 or the 0.15 significance level, indicating that COEC performs significantly better than the conventional ensemble classifiers.
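
    A sketch of how comparable bagging, boosting and random subspace baselines can be run with off-the-shelf implementations is given below; it assumes scikit-learn, the Iris benchmark set and 10-fold cross-validation purely for illustration, which may differ from the exact experimental setup used for Figure 16 (the classical methods are also available in WEKA [31]):

        from sklearn.datasets import load_iris
        from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
        from sklearn.model_selection import cross_val_score

        X, y = load_iris(return_X_y=True)   # Iris is one of the UCI benchmark sets

        baselines = {
            # bagging: bootstrap samples of the training data, decision trees by default
            "bagging": BaggingClassifier(n_estimators=10, random_state=1),
            # boosting: AdaBoost with its default weak learners (decision stumps)
            "boosting": AdaBoostClassifier(n_estimators=10, random_state=1),
            # random subspace: all samples, a random half of the features per tree
            "random subspace": BaggingClassifier(n_estimators=10, bootstrap=False,
                                                 max_features=0.5, random_state=1),
        }

        for name, clf in baselines.items():
            accuracy = cross_val_score(clf, X, y, cv=10).mean() * 100
            print(f"{name}: {accuracy:.2f}%")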


    Figure 16: Classification performance comparison between COEC and classical ensemble

    classifiers.

    Table 8: Significance test for comparing the pairwise classification performance between COEC and the classical ensemble classifiers using the sign test.

    COEC vs. bagging
      Null hypothesis: COEC is equivalent to bagging
      Alternative hypothesis: COEC is significantly better than bagging
      Sign test: Null hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and bagging in Figure 16

    COEC vs. boosting
      Null hypothesis: COEC is equivalent to boosting
      Alternative hypothesis: COEC is significantly better than boosting
      Sign test: Null hypothesis rejected at the 0.05 significance level from the comparative classification performances of COEC and boosting in Figure 16

    COEC vs. random subspace method
      Null hypothesis: COEC is equivalent to the random subspace method
      Alternative hypothesis: COEC is significantly better than the random subspace method
      Sign test: Null hypothesis rejected at the 0.15 significance level from the comparative classification performances of COEC and the random subspace method in Figure 16

    7. Conclusion

    We have presented a novel cluster oriented ensemble classifier (COEC) that is based on learning of cluster boundaries by the base classifiers, leading to better learning capability, and on cluster-to-class mapping by a fusion classifier, leading to better classification accuracy.

    The proposed COEC has been evaluated on benchmark data sets from the UCI machine learning repository. The detailed experimental results and their significance, assessed using a two-tailed sign test,


    have been presented and analysed in Section 6. The evidence from the experimental results and the two-tailed sign tests shows that (i) homogeneous clustering performs significantly better than heterogeneous clustering with COEC: as shown in Section 6.1, homogeneous clustering performs 14.38% better overall than heterogeneous clustering; (ii) the proposed COEC performs significantly better than its base counterparts: as shown in Section 6.3, COEC performs 3.62% better than kNN, 5.33% better than NN and 6.51% better than SVM classifiers overall; (iii) the fusion classifier performs significantly better than algebraic fusion with COEC: as shown in Section 6.4, the fusion classifier performs 1.08% better than algebraic fusion overall; and (iv) COEC significantly outperforms the classical ensemble classifiers, namely bagging, boosting and the random subspace method, on the benchmark data sets: as shown in Section 6.5, COEC performs 6.05% better than bagging, 8.20% better than boosting and 9.08% better than the random subspace method overall.

    In our future research, we would like to focus on finding the optimal number of clusters and on the global optimization of the parameters of the base and fusion classifiers.

    References

    [1] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.

    [2] R. Caruana and A. N. Mizil, "An Empirical Comparison of Supervised Learning Algorithms," Proceedings of the International Conference on Machine Learning (ICML), pp. 161-168, 2006.

    [3] T. Windeatt, "Accuracy/Diversity and ensemble MLP classifier design," IEEE Transactions on Neural Networks, vol. 17, no. 5, pp. 1194-1211, 2006.

    [4] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

    [5] L. Breiman, "Random Forests," Machine Learning, vol. 45, no. 1, pp. 5-32, Oct. 2001.


    [6] G. Fumera, F. Roli and A. Serrau, "A Theoretical Analysis of Bagging as a Linear Combination of Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 7, pp. 1293-1299, 2008.

    [7] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.

    [8] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.

    [9] R. E. Banfield, L. O. Hall, K. W. Bowyer, and W. P. Kegelmeyer, "A new ensemble diversity measure applied to thinning ensembles," International Workshop on Multiple Classifier Systems (MCS), pp. 306-316, 2003.

    [10] N. G. Pedrajas, "Constructing ensembles of classifiers by means of weighted instance selection," IEEE Transactions on Neural Networks, vol. 20, no. 2, pp. 258-277, 2009.

    [11] G. M. Munoz, D. H. Lobato, and A. Suarez, "An analysis of ensemble pruning techniques based on ordered aggregation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 245-259, 2009.

    [12] J. J. Rodriguez and J. Maudes, "Boosting recombined weak classifiers," Pattern Recognition Letters, vol. 29, pp. 1049-1059, 2008.

    [13] L. Nanni and A. Lumini, "Fuzzy bagging: a novel ensemble of classifiers," Pattern Recognition, vol. 39, pp. 488-490, 2006.

    [14] L. Chen and M. S. Kamel, "A generalized adaptive ensemble generation and aggregation approach for multiple classifier systems," Pattern Recognition, vol. 42, pp. 629-644, 2009.


    [15] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, 1998.

    [16] L. I. Kuncheva, J. C. Bezdek, and R. Duin, "Decision templates for multiple classifier fusion: An experimental comparison," Pattern Recognition, vol. 34, no. 2, pp. 299-314, 2001.

    [17] A. H. R. Ko, R. Sabourin, A. de S. Britto, and L. Oliveira, "Pairwise fusion matrix for combining classifiers," Pattern Recognition, vol. 40, pp. 2198-2210, 2007.

    [18] N. M. Wanas, R. A. Dara, and M. S. Kamel, "Adaptive fusion and co-operative training for classifier ensembles," Pattern Recognition, vol. 39, pp. 1781-1794, 2006.

    [19] O. R. Terrades, E. Valveny, and S. Tabbone, "Optimal classifier fusion in a non-Bayesian probabilistic framework," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1630-1644, 2009.

    [20] D. Parikh and R. Polikar, "Ensemble based incremental learning approach to data fusion," IEEE Transactions on Systems, Man, and Cybernetics, vol. 37, no. 2, pp. 437-450, 2007.

    [21] M. D. Muhlbaier, A. Topalis, and R. Polikar, "Learn++.NC: Combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 152-168, 2009.

    [22] R. Maclin and J. W. Shavlik, "Combining the Predictions of Multiple Classifiers: Using Competitive Learning to Initialize Neural Networks," International Joint Conference on Artificial Intelligence, pp. 524-531, 1995.

    [23] O. Pujol and D. Masip, "Geometry-based ensembles: toward a structural characterization of the classification boundary," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 6, pp. 1140-1146, 2009.


    [24] P. Chaudhuri, A. K. Ghosh, and H. Oja, "Classification based on hybridization of parametric and non-parametric classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 7, pp. 1153-1164, 2009.

    [25] A. Strehl and J. Ghosh, "Cluster ensembles: a knowledge reuse framework for combining multiple partitions," The Journal of Machine Learning Research, vol. 3, pp. 583-617, 2003.

    [26] E. Forgy, "Cluster analysis of multivariate data: Efficiency vs. interpretability of classifications," Biometrics, vol. 21, pp. 768-780, 1965.

    [27] UCI Machine Learning Database, http://archive.ics.uci.edu/ml/, accessed on 10th February 2010.

    [28] LIBSVM, A library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm/, accessed on 10th February 2010.

    [29] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1-30, 2006.

    [30] D. J. Sheskin, Handbook of Parametric and Nonparametric Statistical Procedures, Chapman & Hall/CRC, 2000.

    [31] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, no. 1, 2009.


    Author Biographies

    Brijesh Verma is a Chair Professor in the School of Information and Communication Technology at Central Queensland University, Australia. His research interests include pattern recognition and computational intelligence. He has published thirteen books, seven book chapters and over one hundred papers in journals and conference proceedings. He has received twelve competitive research grants and supervised thirty-one research students in the areas of pattern recognition and computational intelligence. He has served on the program committees of over thirty international conferences and on the editorial boards of six international journals. He is a Senior Member of the IEEE and has served as Chair of the IEEE Computational Intelligence Society's Queensland Chapter (2007-2008) and as a member of the IEEE CIS Subcommittee (2010) for the Outstanding Chapter Award.

    Ashfaqur Rahman received his Ph.D. degree in Information Technology from Monash University, Australia, in 2008. Currently, he is a Research Fellow at the Centre for Intelligent and Networked Systems (CINS) at Central Queensland University (CQU), Australia. His major research interests are in the fields of data mining, multimedia signal processing and communication, and artificial intelligence. He has published more than 20 peer-reviewed journal articles and conference papers. Dr. Rahman is the recipient of numerous academic awards including a CQU Seed Grant, the International Postgraduate Research Scholarship (IPRS), the Monash Graduate Scholarship (MGS) and the FIT Dean Scholarship from Monash University, Australia.
