
Ensemble Learning and Pruning in Multi-Objective Genetic Programming for Classification with Unbalanced Data

Urvesh Bhowan, Mark Johnston, Mengjie Zhang

Evolutionary Computation Research Group, Victoria University of Wellington, New Zealand.

Abstract. Machine learning algorithms can suffer a performance bias when data sets are unbalanced. This paper develops a multi-objective genetic programming approach to evolving accurate and diverse ensembles of non-dominated solutions where members vote on class membership. We explore why the ensembles can also be vulnerable to the learning bias using a range of unbalanced data sets. Based on the notion that smaller ensembles can be better than larger ensembles, we develop a new evolutionary-based pruning method to find groups of highly-cooperative individuals that can improve accuracy on the important minority class.

1 Introduction

Classification with unbalanced data is an important problem in machine learning (ML) [1][2][3][4]. Data sets are unbalanced when the learning examples from one class are rare (the minority class), while the larger class makes up the rest (the majority class). Genetic Programming (GP) is an evolutionary ML technique based on the principles of Darwinian evolution and natural selection [5], which has been successful in building reliable and accurate classifiers to solve a range of classification problems [3][6][7]. However, GP, like other ML techniques, can evolve "biased" classifiers when data is unbalanced, i.e., classifiers with strong majority class accuracy but poor minority class accuracy. As the minority class usually represents the main class in many real-world problems, building classifiers with good accuracy on both classes is an important area of research [1][2][4][6].

The learning bias can occur because typical training criteria can be influenced by the larger majority class [1]. Addressing this issue either involves sampling the data set to artificially re-balance the class distributions during the learning process [2][4], or adapting the training criteria for class-specific cost adjustment, e.g., using a weighted average of the minority and majority class accuracies in the cost function [6]. This paper focuses on cost adjustment techniques within the learning algorithm. However, as the minority and majority class accuracies are usually in conflict, selecting suitable costs for the two classes a priori can be problem-specific and require a lengthy trial-and-error process. Evolutionary multi-objective optimisation (EMO) is a useful alternative where a Pareto frontier of the best trade-off solutions can be found in a single optimisation run [8][9].

Another advantage of EMO is that the combined knowledge of evolved solutions along the Pareto frontier can then be utilised cooperatively in an ensemble of classifiers to further improve generalisation ability [9][10]. An ensemble can be more accurate than any of its individual members if the members are accurate and diverse, i.e., make different errors on different inputs [10].

This paper develops a multi-objective GP (MOGP) approach using both the accuracy and diversity of solutions along the two classes as the learning objectives. This MOGP approach uses Pairwise Failure Crediting (PFC) [10] for diversity to negatively correlate the predictions of the frontier solutions. Our first research goal evaluates the effectiveness of the ensemble when the full Pareto front of evolved classifiers works together to predict unseen instances for five real-world (binary) class imbalance tasks. We show that the learned ensembles can be vulnerable to the learning bias due to the influence of biased Pareto-front classifiers. To address this, our second research objective develops a new ensemble-pruning method using a second evolutionary search to find small subsets of highly-cooperative individuals. This approach is shown to improve ensemble performance on the important minority class. We also compare our MOGP results to another popular ensemble learning approach, namely, Naive Bayes with bagging and balanced bootstrap sampling.

2 Related Work: Ensemble Learning for Class Imbalance

Ensemble learning for class imbalance is typically used in conjunction with sampling to either create balanced bootstrap samples in bagging approaches [2][3] or re-balance the training data in EMO-based approaches using diversity measures in fitness [4]. However, sampling can incur a computational overhead, particularly in large data sets with high levels of imbalance, and some sampling techniques (such as under-sampling) can potentially exclude useful training examples from the learning process. We use the multi-objective component for cost adjustment and the original unbalanced training data "as is" during learning.

Recent EMO-based approaches to evolving ensembles use Negative Correlation Learning (NCL) for ensemble diversity [4][7]. In [4], NCL is only applied to minority class instances (majority class instances are ignored) to evolve diverse neural network ensembles; while in [7], NCL serves as the secondary fitness measure in an MOGP approach where the Pareto front is determined using only the accuracy of the GP classifiers. This paper is different as a population-based diversity measure (PFC) is used in fitness, which allows for equal selection preference between accurate and diverse solutions, potentially creating better diversity in the population. The PFC measure is also applied to the minority and majority class separately, where each contributes equally in fitness, to ensure the ensemble members are equally diverse on both classes. Although we use PFC for ensemble diversity, sampling (such as over- or under-sampling) may also be incorporated into the learning approach for ensemble diversity (as in [3]).

In [2] and [11], different ensemble pruning techniques are explored. In [2], a genetic algorithm evolves a set of weights to specify the contribution of individuals in the ensemble, using a separate validation set to learn these weights (in addition to the training set used to generate the ensemble). In [11], an expectation propagation algorithm using Bayesian inference models is used to concurrently learn the optimal set of weights while also training the ensemble members. Although both of these are effective, a limitation of weight-based ensemble pruning is that suitable weights must be configured for all ensemble members. In contrast, this paper develops a GP-based pruning method that quickly explores different combinations of small subsets of individuals only, using the original training set.

3 Multi-objective GP (MOGP) for Evolving Ensembles

This paper develops a multi-objective GP approach to simultaneously evolving a Pareto frontier of GP classifiers along the objectives (minority and majority class accuracy) in a single optimisation run. An advantage of evolving a front of the best trade-off solutions is that the combined knowledge of these classifiers on the objectives can then be shared and used cooperatively in an ensemble. In an ensemble of classifiers, each member votes on the class label to assign to a given data instance, where the class label with the most votes determines the final ensemble prediction. An ensemble can have good generalisation ability and perform better than all of its individual members provided that the individuals are both accurate and diverse, i.e., generate different errors on different inputs [9][10]. However, if the individual members are not sufficiently accurate and diverse then the ensemble risks misclassifying the same inputs together. For this reason, an explicit diversity measure in fitness is used to improve diversity between solutions, so that if one individual generates an error for a given input, the other members do not also make the same error.

3.1 GP Framework for Classification

A tree-based structure is used to represent the genetic program solutions [5]. We use feature terminals (example features) and constant terminals (randomly generated floating-point numbers), and a function set comprising the four standard arithmetic operators, +, −, %, and ×, and the conditional operator if. The +, − and × operators have their usual meanings (addition, subtraction and multiplication) while % is protected division (usual division except that a divide by zero returns zero). The conditional if function takes three arguments and returns either the second argument if the first is negative, or the third argument otherwise. Each GP solution represents a mathematical expression that outputs a (floating-point) number for a given input (data example to be classified). This number is mapped to the class labels using zero as the threshold, i.e., minority class if the classifier output is zero or positive, or majority class otherwise.
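To make the representation concrete, the following minimal sketch (ours, not the authors' implementation; the nested-tuple encoding and function names are illustrative assumptions) evaluates such a tree and applies the zero threshold:

```python
# Illustrative sketch of the GP classifier described above. A program is a
# nested tuple ('op', child, ...), a feature terminal ('x', i), or a float
# constant; this encoding is an assumption, not the paper's implementation.

def evaluate(node, features):
    """Recursively evaluate a GP tree on one data example."""
    if isinstance(node, float):               # constant terminal
        return node
    op = node[0]
    if op == 'x':                             # feature terminal
        return features[node[1]]
    args = [evaluate(child, features) for child in node[1:]]
    if op == '+': return args[0] + args[1]
    if op == '-': return args[0] - args[1]
    if op == '*': return args[0] * args[1]
    if op == '%':                             # protected division
        return args[0] / args[1] if args[1] != 0 else 0.0
    if op == 'if':                            # 2nd arg if 1st is negative, else 3rd
        return args[1] if args[0] < 0 else args[2]
    raise ValueError(f"unknown operator {op!r}")

def classify(tree, features):
    """Map the numeric output to a class label, with zero as the threshold."""
    return 'minority' if evaluate(tree, features) >= 0 else 'majority'

# Example program: if (x0 - 3.5) is negative then x1 * 2.0 else x0 % x2
tree = ('if', ('-', ('x', 0), 3.5), ('*', ('x', 1), 2.0), ('%', ('x', 0), ('x', 2)))
print(classify(tree, [1.0, -2.0, 0.0]))       # -> 'majority'
```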

3.2 MOGP Fitness

The objective performance of an evolved solution reflects both the accuracy and diversity of the solution on each of the two classes, minority and majority. This is expressed by Eq. (1) for solution S_i on class c, where Err_{c,i} is the total number of incorrect predictions in class c (by solution i) and N_c is the number of training examples in class c. An incorrect prediction occurs when the predicted and actual class labels differ for a given input. Weighting coefficient W specifies the trade-off between accuracy and diversity, where W = 0.5 is used to treat these two measures as equally important in the evolution. This gives equal selection preference to accurate and diverse solutions. The diversity estimate for solution i is represented by PFC_{c,i} for all examples in class c, calculated using Pairwise Failure Crediting (PFC) [10]. PFC represents a penalty function in fitness to reduce the overlap of common errors between solutions in the population. In the PFC measure, T is the population size and the indicator function I(·) returns 1 if the class labels returned from two solutions, i and j, are the same for the given input instance p, or 0 otherwise. In Eq. (1), both the objective performance (S_i)_c and the PFC measure return values between 0 and 1; for PFC, the higher the value the better the diversity, and likewise for (S_i)_c, where higher objective performances imply better accuracy and diversity.

$$(S_i)_c = W\left(1 - \frac{Err_{c,i}}{N_c}\right) + (1 - W)\,PFC_{c,i},
\qquad
PFC_{c,i} = \frac{1}{T-1} \sum_{j=1,\, j \neq i}^{T} \frac{\sum_{p=1}^{N_c} I(i,j,p)}{Err_{c,i} + Err_{c,j}} \tag{1}$$
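As an illustrative reading of Eq. (1) (a sketch under our own assumptions about data layout; the guard against a zero denominator is ours, and the semantics of I(·) follow the description above), the per-class objective can be computed as:

```python
import numpy as np

def objective_performance(preds, labels, i, c, W=0.5):
    """(S_i)_c for solution i on class c, per Eq. (1).

    preds:  (T, N) array of predicted class labels, one row per solution.
    labels: (N,) array of actual class labels.
    """
    T = preds.shape[0]
    in_c = (labels == c)                        # examples belonging to class c
    N_c = in_c.sum()
    err = (preds[:, in_c] != c).sum(axis=1)     # Err_{c,j} for every solution j
    accuracy_term = 1.0 - err[i] / N_c

    pfc = 0.0
    for j in range(T):
        if j == i or err[i] + err[j] == 0:      # skip self; guard zero division
            continue
        agree = (preds[i, in_c] == preds[j, in_c]).sum()   # sum_p I(i, j, p)
        pfc += agree / (err[i] + err[j])
    pfc /= (T - 1)

    return W * accuracy_term + (1 - W) * pfc
```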

Ranking the Objectives. Pareto dominance in fitness ranks the solutions in the population according to objective performances. This ranking is important as it affects the way selection is performed if the different objectives are to be treated separately in the evolution. Pareto dominance between two solutions, expressed by Eq. (2), states that a solution dominates another if it is at least as good as the other solution on all the objectives and better on at least one. Solutions are non-dominated if they are not dominated by any other solution in the population.

$$S_i \succ S_j \;\longleftrightarrow\; \forall c \, [(S_i)_c \geq (S_j)_c] \;\wedge\; \exists k \, [(S_i)_k > (S_j)_k] \tag{2}$$
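Eq. (2) transcribes directly into a small helper (objective vectors as tuples, both objectives maximised):

```python
def dominates(si, sj):
    """Eq. (2): S_i dominates S_j if it is at least as good on every
    objective and strictly better on at least one (both maximised)."""
    return (all(a >= b for a, b in zip(si, sj))
            and any(a > b for a, b in zip(si, sj)))

print(dominates((0.8, 0.9), (0.7, 0.9)))  # -> True
```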

Our MOGP approach uses the popular and effective Pareto dominance-based EMO algorithm SPEA2 [8]. This algorithm has been shown to evolve accurate fronts of classifiers along the minority and majority class trade-off surface in these class imbalance tasks [7]. In SPEA2, each solution in the population is first assigned a strength value D based on the number of other solutions it dominates in the population. The final SPEA2 fitness value, Eq. (3) for solution S_i, is the sum of the strength values of all of S_i's dominators, i.e., all other solutions in the population that dominate S_i. The lower the fitness value returned by Eq. (3), the better the solution on the objectives; non-dominated solutions in the population have the best fitness value of 0 (these solutions have no dominators).

$$\text{fitness}(S_i) = \sum_{j \in Pop,\; S_j \succ S_i} D(S_j),
\qquad
D(S_i) = |\{\, j \mid j \in Pop \wedge S_i \succ S_j \,\}| \tag{3}$$
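The assignment of Eq. (3) can be sketched as follows (reusing the dominates helper above; this follows the paper's Eq. (3) only and omits any further SPEA2 refinements not discussed here):

```python
def spea2_fitness(objs):
    """Strength D(S_i) = number of solutions S_i dominates; fitness of S_i =
    summed strength of its dominators (0 means non-dominated; lower is better)."""
    n = len(objs)
    D = [sum(dominates(objs[i], objs[j]) for j in range(n)) for i in range(n)]
    return [sum(D[j] for j in range(n) if dominates(objs[j], objs[i]))
            for i in range(n)]
```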

3.3 MOGP Search

In SPEA2, the parent and offspring populations are merged together at every generation [8]. This combined parent-child population is sorted by fitness values and the fittest individuals are copied into a new population, called the archive population. The archive serves as the parent population in the next generation, and preserves elitism in the population over generations. The offspring population at every generation is generated using the traditional crossover and mutation genetic operators with binary tournament selection. At the end of the evolutionary cycle, the set of non-dominated solutions in the population represents the evolved Pareto-approximated front of classifiers. A majority vote of the class labels returned from the evolved set of non-dominated solutions (for a given input instance) determines the final ensemble output.

Table 1. Unbalanced classification tasks used in the experiments.

Name  Classes (Minority/Majority)            Total  Minority      Imb. Ratio  Features (No., Type)
Ion   Good/bad (ionosphere radar signal)     351    126 (35.8%)   1:3         34, Real
Spt   Abnormal/normal (tomography scan)      267    55 (20.6%)    1:4         22, Binary
Yst1  mit/non-target (protein sequence)      1482   244 (16.5%)   1:6         8, Real
Yst2  me3/non-target (protein sequence)      1482   163 (10.9%)   1:9         8, Real
Bal   Balanced/unbalanced (balance scale)    625    49 (7.8%)     1:12        4, Integer
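The voting step described above can be sketched in a few lines (self-contained; the label strings are placeholders):

```python
from collections import Counter

def ensemble_predict(member_labels):
    """Final ensemble output: the class label receiving the most votes
    from the non-dominated front members for one input instance."""
    return Counter(member_labels).most_common(1)[0][0]

# e.g. votes from five front members on one instance
print(ensemble_predict(['min', 'maj', 'min', 'min', 'maj']))  # -> 'min'
```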

4 MOGP Ensemble Performance

In this section we outline the evolutionary parameters and unbalanced data sets used in the experiments, and evaluate the MOGP ensemble performances.

4.1 Evolutionary Parameters and Unbalanced Data Sets

The population size was 500, crossover and mutation rates were 60% and 40%, respectively, and the maximum program depth was 8 to restrict very large programs in the population. The evolution ran for 50 generations. Five benchmark binary classification problems taken from the UCI Repository of Machine Learning Databases [12], summarised in Table 1, are used in the experiments. These reflect classification tasks with varying levels of complexity and class imbalance. Half of the examples in each class are randomly chosen for the training set and the other half for the test set, to ensure that both sets preserve the same class imbalance ratio. While it is possible that the class distributions in the training and test sets are different, we only consider tasks with similar distributions in both sets.
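A sketch of this splitting protocol (our illustration; the paper gives no code), which halves each class independently so both sets keep the original imbalance ratio:

```python
import random

def stratified_half_split(labels, seed=0):
    """Randomly assign half of each class's example indices to the training
    set and the other half to the test set, preserving the imbalance ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for c in set(labels):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        half = len(idx) // 2
        train += idx[:half]
        test += idx[half:]
    return train, test
```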

4.2 MOGP Ensemble Results

Table 2 shows the average minority and majority class accuracies (± standard deviation) of the evolved ensembles, and the average ensemble sizes, on the test sets over 50 runs. Also included are the ensemble results for (a single run of) Naive Bayes (NB) using bagging with 25 balanced bootstrap samples [13]. The "full-front" ensemble results use all evolved non-dominated solutions (from a MOGP run) in the voting process; these results show that majority class accuracy is always higher than minority class accuracy in all tasks. The corresponding minority class accuracies are still reasonably good in some tasks (Ion and Yst2), while in the others (Spt and Bal) these are poor. This shows that the evolved fronts can contain more solutions biased toward the majority class than the opposite case (solutions with good minority accuracy or middle-region solutions), as these solutions influence the ensemble vote in most tasks.

Table 2. Average MOGP ensemble performances and sizes over 50 runs, and Naive Bayes (NB) ensemble using bagging (with balanced bootstrap sampling).

       MOGP Full-Front Ensemble         MOGP Pruned Ensemble             NB (Bagging)
       Size  Minority     Majority      Size  Minority     Majority      Size  Minority  Majority
Ion    28.1  84.9 ± 5.1   92.4 ± 6.4    22.3  81.7 ± 5.8   95.8 ± 3.8    25    88.9      62.5
Spt    27.3  44.6 ± 5.4   90.8 ± 2.3    12.1  62.1 ± 8.0   80.5 ± 4.8    25    70.4      77.4
Yst1   39.7  64.6 ± 4.8   82.5 ± 4.3    16.5  71.0 ± 4.4   75.5 ± 5.4    25    73.8      78.7
Yst2   27.9  81.2 ± 4.9   95.5 ± 1.5    20.6  89.2 ± 3.2   92.3 ± 1.8    25    87.7      92.6
Bal    20.8  51.7 ± 18.2  95.4 ± 3.5    10.1  83.6 ± 9.4   79.5 ± 10.3   25    29.2      50.7

[Fig. 1. MOGP ensemble performances using the full Pareto front over generations: minority and majority class accuracy vs. generation for (a) Ion (similar in Bal), (b) Yst2, and (c) Spt (similar in Yst1).]

Analysis of the ensemble performances during the evolution reveals that this may be due to genetic drift in the population toward non-dominated solutions biased toward the majority class objective. As the evolution advances over generations, more solutions with strong majority class accuracies achieve non-dominated status than solutions with good minority accuracies or middle-region solutions. This effect can be seen to varying degrees in Figure 1, which shows the average minority and majority class performance of the ensemble over 50 generations (over 50 runs on the test sets for three tasks). Figure 1 clearly shows that more solutions with stronger majority class accuracy (than solutions with stronger minority accuracy) are included in the ensemble over generations, as the ensemble accuracy simply reflects which class receives the most votes from the different members. In the remaining tasks (omitted for space constraints), Bal shows similar behaviour to Ion (Figure 1(a)) and Yst1 to Spt (Figure 1(c)).

Pruning the Ensemble. To address the biased ensemble behaviour, a simple accuracy-based selection strategy is used to prune the MOGP ensembles to reduce the influence of biased non-dominated solutions on the ensemble vote. This strategy only selects non-dominated solutions with at least 50% accuracy on both objectives for the ensembles. The pruned ensemble performances and sizes, reported in Table 2, show that more balanced class performances are achieved, with noticeably better minority class accuracies, compared to the full-front results in all tasks (except Ion). The trade-off in majority class accuracy is relatively small in some tasks (Yst1 and Yst2) compared to others (Spt and Bal). These results show that the full ensembles are vulnerable to the learning bias in the unbalanced tasks, while the pruned ensembles can be better for the important minority class. The pruned MOGP ensembles also compare well to NB with bagging, outperforming NB in the two most unbalanced tasks (Yst2 and Bal).
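The 50% accuracy filter can be stated in one comprehension (a sketch; pairing each front solution with its per-class accuracies is our assumption about data layout):

```python
def prune_biased(front, class_accuracies):
    """Keep only non-dominated solutions with at least 50% accuracy on both
    classes; class_accuracies[i] = (minority_acc, majority_acc) of front[i]."""
    return [s for s, (mino, majo) in zip(front, class_accuracies)
            if mino >= 0.5 and majo >= 0.5]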

5 Ensemble Pruning

As the pruned ensembles show more balanced class performances and better minority class accuracies, in this section we investigate the effects of further pruning to create smaller ensembles. We develop two pruning methods to investigate whether smaller ensembles can be better for the unbalanced tasks. Although these pruning methods are developed in the context of the MOGP approach, they are not restricted to MOGP ensembles and can also be used in conjunction with any underlying ensemble learning algorithm.

5.1 Fitness-Based Pruning

In this pruning method, the non-dominated solutions are sorted according to their raw fitness values on the training objectives (from Eq. (1)) and only the best (fittest) N are selected for the ensemble. Configuring N controls the pruned ensemble's size. As there are two objectives (minority and majority class accuracy), the average of these objective values is used as the final fitness value, to include only highly accurate and diverse solutions in the ensemble.
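A minimal sketch of this selection (objective values as (minority, majority) pairs from Eq. (1); the data layout is our assumption):

```python
def fitness_based_prune(front, objectives, N):
    """Rank solutions by the average of their two objective values and
    keep the fittest N for the pruned ensemble."""
    ranked = sorted(zip(front, objectives),
                    key=lambda t: (t[1][0] + t[1][1]) / 2,
                    reverse=True)
    return [s for s, _ in ranked[:N]]
```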

5.2 GP for Evolving Composite Voting Trees

As the fitness-based pruning strategy uses a linear ordering of the fittest N solutions for ensemble selection, this method does not guarantee that the overlap of common errors between the fittest N solutions is minimal with respect to each other. A more robust ensemble-selection method explores different combinations of solution subsets which are highly diverse with respect to each other only. Let X = {p_1, p_2, ..., p_m} be a set of m non-dominated individuals. The function div(Y) calculates the diversity (i.e., overlap of common errors) between individuals in subset Y ⊆ X. In order to find the solution subset with the best diversity we must compare the div(Y) values for all possible subsets of X, i.e., {p_1, p_2}, {p_1, p_3}, {p_1, p_2, p_3}, etc. Exploring all possible combinations of subsets of X is a computationally expensive and time-consuming combinatorial problem, particularly for large ensembles and data sets, as each div(Y) estimate uses at least one pass through the full training set. To address this, we develop a GP-based search to efficiently explore this space of possible combinations, to quickly find subsets of non-dominated solutions for the ensemble which are maximally diverse with respect to each other. The GP approach for ensemble pruning takes as input the evolved set of non-dominated classifiers returned from the MOGP search (called the base classifiers), as shown in Figure 2(a), and evolves a composite voting tree (CVT) representing a small subset of base classifiers that are highly diverse and accurate when combined together in the ensemble voting process, as shown in Figure 2(b).

Representation. Tree-based GP is used to represent a CVT solution, as shown in Figure 2(b). Each terminal node P_n represents a link to the nth base classifier in the input set (the non-dominated MOGP classifiers), similar to feature terminals in MOGP. The root node of a CVT solution outputs a class label, determined by a majority vote of the predictions of each base classifier (terminal node) in the tree; this is the only component in a CVT tree which computes a value. Recall that the prediction of a MOGP base classifier will be minority class if the base classifier's output is non-negative, or majority class otherwise. The new function v serves no purpose other than to join terminal nodes to the root node or other v nodes, where v can take any number of arguments between 1 and 3; this allows different CVT solutions to contain varying numbers of base classifiers.

[Fig. 2. (a) Evolved MOGP Pareto front along minority and majority class accuracy (small circles are base classifiers) and (b) an evolved CVT solution (depth 3) which uses a subset of the MOGP classifiers.]

Fitness Function. The output of a CVT solution (when evaluated on a given input instance) corresponds to the output (class label) of the pruned ensemble whose members are represented in the CVT solution. The fitness function calculates the average classification accuracy of the minority and majority class when each CVT solution is evaluated on the training set, aiming to evolve CVTs with good classification accuracy on both classes.

Evolutionary Search. The search process is akin to canonical (single-objective) GP where the fittest CVT solution in the population is returned from the evolution. Crossover, mutation and elitism rates are 60%, 35% and 5%, respectively, and the tournament selection size is 7. The evolution is limited to 50 generations unless a CVT solution with 100% accuracy on both classes on the training set is evolved, at which point the evolution is stopped. A population size of 1000 is used. To focus the evolution toward discovery of small but highly-effective CVT solutions, two maximum tree depths are compared, 2 and 3, to restrict the number of base classifiers in each solution. When tree depth is limited to 2, an evolved CVT solution represents a pruned ensemble of at most 3 members; similarly, a tree depth of 3 represents a pruned ensemble of at most 9 members.
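CVT evaluation and fitness can be sketched as follows (the nested-tuple encoding, with integer leaves linking to base classifiers, and the precomputed prediction table are our illustrative assumptions, not the authors' implementation):

```python
from collections import Counter

def cvt_terminals(node):
    """Collect base-classifier indices from a CVT's terminals. A CVT is a
    nested tuple ('v', child, ...) with integer leaves P_n (assumed encoding)."""
    if isinstance(node, int):
        return [node]
    return [p for child in node[1:] for p in cvt_terminals(child)]

def cvt_predict(cvt, base_preds, p):
    """Root output: majority vote of the linked base classifiers on instance p.
    base_preds[n][p] is base classifier n's label for training instance p."""
    votes = Counter(base_preds[n][p] for n in cvt_terminals(cvt))
    return votes.most_common(1)[0][0]

def cvt_fitness(cvt, base_preds, labels, minority, majority):
    """Average of the minority and majority class accuracy on the training set."""
    acc = {}
    for c in (minority, majority):
        idx = [p for p, y in enumerate(labels) if y == c]
        acc[c] = sum(cvt_predict(cvt, base_preds, p) == c for p in idx) / len(idx)
    return (acc[minority] + acc[majority]) / 2
```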

5.3 Performance of Ensembles using Pruning Methods

For a fair comparison between these two ensemble-selection methods, we compare ensemble performances when a similar number of base classifiers is returned by the different selection methods, i.e., pruned ensembles limited to (at most) 3 and 9 members (odd-numbered ensemble sizes are preferred as no draws can occur in the voting process). This allows for a comparison of which selection method finds more effective (more accurate) subsets of base classifiers in the pruned ensemble, as well as an investigation of ensemble behaviour when fewer base classifiers are used in the ensemble voting (compared to the initial ensemble results from Table 2). To generate pruned ensembles limited to (at most) 3 and 9 members, the CVT-based pruning method uses a maximum CVT tree-depth of 2 and 3, respectively. Table 3 reports the performances and sizes of the pruned ensembles using the two pruning methods, i.e., fitness-based and CVT-based pruning, on the test sets over 50 independent runs; these correspond to the initial 50 MOGP experiments used to generate the full ensembles (from Table 2).

Table 3. Average MOGP ensemble performances and sizes for the two pruning methods (the ◦ symbol marks the dominating pruning method for similar-sized ensembles).

             Fitness-based Pruning           CVT-based Pruning
       Size  Minority      Majority     Size  Minority      Majority
Ion    3     70.5 ± 8.4    91.3 ± 8.9   3.0   77.7 ± 6.1    91.9 ± 4.6   ◦
       9     75.7 ± 7.2    94.8 ± 4.4   8.9   80.4 ± 5.5    94.3 ± 4.6   ◦
Spt    3     68.1 ± 8.6    70.8 ± 5.4   3.0   72.7 ± 9.6    64.7 ± 10.6
       9     64.2 ± 7.9    77.3 ± 4.7   7.5   58.4 ± 7.8    82.1 ± 4.1
Yst1   3     79.8 ± 15.2   53.8 ± 21.1  3.0   79.1 ± 15.4   59.2 ± 16.2  ◦
       9     76.3 ± 13.7   61.4 ± 17.0  9.0   77.8 ± 6.1    66.7 ± 6.4   ◦
Yst2   3     95.3 ± 3.9    74.9 ± 6.5   3.0   95.3 ± 2.2    83.2 ± 4.4   ◦
       9     93.0 ± 2.5    81.3 ± 4.6   9.0   93.5 ± 3.0    86.3 ± 3.9   ◦
Bal    3     78.3 ± 14.4   76.6 ± 13.1  3.0   84.5 ± 10.3   76.3 ± 13.4  ◦
       9     76.0 ± 14.4   81.6 ± 11.4  7.9   78.8 ± 10.6   85.5 ± 7.9   ◦

Table 3 shows that the CVT-pruned ensembles outperform (i.e., dominate) the fitness-pruned ensembles for both the smallest (at most 3 members) and intermediate-sized (at most 9 members) ensembles in nearly all tasks (except Spt). This suggests that the quality of the base classifiers found using the CVT method is better than with the fitness-based selection method, as these base classifiers improve the predictive ability of the pruned ensembles due to better cooperation between individuals. The evolutionary search to discover good CVTs is reasonably fast, taking between 0.2 and 5 seconds on the tasks (2–3% of the training time to evolve a MOGP front).

Comparing the pruned ensembles in Table 3 to the ensemble results from Table 2 shows that in nearly all tasks (except Ion for the MOGP ensemble and Yst2 for NB), the smaller the ensemble, the better the minority class accuracies but the poorer the majority class accuracies. The smallest MOGP ensemble is dominated by the larger ensembles in only one task, Ion, which is also the least unbalanced task. This suggests that in these unbalanced tasks, the pruned ensembles are better than the larger ensembles, but only for the minority class. The poorer majority class accuracies may be due to over-fitting from the secondary training phase; however, further investigation is required in future work.

6 Conclusions

The main goal of this paper was to develop a MOGP approach to classification with unbalanced data that evolves an accurate and diverse ensemble of non-dominated solutions along the minority and majority class trade-off frontier. We also compared ensemble behaviour using the full non-dominated set of solutions to smaller pruned ensembles, and developed a new pruning method to find small subsets of highly-cooperative individuals. Our goals were achieved by examining the classification performance of the full and pruned MOGP-evolved ensembles on five unbalanced (binary) tasks.

We show that the full MOGP ensembles are vulnerable to the learning bias due to the influence of more Pareto front solutions with stronger majority class accuracies (than solutions with good minority class accuracies). As the ensemble sizes are reduced, the pruned MOGP ensembles show better accuracies on the important minority class but not the majority class in the unbalanced tasks. The new GP-based ensemble pruning method finds highly-cooperative individuals for the pruned MOGP ensembles, as these have better accuracy on both classes compared to a fitness-based selection method for pruning on these tasks.

For future work, we will investigate these methods on more unbalanced data sets and compare our results to canonical (single-objective) GP with different fitness functions for classification with unbalanced data. We will also investigate how the two new pruning techniques treat diversity in the pruned ensembles.

References

1. Weiss, G.M., Provost, F.: Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research 19 (2003) 315–354

2. Chawla, N., Sylvester, J.: Exploiting diversity in ensembles: improving the performance on unbalanced datasets. In: Proceedings of the 7th International Conference on Multiple Classifier Systems (MCS), Springer-Verlag (2007) 397–406

3. McIntyre, A., Heywood, M.: Multi-objective competitive coevolution for efficient GP classifier problem decomposition. In: IEEE International Conference on Systems, Man and Cybernetics (2007) 1930–1937

4. Wang, S., Tang, K., Yao, X.: Diversity exploration and negative correlation learning on imbalanced data sets. In: International Joint Conference on Neural Networks (2009) 3259–3266

5. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press (1992)

6. Holmes, J.H.: Differential negative reinforcement improves classifier system learning rate in two-class problems with unequal base rates. In: Koza, J.R., Banzhaf, W., Chellapilla, K., et al. (eds.): Genetic Programming 1998: Proceedings of the Third Annual Conference (1998) 635–644

7. Bhowan, U., Zhang, M., Johnston, M.: Evolving ensembles in multi-objective genetic programming for classification with unbalanced data. In: Proceedings of the 2011 Genetic and Evolutionary Computation Conference, ACM (2011) 1331–1339

8. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization. TIK-Report 103, Department of Electrical Engineering, Swiss Federal Institute of Technology (2001)

9. Jin, Y., Sendhoff, B.: Pareto-based multiobjective machine learning: An overview and case studies. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 38 (2008) 397–415

10. Chandra, A., Yao, X.: Ensemble learning using multi-objective evolutionary algorithms. Journal of Mathematical Modelling and Algorithms 5 (2006) 417–445

11. Chen, H., Tino, P., Yao, X.: Predictive ensemble pruning by expectation propagation. IEEE Transactions on Knowledge and Data Engineering 21 (2009) 999–1013

12. Asuncion, A., Newman, D.: UCI Machine Learning Repository (2007). http://www.ics.uci.edu/~mlearn/MLRepository.html. University of California, Irvine, School of Information and Computer Sciences.

13. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11(1) (2009)