

IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 12, NO. 4, AUGUST 2008 479

Measuring Generalization Performance in Coevolutionary Learning

Siang Yew Chong, Member, IEEE, Peter Tiño, and Xin Yao, Fellow, IEEE

Abstract—Coevolutionary learning involves a training process where training samples are instances of solutions that interact strategically to guide the evolutionary (learning) process. One main research issue is with the generalization performance, i.e., the search for solutions (e.g., input–output mappings) that best predict the required output for any new input that has not been seen during the evolutionary process. However, there is currently no such framework for determining the generalization performance in coevolutionary learning even though the notion of generalization is well-understood in machine learning. In this paper, we introduce a theoretical framework to address this research issue. We present the framework in terms of game-playing although our results are more general. Here, a strategy's generalization performance is its average performance against all test strategies. Given that the true value may not be determined by solving analytically a closed-form formula and is computationally prohibitive, we propose an estimation procedure that computes the average performance against a small sample of random test strategies instead. We perform a mathematical analysis to provide a statistical claim on the accuracy of our estimation procedure, which can be further improved by performing a second estimation on the variance of the random variable. For game-playing, it is well-known that one is more interested in the generalization performance against a biased and diverse sample of "good" test strategies. We introduce a simple approach to obtain such a test sample through the multiple partial enumerative search of the strategy space that does not require human expertise and is generally applicable to a wide range of domains. We illustrate the generalization framework on the coevolutionary learning of the iterated prisoner's dilemma (IPD) games. We investigate two definitions of generalization performance for the IPD game based on different performance criteria, e.g., in terms of the number of wins based on individual outcomes and in terms of average payoff. We show that a small sample of test strategies can be used to estimate the generalization performance. We also show that the generalization performance using a biased and diverse set of "good" test strategies is lower compared to the unbiased case for the IPD game. This is the first time that generalization is defined and analyzed rigorously in coevolutionary learning. The framework allows the evaluation of the generalization performance of any coevolutionary learning system quantitatively.

Index Terms—Chebyshev's inequality, coevolutionary learning, evolutionary computation, generalization, iterated prisoner's dilemma (IPD).

Manuscript received August 7, 2006; revised May 31, 2007. First published February 2, 2008; last published July 30, 2008 (projected). The work of X. Yao was supported in part by the Engineering and Physical Sciences Research Council (EPSRC) under Grant GR/T10671/01.

S. Y. Chong and X. Yao are with the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K. (e-mail: [email protected]; [email protected]).

P. Tiño is with the School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K. (e-mail: [email protected]).

Digital Object Identifier 10.1109/TEVC.2007.907593

I. INTRODUCTION

COEVOLUTIONARY learning refers to a broad class of population-based, stochastic search algorithms that involve the simultaneous evolution of competing solutions with coupled fitness [1]. A coevolutionary learning system can be implemented using coevolutionary algorithms [2], [3], which can be derived from evolutionary algorithms (EAs) [4], [5]. That is, both coevolutionary learning and EAs can be described in terms of the same framework, whereby an adaptation process is carried out on the solutions in some form of representation through a repeated process of variation and selection. The framework distinguishes coevolutionary learning and EAs, in general, from classical approaches (e.g., steepest-descent-based algorithms) by two specific features, i.e., they are population-based and incorporate information exchange mechanisms between populations in successive generations (iterative steps) to guide the search process [6].

Despite their similarity in framework, coevolutionary learning and EAs are fundamentally different in how the fitness of a solution is assigned, leading to significantly different outcomes when applied to similar problems (e.g., different search behaviors on the space of solutions [7], [8]). EAs are often viewed and constructed in terms of an optimization context [4], [5], whereby an absolute fitness function is required to assign the fitness value to a solution that is objective (the fitness value for a solution is always the same regardless of the context). For coevolutionary learning, the fitness of a solution is obtained through its interactions with other competing solutions in the current population, and as such, is subjective (the fitness value for a solution depends on the context, e.g., the population, and as such is relative and dynamic). Here, coevolutionary learning operates to find solutions guided by strategic interactions among the competing solutions from one generation to the next that result (hopefully) in an arms race of increasingly innovative solutions [9]–[11].

One of the early motivations for using coevolutionary learning is its potential application for solving problems that cannot be framed in the context of optimization (e.g., using EAs) because it is not possible or very difficult to formulate an absolute fitness function that reflects the underlying properties of the problem. For such problems, continued use of an inappropriate fitness function will often bias the search to solutions that do not reflect the underlying properties of the problem, leading to suboptimal solutions [12]. Even if a fitness function can be formulated, it may not be able to evaluate and differentiate between individual solutions to provide some gradient to direct the search when using EAs [8], [13]. One such problem that is difficult to solve using EAs, but can be naturally framed in coevolutionary learning, is the problem of game-playing [2], [3], [14]–[17].

1089-778X/$25.00 © 2008 IEEE


However, despite the success of coevolutionary learning in solving the problem of games [2], [3], [14]–[17] (and other problems in the context of optimization [18] and classification [19], [20]), the approach is not trouble-free for all problems. In particular, coevolutionary learning is now recognized to suffer from problems (collectively called coevolutionary pathologies) that affect the performance of a coevolutionary learning system [11]–[13], [21]–[23]. For example, overspecialization to a single solution can occur in the population [21], which can be a result of earlier population disengagement, e.g., some solutions are favored over others due to large gaps in competence level [13]. When intransitivity exists in the relationship between solutions, cyclic dynamics may occur during coevolution, whereby at some point in the process, the population overspecializes to a solution that is vulnerable to another solution that exploits it [11], [12]. Furthermore, when a solution is driven to extinction but at a later point is adaptively found again, the coevolution is said to exhibit forgetting [22], [23].

Coevolutionary pathologies are usually attributed to the use of relative fitness in the selection process of a coevolutionary learning system [8], [24]. The design of the coevolutionary learning system also has broader implications for its performance, which depends on the coevolutionary search dynamics (i.e., representation [25], variation [21], [25], [26], and selection [27]). This necessitates a more in-depth study of coevolutionary learning search dynamics. In particular, one approach that has been given much attention in the past is the monitoring of the progress of the coevolution for the search of more innovative and sophisticated solutions, e.g., arms race dynamics [28]–[31].

Another approach is to consider a global view of increasing performance in coevolutionary learning. However, for this particular investigation, previous studies are restricted to simple problems or problems where the global view is known in advance [8], [11]. Tools introduced for monitoring the progress of arms race dynamics are inappropriate or may not be suitably adapted for the global view analysis because they only provide relative performance information of solutions between different generations. This is because the fact that solutions in the current generation are better than those in previous generations does not necessarily imply that they perform better globally when compared with new or all possible solutions.

In machine learning, there exists a powerful framework, generalization, that provides a global view of performance for learning systems. Generalization refers to the ability of the learning system to find the solution, which can be viewed in the context of input–output mappings, that best predicts the required output for any new input that has not been seen during the training process. In the context of generalization, one is interested in how the learning system can realize the underlying properties of the problem from a small sample of training data (e.g., the input–output set) to produce the solution. For example, in the case of neural network training, one is not interested in learning the "exact representation of the training data itself, but rather to build a statistical model of the process which generates the data" [32, p. 332]. Furthermore, it should be noted that learning is not necessarily the same as optimizing because a solution with optimal fitness does not necessarily have the best generalization (unless an accurate metric to measure generalization is used in the fitness function) [33].

Coevolutionary learning, like other learning systems, also uses a small training sample during the evolutionary (training) process to produce a solution to the problem. However, coevolutionary learning is different from other learning systems in that the training samples are not fixed, but instead, are instances of solutions that are changing (evolving) and are interacting with each other strategically to guide the evolutionary process (i.e., learning). Despite this difference, the generalization framework can be applied to coevolutionary learning systems to provide a global view of performance. Although the notion of generalization is well-understood in machine learning, there is a lack of a theoretical framework to justify how the generalization performance of a coevolutionary learning system is determined with respect to the problem. For example, past studies such as [33] and [34] have used a large sample of randomly obtained test cases to estimate the generalization performance. It is not known how accurate such an estimation is, i.e., how close the estimated generalization performance (using test samples randomly obtained from the search space) is to the true generalization performance (using the entire search space).

In this paper, we introduce a theoretical framework that addresses the problem of determining the generalization performance of coevolutionary learning in general. We present the framework in terms of game-playing, i.e., learning game strategies that generalize well to the game (e.g., defeat a large number of strategies that exist). However, our theoretical framework is general, i.e., the problem can be put in the context of test-based evaluations (e.g., comparisons between solutions), where some tests can reflect the underlying properties (objectives) of the problem that are unknown (game-playing is one such problem) [12]. We first define the generalization performance of a strategy as its average performance against all test strategies. With this definition, it follows that the best generalization performance for a coevolutionary learning system is the one that produces evolved strategies with the maximum average performance against all strategies.

Although this definition is simple, the generalization performance can be difficult to determine for two reasons. First, the analytical function for game outcomes can be unknown, and as such, we cannot determine the generalization performance by solving analytically a closed-form formula. Second, the strategy space can be very large (although finite), thus making it computationally prohibitive. To address this problem, we propose the alternative of estimating the generalization performance by taking the average performance of the evolved strategy against a sample of test strategies that are randomly drawn from the strategy space.

We show through a mathematical analysis using Chebyshev's Theorem that the probability of the absolute difference between the estimated and true values exceeding a given error (precision value) is bounded by a value that depends reciprocally only on the square of the error and the size of the random test sample. However, this probability bound assumes the worst case of having maximum variance for the distribution of the random variable over a bounded interval. In general, the true variance is smaller than the maximum value. As such, we perform a mathematical analysis and show how a second estimation of the variance can be used to obtain a tighter bound. In addition, we also show that for some games, the true variance of a strategy's performance with respect to the strategy space is smaller, and as such, requires a smaller sample size to make the same statistical claim.

With this framework, it is now shown that a sample of randomly obtained test strategies that is much smaller in size compared with the total number of strategies in the space is sufficient to estimate the generalization performance of coevolutionary learning, and that this estimation is close enough to the true value. We apply the framework to the coevolutionary learning of the IPD games to illustrate both the advantage of the framework and how it can be used in general. In particular, we investigate two different definitions of generalization performance for the IPD game based on different performance criteria, e.g., in terms of the number of wins based on individual outcomes and in terms of average payoff.

It is well-known that rather than considering the average performance for all cases, one may be more interested in the average performance for specific cases that are more common or that arise naturally. In the context of game-playing, one is more interested in the generalization performance of the coevolutionary learning system against "good" strategies, and not the average performance against all strategies, since it is possible that a large proportion of strategies in the space are poor or mediocre. To determine the generalization performance against this biased sample of good test strategies, we introduce a simple approach to obtain such a test sample through the multiple partial enumerative search of the strategy space. Each partial enumerative search uses a population size that is much larger than the total number of possible unique strategies that can be searched during coevolution to produce a best strategy. This approach does not require human expertise (as in generating some arbitrarily designed strategies) and is more comprehensive than the previous single partial enumerative search that we introduced earlier in [35]–[37]. We show that for the coevolutionary learning of IPD games, the generalization performance for the case of a biased and diverse sample is lower compared with the case of an unbiased sample.

This paper is organized as follows. Section II presents our theoretical framework of the generalization performance of coevolutionary learning. Section III illustrates how the framework can be applied to the IPD game. Section IV investigates the application of the framework to estimate the generalization performance of coevolutionary learning of simple IPD games where the true generalization performance can be determined. Section V presents an empirical study on estimating the generalization performance of coevolutionary learning of slightly more complex games, where the true generalization performance cannot be determined. The section also compares the results of estimates based on using an unbiased test sample and that of using a biased and diverse sample of "good" test strategies. Section VI discusses the implications of the framework and concludes the paper.

II. GENERALIZATION FRAMEWORK FOR COEVOLUTIONARY LEARNING

A. A Need for a Consistent and General Approach to Estimate the Generalization Performance of Coevolutionary Learning

Darwen and Yao [33]–[35] were among the first to explicitly investigate coevolutionary learning through the framework of generalization from machine learning (others include [38]). However, they [33]–[35] studied the issue through an empirical approach only. In particular, they investigated the utility of using some random sample of test cases to estimate the generalization performance of a coevolutionary learning system.

There are other studies that can be related to generalization in the context of coevolutionary learning. Wiegand and Potter [39] studied the notion of robustness (of individual components) in a cooperative coevolution setting. Ficici and Pollack [23] studied the notion of solution concepts, e.g., a partitioning of the search space into solutions (that are wanted) and nonsolutions for a problem, based on measuring some properties and establishing some criteria that the searched point is a solution. Bowling [40] studied the notion of regret, i.e., a measure of the performance difference between a learning algorithm and the best static strategy during training. Powers and Shoham [41] studied the estimation of best-response through the use of random samples. Studies in [42]–[44] investigated formalisms of monotonic improvement in coevolutionary learning and developed algorithmic frameworks that guarantee monotonic improvements of coevolving solutions based on archives of test cases.

Here, our main motivation is to develop a framework for a rigorous quantitative analysis of performance in coevolutionary learning using the notion of generalization from machine learning. We are motivated to address the need for a principled approach to estimate the generalization performance of coevolutionary learning. The framework aims to allow one to estimate the generalization performance, in general, for problems coevolutionary learning is used to solve and at any point in the evolutionary process where generalization performance is measured.

There are two reasons why measuring the generalization performance of coevolutionary learning is necessary and important. First, it is used to provide an absolute quality measure on how well a coevolutionary learning system is performing with respect to the problem, i.e., how well the coevolutionary learning generalizes. Second, it can be used as a means to compare the generalization performance of different coevolutionary learning systems with respect to the problem.

We first introduce a theoretical framework that defines explicitly the generalization performance for coevolutionary learning, and how it can be determined, i.e., measured. However, it is noted that obtaining the true generalization performance for a coevolutionary learning system is very difficult. As such, through the theoretical framework, we provide an alternative of a consistent and general procedure to estimate the generalization performance. We show a mathematical analysis of how the generalization performance can be estimated using a random sample of test cases. We demonstrate the utility of the estimation procedure by determining the statistical claim that one can make of how confident one is with the accuracy of the estimate compared with the true generalization for a random test sample of a given size.

B. Estimating Generalization Performance

In coevolutionary learning, the quality of a solution is determined relative to other solutions. This is achieved through comparisons, i.e., interactions between solutions. These interactions can be framed in terms of game-playing, i.e., an interaction is a game played between two strategies (solutions). The game outcome of a strategy $i$ against the opponent strategy $j$ is $G_{ij}$, and conversely, the game outcome of strategy $j$ against strategy $i$ is denoted $G_{ji}$. Strategy $i$ is said to solve the test provided by strategy $j$ if $G_{ij} > G_{ji}$.1

Here, the absolute quality (generalization performance) of a strategy $i$ is measured with respect to its expected performance in solving tests provided by strategies $j$. The goal of coevolutionary learning for the problem of game-playing is to learn a strategy with the best generalization performance. In the following, we present a theoretical framework for estimating the generalization performance of coevolutionary learning. It should be noted that the framework is presented in the context of the complete solution. As such, the framework is directly applicable for the estimation of the generalization performance of a complete solution obtained either by a competitive or a cooperative coevolutionary learning system.

1) True Generalization Performance: The generalization performance of coevolutionary learning is determined with respect to the evolved strategy that is produced. The true generalization performance of coevolutionary learning is defined as the expected performance of strategy $i$ that is produced after a learning process (coevolution) against all strategies in the strategy space $\mathcal{S}$. The true generalization performance of strategy $i$, $G(i)$, can be written as follows:

$$G(i) = \mathbb{E}_{j \sim P_{\mathcal{S}}}\left[G_{ij}\right] \qquad (1)$$

where $\mathbb{E}_{j \sim P_{\mathcal{S}}}[\cdot]$ is the expectation of strategy $i$'s performance against $j$, with respect to the distribution $P_{\mathcal{S}}$ over strategy space $\mathcal{S}$ (i.e., the distribution with which opponent strategies $j$ are drawn). Note that this definition of generalization does not imply having to compare with all strategies that exist. Instead, $P_{\mathcal{S}}$, which can be specified by the strategy representation (e.g., neural networks), can be induced on the strategy space $\mathcal{S}$. As such, some strategies that exist may not be included to determine $G(i)$ because they have zero probability under $P_{\mathcal{S}}$.

There are two difficulties in applying (1) directly. First, the analytical form for $G_{ij}$ is not known or is difficult to obtain (even for simple games). Second, the distribution $P_{\mathcal{S}}$ over $\mathcal{S}$ may be unknown (at best, we can only sample from $P_{\mathcal{S}}$ through the strategy representation that is used).

However, it is possible for these games to have a finite number of possible unique strategies, e.g., $\mathcal{S}$ is discrete and finite, or we can consider a subset of $\mathcal{S}$ that is discrete and finite by inducing some strategy distribution $P_{\mathcal{S}}$ on $\mathcal{S}$. For the purpose of presentation and simplicity, we assume a uniform strategy distribution in $\mathcal{S}$.

With this, one can compute the true generalization performance of coevolutionary learning through a strategy $i$, which is simply its average performance against all strategies $j$, and can be written in the form

$$G(i) = \frac{1}{N_{\mathcal{S}}} \sum_{j \in \mathcal{S}} G_{ij} \qquad (2)$$

1For example, in a zero-sum game such as chess, one can say that strategy $i$ solves the test provided by strategy $j$ if $i$ defeats $j$, i.e., $G_{ij} > G_{ji}$.

where $N_{\mathcal{S}}$ is the total number of unique strategies that strategy $i$ plays against. For example, consider the two-choice IPD game with deterministic and reactive memory-one strategies, where each strategy considers its previous move and the opponent's previous move. For such a game, there is only a total of $2^5 = 32$ unique strategies in the strategy space.

However, the true generalization performance may not be computed by applying (2) as the game increases in complexity. For example, considering the IPD game above, if strategies can play from $n$ choices (e.g., $n$-choice IPD game), the total number of unique strategies increases to $n^{n^2 + 1}$. As such, even a moderate increase in the number of choices results in a large increase in the total number of strategies. With three choices, there are $3^{10} = 59049$ strategies. As the number of choices increases to four, the total number of strategies increases to $4^{17} \approx 1.7 \times 10^{10}$, and so forth. Thus, computing the true generalization performance using (2) quickly becomes computationally infeasible even for a small increase in the number of choices.

2) Estimated Generalization Performance: Given that the true generalization performance cannot be obtained (e.g., it cannot be determined by solving analytically a closed-form formula and is computationally prohibitive), the next alternative is to estimate the generalization performance. Here, we propose estimating the generalization performance by considering the average performance against a smaller sample of $N$ test strategies that can be computed (i.e., $N \ll N_{\mathcal{S}}$). The estimated generalization performance of strategy $i$ is given by

$$\hat{G}_{S_N}(i) = \frac{1}{N} \sum_{j \in S_N} G_{ij} \qquad (3)$$

where $S_N$ is the sample of $N$ test strategies randomly drawn from $\mathcal{S}$. For the rest of this paper, we use the notation $\hat{G}(i)$ to represent $\hat{G}_{S_N}(i)$.
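A corresponding minimal sketch of the estimator in (3), reusing game_outcome from the enumeration sketch above; the uniform sampling of the lookup-table representation and the default sample size are again illustrative assumptions.

```python
import random

def random_strategy():
    """Draw a memory-one strategy uniformly: a first move plus a 2x2 response table."""
    first = random.randint(0, 1)
    table = {(a, b): random.randint(0, 1) for a in (0, 1) for b in (0, 1)}
    return first, table

def estimated_generalization(strat_i, n_tests=1000):
    """Hat-G(i) as in (3): mean outcome of strat_i against N randomly drawn test strategies."""
    tests = [random_strategy() for _ in range(n_tests)]
    return sum(game_outcome(strat_i, j) for j in tests) / n_tests
```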

3) Chebyshev's Bound for Determining the Accuracy of the Estimation of Generalization Performance: We want to know how close $\hat{G}(i)$, obtained using a set of test strategies randomly drawn from $\mathcal{S}$ of size $N$, is when compared with $G(i)$, which is obtained using all possible strategies from $\mathcal{S}$ of size $N_{\mathcal{S}}$ (where $N \ll N_{\mathcal{S}}$). That is, we want to know if the absolute difference between $\hat{G}(i)$ and $G(i)$, i.e., $|\hat{G}(i) - G(i)|$, is sufficiently small, i.e., does not exceed a small positive number $\epsilon$ (where $\epsilon > 0$). If this is true, then we say $\hat{G}(i)$ is approximately $G(i)$, i.e., $\hat{G}(i) \approx G(i)$.

However, we do not know what $|\hat{G}(i) - G(i)|$ is, because $G(i)$ is not known. One can try to find how likely it is that $\hat{G}(i)$ is close enough to $G(i)$. That is, what is the probability that $|\hat{G}(i) - G(i)| < \epsilon$ if one uses $N$ randomly obtained test strategies to obtain $\hat{G}(i)$. We answer this question using Chebyshev's Theorem [45].

Our mathematical analysis does not assume a particular strategy distribution $P_{\mathcal{S}}$, which can be induced on $\mathcal{S}$ by a particular strategy representation. All that is required is that $S_N$ is a set of independent identically distributed (i.i.d.) strategies drawn from $\mathcal{S}$ according to $P_{\mathcal{S}}$, e.g., using the same strategy representation.

We want to estimate the difference $|\hat{G}(i) - G(i)|$, which is a random variable because the set $S_N$ of strategies used to obtain $\hat{G}(i)$ is sampled randomly from $\mathcal{S}$. From the definition of $\hat{G}(i)$, we apply Chebyshev's Theorem [45] and obtain2

$$\Pr\left(|\hat{G}(i) - G(i)| \ge \epsilon\right) \le \frac{\mathrm{Var}(G_{ij})}{N \epsilon^2} \qquad (4)$$

for a strategy $i$ (for full details, please refer to the Appendix section). For the rest of this paper, we use the notation $\sigma^2$ to represent $\mathrm{Var}(G_{ij})$. Furthermore

$$\sigma^2 = \mathrm{Var}(G_{ij}) = \mathbb{E}_{j \sim P_{\mathcal{S}}}\left[\left(G_{ij} - G(i)\right)^2\right].$$

We can bound $\sigma^2$. In general, for the random variable $G_{ij}$ distributed over the interval $[G_{\min}, G_{\max}]$ (where $G_{\max}$ and $G_{\min}$ are the maximum and minimum of $G_{ij}$, respectively), the maximum variance is obtained when half of the mass is at $G_{\min}$ and the other half is at $G_{\max}$, which results in a mean of $(G_{\min} + G_{\max})/2$ and a standard deviation of $(G_{\max} - G_{\min})/2$ (see [46]). Let $d = G_{\max} - G_{\min}$ be the range of the random variable $G_{ij}$. We can thus restate Chebyshev's bound as Lemma 1 [45].

Lemma 1: For a strategy $i$, let $\hat{G}(i)$ be the estimated generalization performance with respect to $N$ random test strategies and $G(i)$ be the true generalization performance. Consider the absolute difference $|\hat{G}(i) - G(i)|$, which is a random variable with distribution taken on a compact interval of length $d = G_{\max} - G_{\min}$. Then, for any positive number $\epsilon$

$$\Pr\left(|\hat{G}(i) - G(i)| \ge \epsilon\right) \le \frac{d^2}{4 N \epsilon^2}. \qquad (5)$$

We note the generality of this framework by observing the following three points.

1) The framework is independent of the complexity of the game, since it is independent of the size of the strategy space, and independent of the strategy distribution in the strategy space.

2) Both $G_{\max}$ and $G_{\min}$ are often known a priori as they are specified by the game, which means that we can always obtain the upper bound in (5).

3) The framework is independent of learning algorithms since the bound holds for any strategy in the strategy space.

4) Application of Chebyshev's Bound in the Generalization Framework: To see how Chebyshev's bound can be used in the framework for estimating generalization performance, we first need to identify the relationship given by (5) between the probability $\Pr(|\hat{G}(i) - G(i)| \ge \epsilon)$, the sample size $N$, and the precision value $\epsilon$. For simplicity, we measure the precision on the normalized scale, i.e., we consider deviations of size $\epsilon d$. This lets us simplify (5) to

$$\Pr\left(|\hat{G}(i) - G(i)| \ge \epsilon d\right) \le \frac{1}{4 N \epsilon^2}$$

which is the same as

$$\Pr\left(\hat{\epsilon} \ge \epsilon\right) \le \frac{1}{4 N \epsilon^2}$$

where $\hat{\epsilon} = |\hat{G}(i) - G(i)|/d$ is the normalized absolute difference of the estimated and true values of the generalization performance.

2The analysis can be extended to games with probabilistic strategies as well since $G_{ij}$ is a random variable.

Fig. 1. Various graphs of the theoretical Chebyshev's bound in probability [given by (5)] for different sizes $N$ of the sample of random test strategies $S_N$, over the range of precision values $\epsilon$ in [0.01, 0.1]. Graphs are labeled according to $N$.

To illustrate Chebyshev's bound further, we plot the curve of $1/(4N\epsilon^2)$ as points in the $(\epsilon, P)$ space. Fig. 1 shows the curves for Chebyshev's bounds for different $N$s over a range of $\epsilon$s in [0.01, 0.1]. Each curve gives the upper bound for $\Pr(\hat{\epsilon} \ge \epsilon)$ when a random sample of test strategies of size $N$ is used to estimate the generalization performance.

Note that the bound is inversely proportional to $N$ and $\epsilon^2$. For any $\epsilon$, an increasingly larger number for $N$ is required to obtain lower Chebyshev's bounds (Fig. 1). Assuming the worst case, and when all the three values $P$, $N$, and $\epsilon$ are known, one can use a statistical argument to claim with confidence or probability at least $1 - P$ that $\hat{\epsilon} < \epsilon$ for a precision value $\epsilon$.

In practice, we only need to specify $N$ to be as large as possible for which we can compute the estimated generalization performance. After that, we have to make a "tradeoff" between the confidence level (i.e., being more confident that the estimation is likely to be close to the true result) and the precision value (i.e., the value by which the estimation will not be bigger or smaller compared with the true result).
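As a worked illustration of this tradeoff, the short sketch below evaluates the worst-case bound in (5) on the normalized scale and inverts it to obtain the sample size needed for a chosen confidence and precision (the function names are ours, for illustration only).

```python
import math

def chebyshev_bound(n_tests: int, eps: float) -> float:
    """Worst-case upper bound on Pr(|G_hat - G|/d >= eps) from (5)."""
    return 1.0 / (4.0 * n_tests * eps ** 2)

def required_sample_size(eps: float, confidence: float) -> int:
    """Smallest N such that the worst-case bound guarantees Pr(|G_hat - G|/d < eps) >= confidence."""
    return math.ceil(1.0 / (4.0 * (1.0 - confidence) * eps ** 2))

if __name__ == "__main__":
    # e.g., to claim with probability at least 0.95 that the normalized error is below 0.05:
    print(required_sample_size(eps=0.05, confidence=0.95))  # 2000 test strategies
    print(chebyshev_bound(n_tests=2000, eps=0.05))          # bound of 0.05 on the failure probability
```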

5) Obtaining a Tighter Chebyshev's Bound by Estimating the Variance $\sigma^2$: Consider a random variable $x$ with the underlying distribution taken from a compact interval of real numbers, i.e., $[a, b] \subset \mathbb{R}$. For $N$ realizations $x_1, x_2, \ldots, x_N$, the empirical mean is given by

$$\bar{x} = \frac{1}{N} \sum_{k=1}^{N} x_k.$$

The true mean is given by

$$\mu = \mathbb{E}[x]$$

while the true variance is given by

$$\sigma^2 = \mathbb{E}\left[(x - \mu)^2\right].$$

The maximum variance for a random variable over $[a, b]$ is obtained when half of the mass is at $a$ and the other half is at $b$, i.e.,

$$\sigma^2_{\max} = \left(\frac{d}{2}\right)^2$$

where $d = b - a$. Applying Chebyshev's Theorem gives us the following:

$$\Pr\left(|\bar{x} - \mu| \ge \epsilon\right) \le \frac{\sigma^2}{N \epsilon^2} \le \frac{\sigma^2_{\max}}{N \epsilon^2} = \frac{d^2}{4 N \epsilon^2}.$$

We want to find out if a tighter upper bound for Chebyshev's bound (for any random variable distributed over a compact interval) can be obtained. If we knew $\mu$, we could obtain an unbiased estimate of $\sigma^2$

$$\hat{\sigma}^2 = \frac{1}{N} \sum_{k=1}^{N} (x_k - \mu)^2 = \frac{1}{N} \sum_{k=1}^{N} y_k$$

where $y_k = (x_k - \mu)^2$, $k = 1, 2, \ldots, N$, are realizations of a new random variable $y = (x - \mu)^2$. $\hat{\sigma}^2$ is the empirical mean of $y$ based on a sample of size $N$.

The true mean of $y$ is

$$\mathbb{E}[y] = \sigma^2.$$

We can apply Chebyshev's Theorem for $y$ and obtain the following inequality:

$$\Pr\left(|\hat{\sigma}^2 - \sigma^2| \ge \gamma\right) \le \frac{\mathrm{Var}(y)}{N \gamma^2}$$

where $\mathrm{Var}(y)$ is the variance of $y$. For $y = (x - \mu)^2$, given that the minimal possible value is $0$ (if $x = \mu$), and that the maximal possible value is $d^2$ (if $x = a$ and $\mu = b$, or $x = b$ and $\mu = a$), the range for $y$ is $[0, d^2]$. Again, trivially, $\mathrm{Var}(y)$ can be bounded by

$$\mathrm{Var}(y) \le \left(\frac{d^2}{2}\right)^2 = \frac{d^4}{4}.$$

So

$$\Pr\left(|\hat{\sigma}^2 - \sigma^2| \ge \gamma\right) \le \frac{d^4}{4 N \gamma^2}.$$

We can thus summarize the following two points. First, with probability of at least $1 - \sigma^2/(N\epsilon^2)$ (with respect to generating many $N$-tuples of observations $x_1, \ldots, x_N$), we know that $\bar{x}$ is no further away from $\mu$ than $\epsilon$, i.e., $|\bar{x} - \mu| < \epsilon$. That is

$$\Pr\left(|\bar{x} - \mu| < \epsilon\right) \ge 1 - \frac{\sigma^2}{N \epsilon^2}.$$

Second, with probability of at least $1 - d^4/(4N\gamma^2)$, we know that $\hat{\sigma}^2$ is no further away from $\sigma^2$ than $\gamma$, i.e., $|\hat{\sigma}^2 - \sigma^2| < \gamma$. That is

$$\Pr\left(|\hat{\sigma}^2 - \sigma^2| < \gamma\right) \ge 1 - \frac{d^4}{4 N \gamma^2}.$$

We would like to use $\hat{\sigma}^2 + \gamma$ as an upper bound for $\sigma^2$ (when $\hat{\sigma}^2 + \gamma$ is lower than the maximum possible value $d^2/4$). However, we have assumed that we know $\mu$. In actual situations, we do not know $\mu$.

We examine the correction required for the fact that $\bar{x}$ rather than $\mu$ is used to calculate the variance $\tilde{\sigma}^2$. That is

$$\tilde{\sigma}^2 = \frac{1}{N} \sum_{k=1}^{N} (x_k - \bar{x})^2.$$

We know that $|\bar{x} - \mu| < \epsilon$ with probability at least $1 - \sigma^2/(N\epsilon^2)$. So, the worst case is when $\tilde{\sigma}^2$ underestimates $\hat{\sigma}^2$ (we are interested in the upper bound for $\sigma^2$) because $\bar{x}$ is closer to the realizations $x_1, \ldots, x_N$ than $\mu$ is. There are two situations where we get this: (a) when $\bar{x} = \mu - \epsilon$ (in which case we calculate $\frac{1}{N}\sum_{k}(x_k - \mu + \epsilon)^2$), and (b) when $\bar{x} = \mu + \epsilon$ (in which case we calculate $\frac{1}{N}\sum_{k}(x_k - \mu - \epsilon)^2$).

As such, with probability (with respect to the generation of many $N$-tuples from $[a, b]$) at least $1 - \sigma^2/(N\epsilon^2)$, we have $|\bar{x} - \mu| < \epsilon$. Since $\hat{\sigma}^2 = \tilde{\sigma}^2 + (\bar{x} - \mu)^2$, with probability at least $1 - \sigma^2/(N\epsilon^2)$ we can be sure that

$$\hat{\sigma}^2 \le \tilde{\sigma}^2 + \epsilon^2.$$

With probability at least $1 - d^4/(4N\gamma^2)$, we have the upper bound $\sigma^2 \le \hat{\sigma}^2 + \gamma$.

In order to combine the probabilities in a simple way, we must make the events independent. So, we need two sets of $N$-tuples of observations. The first set of observations $x_1, \ldots, x_N$ is used to estimate $\bar{x} = \frac{1}{N}\sum_k x_k$. The second set $x'_1, \ldots, x'_N$ is used to estimate $\bar{x}' = \frac{1}{N}\sum_k x'_k$ and $\tilde{\sigma}'^2 = \frac{1}{N}\sum_k (x'_k - \bar{x}')^2$. Then, with probability at least

$$\left(1 - \frac{\sigma^2}{N\epsilon^2}\right)\left(1 - \frac{d^4}{4N\gamma^2}\right)$$

we have $\sigma^2 \le \tilde{\sigma}'^2 + \epsilon^2 + \gamma$ since we know that $\sigma^2 \le \hat{\sigma}'^2 + \gamma$ and $\hat{\sigma}'^2 \le \tilde{\sigma}'^2 + \epsilon^2$.

As such, one can claim with probability at least $\left(1 - \frac{\sigma^2}{N\epsilon^2}\right)\left(1 - \frac{d^4}{4N\gamma^2}\right)$ that the following inequality holds:

$$\Pr\left(|\bar{x} - \mu| \ge \epsilon\right) \le \frac{\tilde{\sigma}'^2 + \epsilon^2 + \gamma}{N \epsilon^2}.$$

However, for this inequality to be of use, we require that $\tilde{\sigma}'^2 + \epsilon^2 + \gamma < d^2/4$.

We apply this result to obtain a tighter upper bound for the generalization framework introduced earlier, which we present as Lemma 2.

Lemma 2: For a strategy $i$, consider two independent nonoverlapping sets of test strategies: $S_N$ and $S'_N$, where $|S_N| = |S'_N| = N$ and $S_N \cap S'_N = \emptyset$. The first set is used to estimate the generalization performance $\hat{G}(i) = \frac{1}{N}\sum_{j \in S_N} G_{ij}$. The second set is used to estimate the variance $\tilde{\sigma}'^2 = \frac{1}{N}\sum_{j \in S'_N}\left(G_{ij} - \hat{G}_{S'_N}(i)\right)^2$, for some positive number $\gamma$, where $\tilde{\sigma}'^2 + \epsilon^2 + \gamma < d^2/4$. Then, for any positive number $\epsilon$, with probability at least $\left(1 - \frac{d^2}{4N\epsilon^2}\right)\left(1 - \frac{d^4}{4N\gamma^2}\right)$, the following inequality holds:

$$\Pr\left(|\hat{G}(i) - G(i)| \ge \epsilon\right) \le \frac{\tilde{\sigma}'^2 + \epsilon^2 + \gamma}{N \epsilon^2}. \qquad (6)$$
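A minimal sketch of the two-sample procedure behind Lemma 2 is given below; the function and variable names are illustrative. The inputs are the game outcomes of the same strategy against two independent, nonoverlapping random test samples of equal size, together with the known outcome range d.

```python
def estimate_with_tighter_bound(outcomes_1, outcomes_2, eps, gamma, d):
    """outcomes_1, outcomes_2: lists of game outcomes G_ij of one strategy against two
    independent, nonoverlapping test samples of equal size N; d = G_max - G_min."""
    n = len(outcomes_1)
    assert n == len(outcomes_2)

    # First sample: the generalization estimate itself, as in (3).
    g_hat = sum(outcomes_1) / n

    # Second sample: empirical variance sigma-tilde'^2 used in place of the worst case d^2/4.
    mean_2 = sum(outcomes_2) / n
    var_tilde = sum((g - mean_2) ** 2 for g in outcomes_2) / n

    # Worst-case bound from Lemma 1 and corrected bound from (6).
    lemma1_bound = d ** 2 / (4 * n * eps ** 2)
    lemma2_bound = (var_tilde + eps ** 2 + gamma) / (n * eps ** 2)

    # Probability with which the corrected bound itself can be claimed to hold.
    claim_prob = max(0.0, (1 - d ** 2 / (4 * n * eps ** 2))) * \
                 max(0.0, (1 - d ** 4 / (4 * n * gamma ** 2)))

    return g_hat, lemma1_bound, lemma2_bound, claim_prob
```

The corrected bound is only useful when the second-sample variance plus the correction terms falls below the worst-case value d^2/4; otherwise one simply falls back on Lemma 1.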


Fig. 2. The payoff matrix framework of a two-player, two-choice game. The payoff given in the lower left-hand corner is assigned to the player (row) choosing the move, while that of the upper right-hand corner is assigned to the opponent (column).

III. EXAMPLES OF CHEBYSHEV'S BOUND FOR ESTIMATING GENERALIZATION PERFORMANCE IN IPD GAMES

We have addressed the difficulty for problems where the true generalization performance cannot be determined by solving analytically a closed-form formula and is computationally prohibitive by estimating the generalization performance using a random sample of test strategies. However, a large sample of random test strategies may be required to claim with a high probability for a small precision value that the estimated value is close to the true value of the generalization performance. It is known that the Chebyshev's bound given earlier is a loose upper bound.

Here, we conduct an empirical study for some practical games (the IPD games) and show that one can expect to obtain a better estimate of the generalization performance compared with the theoretical value given by Chebyshev's bound when considering the worst case (i.e., the actual $\Pr(\hat{\epsilon} \ge \epsilon)$ is lower than the Chebyshev's bound). We also show how a strategy's performance profile with respect to the strategy space, the variance $\sigma^2$, can be used through a second estimation to obtain a tighter Chebyshev's bound. Finally, we show that games can affect the accuracy of the estimation procedure because they affect the strategy performance profile with respect to the strategy space.

A. The IPD Game

In the IPD game, two players engaged in repeated interactions are given two choices, cooperate and defect [47]–[49]. The dilemma embodies the tension between the rationality of individuals who are tempted to defect and the rationality of the group where every individual is rewarded by mutual cooperation [49], since both players, who are better off mutually cooperating than mutually defecting, are vulnerable to exploitation by the party who defects.

The classical IPD game can be formulated by considering a predefined payoff matrix that specifies the payoff that a player receives for the choice it makes given the choice that the opponent makes. Referring to the payoff matrix given by Fig. 2, both players receive $R$ (reward) units of payoff if both cooperate. They both receive $P$ (punishment) units of payoff if they both defect. However, when one player cooperates while the other defects, the cooperator will receive $S$ (sucker) units of payoff while the defector receives $T$ (temptation) units of payoff. With the classic IPD game, the values $R$, $S$, $T$, and $P$ must satisfy the constraints: $T > R > P > S$ and $R > (S + T)/2$. Any set of values can be used as long as they satisfy the IPD constraints

Fig. 3. The payoff matrix for the two-player three-choice IPD game. Each element of the matrix gives the payoff for Player A.

Fig. 4. The payoff matrix for the two-player four-choice IPD game [25]. Each element of the matrix gives the payoff for Player A.

(we use one such fixed set of payoff values in all our experiments unless stated otherwise). The game is played when both players choose between the two alternative choices over a series of moves (i.e., repeated interactions).

The classical IPD game has been extended to more complex versions of the game to better model real-world interactions. One simple example that we will use in this paper is the IPD with multiple, discrete levels of cooperation [21], [25], [26], [50], [51]. The $n$-choice IPD game is defined by the payoffs obtained through the following linear interpolation:

$$p_A = 2.5 - 0.5\, c_A + 2\, c_B \qquad (7)$$

where $p_A$ is the payoff to player A, given that $c_A$ and $c_B$ are the cooperation levels of the choices that players A and B make, respectively. For example, for the three-choice and four-choice IPD games, the payoff matrix is given by Figs. 3 and 4 [25], respectively.

Note that in generating the payoff matrix for an $n$-choice IPD game, the following conditions must be satisfied [25]:

1) $p_A(c_A, c_B) < p_A(c'_A, c_B)$ for $c_A > c'_A$ and constant $c_B$;
2) $p_A(c, c) > p_A(c', c')$ for $c > c'$;
3) $p_A(c, c) > \frac{1}{2}\left[p_A(c, c') + p_A(c', c)\right]$ for $c > c'$.

IPDs. The first condition ensures that defection always paysmore. The second condition ensures that mutual cooperationhas a higher payoff than mutual defection. The third conditionensures that alternating between cooperation and defectiondoes not pay in comparison to just playing cooperation.

In the IPD, defection is not necessarily the best choice of play. Many studies have shown cooperative play to be a viable strategy [48], [49]. More importantly, cooperative play can be learned from an initial, random population through coevolutionary learning [2], [52]–[55], which is different from the classical evolutionary games approach that considers frequency-dependent reproductions of predetermined strategies (e.g., the "ecological approach" [49], [56]) that do not involve an adaptation process on some strategy representations.


With the learning of IPD strategies, one important question is whether the learned IPD strategy is "robust." Axelrod [2] considered robustness of a strategy in the context of its performance against some expert-designed strategies. Darwen and Yao [33]–[35] defined robustness in the context of generalization. These previous studies [2], [33]–[35] approached the issue of robustness (i.e., generalization) in an empirical and ad hoc manner. Here, we investigate the issue by applying the generalization framework introduced earlier.

B. Different Definitions of Generalization Performance

Before applying the generalization framework to the coevolutionary learning of a game, we first need to define specifically the generalization performance so that when we say that a strategy generalizes well, it is according to some well-defined performance criteria (which are determined by the definition of game outcomes). For zero-sum games like chess, the game outcome for a strategy can be easily defined, e.g., win, lose, or draw.3 When we apply the generalization framework with this definition of game outcome for chess, we refer to how well coevolutionary learning can produce a strategy that can win against as many strategies as possible.

However, for a nonzero-sum game such as the IPD, there is no specific notion of "win" that one defines for a game outcome. Instead, the "best" strategy is usually the one with the highest average payoffs of some games played (if one includes each strategy also playing against its own twin), i.e., a round-robin tournament. In other words, the game outcome is either the average payoff per move, or the total payoff. This is not the same as the case when one sums the number of individual "wins," where a win is defined as having a higher average payoff per move in a game.

Here, we explore the generalization performance of coevolutionary learning for the IPD games from two perspectives, i.e., we consider two different definitions of generalization performance. This is to allow a more in-depth investigation as to how coevolutionary learning produces strategies in terms of their generalization, i.e., whether the strategy generalizes to a specific performance criterion, or to more than one performance criterion.

1) Generalization Performance in Terms of the Number of Wins Based on Individual Outcomes: First, we define generalization performance in terms of the number of wins based on individual outcomes of games (e.g., a game between a pair of strategies). This definition of generalization performance follows the notion from zero-sum games, where one is interested in the performance of winning against as many known strategies as possible. If $g_{ij}$ refers directly to the average payoff per move to strategy $i$ when it plays an IPD game with strategy $j$, then a game outcome, which can either be win or lose, is defined as follows:

$$G^{\mathrm{win}}_{ij} = \begin{cases} \text{win} & \text{if } g_{ij} > g_{ji} \\ \text{lose} & \text{otherwise} \end{cases}$$

3This is the widely accepted convention, although for our purpose here of determining generalization performance, we can just consider a case of win and lose, i.e., a draw is considered a loss.

where win and lose are constants that can be arbitrarily specified as long as win > lose.4

Using (3), the estimated generalization performance based on a sample of $N$ test strategies is given by

$$\hat{G}_{\mathrm{win}}(i) = \frac{1}{N} \sum_{j \in S_N} G^{\mathrm{win}}_{ij}. \qquad (8)$$

We use $\hat{G}_{\mathrm{win}}$ to make the notations simpler. We denote $G_{\mathrm{win}}$ to represent the true generalization performance for the definition introduced above.

2) Generalization Performance in Terms of Average Payoff:

Second, we define generalization performance in terms of average payoff. This definition of generalization performance follows the notion of average performance against a population of strategies (tournaments for IPD games). Here, the game outcome is the average payoff per move, i.e., $G^{\mathrm{ap}}_{ij} = g_{ij}$, where $g_{ij}$ and $g_{ji}$ can be determined from (7). As such, the generalization performance refers to the average payoff from a round-robin tournament of games. However, for simplicity, and since $N$ is a large number and assuming there is always a copy of strategy $i$ in the tournament, we use

$$\hat{G}_{\mathrm{ap}}(i) = \frac{1}{N} \sum_{j \in S_N} g_{ij} \qquad (9)$$

so that it is straightforward to use Chebyshev's Theorem to obtain the Chebyshev's bound. We denote $G_{\mathrm{ap}}$ to represent the true generalization performance for the definition introduced above.
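To make the two definitions concrete, the short fragment below (reusing the illustrative game_outcome sketch from Section II) computes both outcomes for a pair of strategies; WIN and LOSE are arbitrary constants with WIN > LOSE.

```python
WIN, LOSE = 1.0, 0.0  # arbitrary constants; only WIN > LOSE matters

def outcome_wins(strat_i, strat_j):
    """G_win: win if strategy i obtains the higher average payoff per move, lose otherwise."""
    g_ij = game_outcome(strat_i, strat_j)   # average payoff per move to i
    g_ji = game_outcome(strat_j, strat_i)   # average payoff per move to j
    return WIN if g_ij > g_ji else LOSE

def outcome_average_payoff(strat_i, strat_j):
    """G_ap: the average payoff per move itself, as used in (9)."""
    return game_outcome(strat_i, strat_j)
```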

C. Chebyshev’s Bounds in IPD Games

For the two definitions of generalization performance, Chebyshev's Theorem can be applied to obtain their respective Chebyshev's bounds. However, as mentioned earlier, the Chebyshev's bound is an upper bound in probability, and in most practical situations, $\Pr(\hat{\epsilon} \ge \epsilon)$ is less than the bound. That is, even though we are limited to using a sample of random test strategies of size $N$ in estimating the generalization performance, we expect that for a particular precision value $\epsilon$, the probability $\Pr(\hat{\epsilon} \ge \epsilon)$ is lower than the Chebyshev's bound. Here, we investigate this empirically.

For simplicity, we will restrict our investigation to the case of the IPD game with deterministic and reactive memory-one strategies that consider their previous move and the opponent's previous move. We consider the three-choice IPD game. We assume that strategies are uniformly distributed in the strategy space $\mathcal{S}$. For this case, it is possible to compute the true generalization performance since the strategy space of $3^{10} = 59049$ unique strategies is not too big. It is not possible for the four-choice IPD game, while it is trivial for the two-choice IPD game with $2^5 = 32$ strategies.

1) Chebyshev's Bounds for Different Generalization Performance in IPD Games: Experiments were conducted for the two definitions of generalization performance introduced earlier. For all the experiments, the estimated and true generalization performance were determined for 4000 strategies that were randomly sampled.

4In this paper, we use fixed constant values for win and lose (with win > lose). We arbitrarily choose these values for the presentation of results.


Fig. 5. The probability that $\hat{\epsilon} \ge \epsilon$ for $\epsilon$ in [0.01, 0.1], where the sample of random test strategies $S_N$ of size $N$ is used to compute $\hat{G}_{\mathrm{win}}(i)$. Each graph is obtained by averaging the results of using 50 independent samples $S_N$, with error bars representing the 95% confidence interval. Also shown are the curves for the Chebyshev's bounds with different $N$s. Panels (a) and (b) show different ranges of sample sizes $N$.

The proportion of the 4000 strategies where $\hat{\epsilon} \ge \epsilon$ was determined. This proportion can be considered as a rough estimate of the actual probability $\Pr(\hat{\epsilon} \ge \epsilon)$. Experiments were repeated for 50 samples of random test strategies for different $N$s (rough % of $N_{\mathcal{S}}$) at 10 (0.02%), 50 (0.1%), 100 (0.2%), 500 (1%), 1000 (2%), 5000 (10%), and 10 000 (20%) that are drawn independently.5
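The empirical estimate of the probability described above can be summarized by the following sketch (reusing random_strategy and game_outcome from the earlier illustrative sketches; the true values are assumed to have been computed exhaustively beforehand over a small strategy space).

```python
def empirical_probability(strategies_with_true_g, n_tests, eps, d):
    """Estimate Pr(|G_hat - G|/d >= eps) as the proportion of sampled strategies whose
    estimate deviates from the exhaustively computed true value by at least eps * d.
    strategies_with_true_g: list of (strategy, true G(i)) pairs."""
    exceed = 0
    for strat, true_g in strategies_with_true_g:
        tests = [random_strategy() for _ in range(n_tests)]
        g_hat = sum(game_outcome(strat, j) for j in tests) / n_tests
        if abs(g_hat - true_g) / d >= eps:
            exceed += 1
    return exceed / len(strategies_with_true_g)
```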

Fig. 5 shows the results of the experiments for the generalization performance defined by $G_{\mathrm{win}}$ and estimated with $\hat{G}_{\mathrm{win}}$. From the figure, it can be observed that the probability $\Pr(\hat{\epsilon} \ge \epsilon)$ (roughly estimated by the proportion of 4000 sampled strategies where $\hat{\epsilon} \ge \epsilon$) reduces as $N$ increases from 10 to 10 000 for $\epsilon$ in [0.01, 0.1]. The empirical probability curves are observed to be lower than those given by Chebyshev's bounds. A similar observation is also made

5We have verified for each sample $S_N$ that no strategy is obtained more than once.

Fig. 6. The probability that $\hat{\epsilon} \ge \epsilon$ for $\epsilon$ in [0.01, 0.1], where the sample of random test strategies $S_N$ of size $N$ is used to compute $\hat{G}_{\mathrm{ap}}(i)$. Each graph is obtained by averaging the results of using 50 independent samples $S_N$, with error bars representing the 95% confidence interval. Also shown are the curves for the Chebyshev's bounds with different $N$s. Panels (a) and (b) show different ranges of sample sizes $N$.

for the generalization performance defined by $G_{\mathrm{ap}}$ and estimated with $\hat{G}_{\mathrm{ap}}$ in Fig. 6.

Experiments show that the normalized $\hat{\epsilon}$ for $\hat{G}_{\mathrm{win}}$ is usually larger for all $N$s compared with the normalized $\hat{\epsilon}$ for $\hat{G}_{\mathrm{ap}}$ (e.g., Fig. 7). This can be explained by analyzing the distribution (histogram) of the game outcomes $G_{ij}$ for individual strategies $i$ (that we randomly picked) against a sample of test strategies (that we also randomly picked from many samples). Figs. 8 and 9 show, for four strategies, the distribution of $G^{\mathrm{win}}_{ij}$ and $G^{\mathrm{ap}}_{ij}$, respectively, for a particular sample of $S_N$ for $N$ at 50, 5000, and 10 000. On the one hand, the distribution of $G^{\mathrm{win}}_{ij}$ is always bimodal, i.e., having two separate peaks, which may lead to a higher variance $\sigma^2$. On the other hand, the distribution of $G^{\mathrm{ap}}_{ij}$ spreads over an interval, leading to a lower $\sigma^2$. Considering the Chebyshev's bound from (6), higher values of $\hat{\epsilon}$ are more likely when $\sigma^2$ (which can be estimated with respect to $S_N$) is higher.


Fig. 7. (a)–(d) show, for four different strategies $i$, the normalized absolute difference $\hat{\epsilon}$ obtained for each definition of generalization performance (e.g., $\hat{G}_{\mathrm{win}}$ and $\hat{G}_{\mathrm{ap}}$). Each graph is obtained by averaging the results of using 50 independent samples $S_N$, with error bars representing the 95% confidence interval. Each subfigure also contains curves of the precision value $\epsilon$ obtained from Chebyshev's bounds, labeled according to the probability used. (a) Strategy #1. (b) Strategy #2. (c) Strategy #3. (d) Strategy #4.

In addition, another observation is that the distribution of $G_{ij}$ for the two definitions of generalization performance is quite similar for different sizes of $N$ starting from a small $N$ of 50 (Figs. 8 and 9). These results indicate two implications for the estimation of generalization performance for the IPD games investigated here. First, the estimation from using a smaller $N$ is as accurate as the estimation from using larger $N$s. Second, the estimations are stable in terms of varying sample sizes starting from a small $N$.

2) Can We Find a Tighter Chebyshev's Bound?: The Chebyshev's bound given by (5) assumes the worst case and takes $\sigma^2_{\max} = d^2/4$ to upper bound the probability. We have earlier provided a mathematical analysis to find a tighter bound that requires the estimation of $\sigma^2$ so that, rather than using $d^2/4$, the corrected estimate $\tilde{\sigma}'^2 + \epsilon^2 + \gamma$ is used instead (with some probability that can be determined), i.e., a tighter Chebyshev's bound given by (6).

Note that to have a high probability that $|\hat{\sigma}^2 - \sigma^2| < \gamma$ and $\sigma^2 \le \tilde{\sigma}'^2 + \epsilon^2 + \gamma$, we want $1 - d^4/(4N\gamma^2)$ to be as high as possible by having a higher value for $\gamma$ (which is where this tighter bound is useful, because we are hoping that $\tilde{\sigma}'^2$ is small and that $\tilde{\sigma}'^2 + \epsilon^2 + \gamma < d^2/4$ even for a larger value of $\gamma$).

We apply Chebyshev's Theorem for the random variable $y = (G_{ij} - G(i))^2$ to investigate empirically how good the estimation of $\sigma^2$ is when using a sample of test strategies for the three-choice IPD game (we can actually compute the exact value of $\sigma^2$ for this game)

$$\Pr\left(|\hat{\sigma}^2 - \sigma^2| \ge \gamma\right) \le \frac{d^4}{4 N \gamma^2}.$$

We can simplify the above by letting $\gamma$ be measured on the normalized scale (i.e., considering deviations of size $\gamma d^2$) and $\hat{\gamma} = |\hat{\sigma}^2 - \sigma^2|/d^2$ so that we now obtain

$$\Pr\left(\hat{\gamma} \ge \gamma\right) \le \frac{1}{4 N \gamma^2}.$$

Experiments were conducted for the two definitions of generalization performance to determine the probability (proportion) of 4000 random strategies where $\hat{\gamma} \ge \gamma$, for 50 samples of $S_N$ with $N$ from 10 to 10 000 that are drawn uniformly and independently.6


Fig. 8. (a)–(d) show, for four different strategies $i$, the distribution of the game outcomes $G^{\mathrm{win}}_{ij}$ for $N$ at 50, 5000, and 10 000. (a) Strategy #1. (b) Strategy #2. (c) Strategy #3. (d) Strategy #4.

of sizes from 10 to 10 000 that are drawn uniformly and independently.6 Figs. 10 and 11 summarize the results, which show that, in general, the empirical probability curves are lower for the estimation under the average-payoff definition than under the win-based definition. More importantly, the results show that a high probability can be claimed that the estimate lies within the precision value of the true value if we consider a large precision value.
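A minimal sketch of this kind of experiment is given below (illustrative only, not the authors' code); it assumes the strategy space is small enough to enumerate so that the true value can be computed, and the helper game_outcome is a placeholder.

import random

def empirical_probability(strategies, strategy_space, sample_size, epsilon,
                          game_outcome, num_samples=50, rng=random):
    # True generalization performance by full enumeration of the (small) strategy space.
    true_values = [sum(game_outcome(s, j) for j in strategy_space) / len(strategy_space)
                   for s in strategies]
    proportions = []
    for _ in range(num_samples):
        test_sample = rng.sample(strategy_space, sample_size)
        estimates = [sum(game_outcome(s, j) for j in test_sample) / sample_size
                     for s in strategies]
        # Proportion of strategies whose estimate lies within epsilon of the true value.
        within = sum(1 for g_hat, g in zip(estimates, true_values)
                     if abs(g_hat - g) <= epsilon)
        proportions.append(within / len(strategies))
    # Average proportion over the independently drawn test samples.
    return sum(proportions) / num_samples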

We also present results for the estimation of the generalization performance of individual strategies. Table I shows the estimated values for various sample sizes, together with the true value, for some random strategies under the first definition of generalization performance. Table II shows the corresponding absolute differences between the estimated and true values. If we consider a large precision value, we know from Chebyshev's bound, for example, that the probability that the estimate stays within that precision

6 Note that we actually compute the estimated variance rather than the true variance. Regardless, the following results will show that, most of the time, the estimated variance overestimates the true variance. Furthermore, at the end of the section, we will show results of corrected variance estimates and how they can be used to obtain a tighter Chebyshev's bound.

is high. However, as indicated earlier in Fig. 10(b), the actual estimation is better than the theoretical value given by Chebyshev's bound. For example, the absolute difference is much smaller than 400 for a probability greater than 0.98 (Table II). Similar observations can be made for the second definition of generalization performance (Tables III and IV).7 A tighter Chebyshev's bound can be found when the estimated variance is smaller than the worst-case value assumed by (5), as illustrated for a particular strategy (#7 in Table I).

7 Note that the results presented in the tables refer to the estimated generalization performance of a strategy when a sample of random test strategies is used. As such, the estimated value in Table I for a strategy can fluctuate (higher or lower) around the true value. For a very small sample size that falls outside the useful range of Chebyshev's bound, erratic results can be obtained. As for the results in Tables II and IV, increasing the sample size does not necessarily result in a decreasing absolute difference. However, all the results are within the theoretical results from Chebyshev's bounds (e.g., with a probability greater than 0.98 and using 10 000 random test strategies, the absolute difference is no greater than 400 for the first definition of generalization performance).


Fig. 9. (a)-(d) show, for four different strategies, the distribution of the game outcomes under the second definition of generalization performance, for test sample sizes of 50, 5000, and 10 000. (a) Strategy #1. (b) Strategy #2. (c) Strategy #3. (d) Strategy #4.

The corrected estimate of the variance (where the probability that it does not underestimate the true variance can be determined) is found, in this case, to overestimate the true variance. Regardless, with the corresponding probability, we know that the estimation error stays within the tighter precision value obtained by using the corrected variance estimate in place of the worst-case variance; the resulting tighter bound is shown in Fig. 12. Fig. 13 illustrates another example of a tighter Chebyshev's bound, for the second definition of generalization performance (strategy #6 in Table III).

3) What Is the Impact of Different Games on the Estimation of Generalization Performance?: We know that the generalization performance of a particular strategy depends on its performance against all strategies in the strategy space [following (4)]. Furthermore, we have shown that the accuracy of the estimated generalization performance depends on the profile of a strategy's performance (i.e., game outcomes) against all strategies in the strategy space (e.g., on the distribution of those outcomes).

We note that the distribution of game outcomes for any strategy depends on the game. As such, we investigate this issue with a series of experiments on IPD games with different payoff matrices: (a) IPD1 (the original game considered); (b) IPD2; and (c) IPD3, where IPD2 and IPD3 use different linear interpolations to generate the payoffs.

By penalizing strategies that alternate between different choices (IPD1 penalizes the most while IPD3 penalizes the least), the games can affect the profile of a strategy's performance with respect to all strategies in the strategy space, given that a large proportion of strategies in the space alternate between different choices. We repeat the earlier experiment to find the probability that the estimate lies within the precision value of the true value for these IPD games and compare the results. In general, the same results for the win-based definition are obtained for all the IPD games. This is because, for the win-based definition, we count the number of wins for each pairwise interaction. Although the average payoff received from the pairwise interaction changes, the win outcome is the same because the ordering of the two players' payoffs is still the same.8 However, for the average-payoff definition, the results show that, for all sample sizes, the empirical probability curves for IPD3 are lower than for IPD2, which in turn are lower than for IPD1 (Fig. 14). Note that different IPD games award

8 Note also that the number of rounds, at 150, is sufficiently long.


Fig. 10. The probability that the absolute difference between the estimated and true generalization performance exceeds the precision value, for precision values in [0.0001, 0.04], where a random test sample of a given size is used to compute the estimate under the first definition of generalization performance. Each graph is obtained by averaging the results of using 50 independent samples, with error bars representing the 95% confidence interval. The curves for the Chebyshev's bounds with different sample sizes are also shown. (a) and (b) show different groups of sample sizes.

payoffs for plays with alternating choices (strategies that constitute a large proportion of the strategy space) differently. For example, the payoff differences between neighboring choices are reduced for IPD3 compared with IPD1. As such, it is not surprising that the variance of the game outcomes is smaller for the same strategy for IPD3 compared with IPD1 due to the reduced payoff differences between neighboring choices. It is also observed that the distribution of game outcomes for individual strategies tends to be more peaked and centered around the mean for IPD3 compared with IPD1 and IPD2 (Fig. 15). The smaller values of the variance for the case of IPD3 imply a more accurate estimation from (4).

IV. ESTIMATING GENERALIZATION PERFORMANCE IN COEVOLUTIONARY LEARNING

Here, we illustrate the application of the generalization framework to coevolutionary learning of IPD games. In particular, we focus on simple IPD games (e.g., IPD games with two and three choices), where the true generalization performance

Fig. 11. The probability that the absolute difference between the estimated and true generalization performance exceeds the precision value, for precision values in [0.0001, 0.04], where a random test sample of a given size is used to compute the estimate under the second definition of generalization performance. Each graph is obtained by averaging the results of using 50 independent samples, with error bars representing the 95% confidence interval. The curves for the Chebyshev's bounds with different sample sizes are also shown. (a) and (b) show different groups of sample sizes.

can be computed. This allows us to investigate and show empirically that a small sample of test strategies can be used to provide a good estimation of the generalization performance of coevolutionary learning of IPD games in comparison with the prediction given by Chebyshev's bound.

A. Coevolutionary Learning Model

Fig. 16 illustrates the general coevolutionary learning framework, which involves an iterative process of variation (mutation and crossover) and selection (choosing solutions for the next generation) on competing solutions that strategically interact with each other. A coevolutionary learning system can be implemented using coevolutionary algorithms. There is a variety of different coevolutionary algorithms, and they can be roughly classified according to their population structure, i.e., single-population coevolution [2], [21], [25] and multiple-population coevolution (which includes a two-population structure that separates solutions and the tests used to evaluate the fitness of solutions into their respective populations, which do not mix or breed [12], [13], [38]). For simplicity, we apply the generalization


TABLE I. COMPARISON OF THE ESTIMATED GENERALIZATION PERFORMANCE FOR VARIOUS SAMPLE SIZES WITH THE TRUE VALUE FOR TEN STRATEGIES (FIRST DEFINITION OF GENERALIZATION PERFORMANCE)

TABLE II. COMPARISON OF THE ABSOLUTE DIFFERENCE BETWEEN THE ESTIMATED AND TRUE VALUES FOR VARIOUS SAMPLE SIZES FOR TEN STRATEGIES (FIRST DEFINITION OF GENERALIZATION PERFORMANCE)

TABLE III. COMPARISON OF THE ESTIMATED GENERALIZATION PERFORMANCE FOR VARIOUS SAMPLE SIZES WITH THE TRUE VALUE FOR TEN STRATEGIES (SECOND DEFINITION OF GENERALIZATION PERFORMANCE)

framework to coevolutionary learning with the single-population structure. However, we note that the generalization framework does not depend on the learning system, and as such, can be applied to coevolutionary learning with the multiple-population structure as well.

1) Strategy Representation: Several strategy representations have been investigated in the past, e.g., a lookup table with bit-string encoding [2], finite-state machines [52]-[54], and neural networks [25], [51], [57], [58]. Here, we consider the simple approach of the direct lookup table strategy representation that we introduced and studied earlier in [25], which directly represents IPD strategy behaviors through a one-to-one mapping between the genotype space (strategy representation) and the phenotype space (behaviors). The main advantage of using the direct lookup table representation is that the search space (given by the strategy representation) and the strategy space are the same (assuming a uniform strategy distribution in the strategy space). This simplifies the analysis and allows a direct investigation of the generalization performance of coevolutionary learning.

Fig. 17 illustrates the direct lookup table representation for strategies with two choices and memory-one [25]. Each table element specifies the choice to be made, given the player's own previous choice and the opponent's previous choice as inputs. Rather than using pre-game inputs (two for memory-one strategies), the first move is specified independently by a separate element. For example, for the two-choice IPD (Fig. 17), each table element can take either of the two choices, i.e., full cooperation or full defection. For the three-choice IPD (Fig. 18), each table element can take any of the three choices, which include an intermediate choice. Note that the values in the table elements


TABLE IV. COMPARISON OF THE ABSOLUTE DIFFERENCE BETWEEN THE ESTIMATED AND TRUE VALUES FOR VARIOUS SAMPLE SIZES FOR TEN STRATEGIES (SECOND DEFINITION OF GENERALIZATION PERFORMANCE)

Fig. 12. Example of obtaining a tighter Chebyshev's bound for the first definition of generalization performance (strategy #7 in Table I). The curve "Theoretical" gives the original Chebyshev's bound if the worst-case variance is used. The curve "True" gives the actual bound if the true variance is used. The curve "Corrected Estimate" gives the tighter Chebyshev's bound if the corrected variance estimate is used. All curves are determined for a fixed sample size.

Fig. 13. Example of obtaining a tighter Chebyshev's bound for the second definition of generalization performance (strategy #6 in Table III). The curve "Theoretical" gives the original Chebyshev's bound if the worst-case variance is used. The curve "True" gives the actual bound if the true variance is used. The curve "Corrected Estimate" gives the tighter Chebyshev's bound if the corrected variance estimate is used. All curves are determined for a fixed sample size.

are the same as those used to produce the payoffs in the payoff matrix through a linear interpolation.
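As an illustration of this representation, the following is a minimal sketch (an assumed reimplementation, not the original code) of a direct lookup table strategy: a first move plus a table indexed by the pair of previous choices. The concrete choice values used here (+1 for full cooperation, -1 for full defection, 0 as the intermediate choice in the three-choice game) are assumptions made for the example.

import random

class LookupTableStrategy:
    # Deterministic, reactive, memory-one IPD strategy represented directly by
    # its behavior: a first move plus a table mapping
    # (own previous choice, opponent's previous choice) -> next choice.
    def __init__(self, choices, first_move, table):
        self.choices = tuple(choices)
        self.first_move = first_move
        self.table = dict(table)

    def move(self, own_prev=None, opp_prev=None):
        if own_prev is None or opp_prev is None:
            return self.first_move        # first round: no history yet
        return self.table[(own_prev, opp_prev)]

def random_strategy(choices, rng=random):
    # Uniform sampling over the n**(n*n + 1) unique tables for n choices.
    table = {(a, b): rng.choice(choices) for a in choices for b in choices}
    return LookupTableStrategy(choices, rng.choice(choices), table)

# Example: the two-choice and three-choice encodings assumed above.
two_choice = random_strategy((+1, -1))
three_choice = random_strategy((+1, 0, -1))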

A simple mutation operator is used to generate offspring from parents [25]. For the n-choice IPD, mutation replaces the original choice of a particular element in the direct lookup table (including the element specifying the first move) with one of the other remaining possible choices with an equal probability of 1/(n-1). Each table element has a fixed probability, p_m, of being replaced by one of the remaining choices. The value of p_m is not optimized.
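A sketch of such a mutation operator is given below, building on the LookupTableStrategy sketch above; p_m and the helper names are illustrative.

import copy
import random

def mutate(strategy, p_m, rng=random):
    # Each element (including the first move) is replaced, with probability p_m,
    # by one of the other choices chosen uniformly (probability 1/(n-1) each).
    child = copy.deepcopy(strategy)

    def maybe_replace(current):
        if rng.random() < p_m:
            return rng.choice([c for c in child.choices if c != current])
        return current

    child.first_move = maybe_replace(child.first_move)
    for key in child.table:
        child.table[key] = maybe_replace(child.table[key])
    return child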

Crossover is not used because, with the direct lookup table for strategy representation, any variation operator will introduce variations on behaviors directly. As investigated earlier in [25] (even for the more complex IPD game with intermediate choices), a simple mutation operator is more than sufficient to introduce the required variations of strategy behaviors. The use of crossover is not necessary.

2) Coevolutionary Learning Procedure: We refer to the following coevolutionary learning procedure [25] as the classical coevolutionary learning (CCL).

1) Generation step t = 0: Initialize POPSIZE/2 parent strategies randomly.
2) Generate POPSIZE/2 offspring from the POPSIZE/2 parents using a variation (mutation) operator.
3) All pairs of strategies compete, including the pair where a strategy plays itself (i.e., a round-robin tournament). For POPSIZE strategies in a population, every strategy competes in a total of POPSIZE games.
4) Select the best POPSIZE/2 strategies based on the total payoffs of all games played. Increment the generation step, t = t + 1.
5) Steps 2 to 4 are repeated until the termination criterion (i.e., a fixed number of generations) is met.
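A minimal sketch of this CCL loop is given below (reusing the random_strategy and mutate sketches above); the round-robin payoff function play_ipd, which would implement the 150-round game with the interpolated payoff matrix, is left as an assumed placeholder.

import random

def ccl(popsize, generations, choices, play_ipd, p_m, rng=random):
    # Classical coevolutionary learning: round-robin fitness evaluation
    # followed by truncation selection of the best POPSIZE/2 strategies.
    parents = [random_strategy(choices, rng) for _ in range(popsize // 2)]
    for _ in range(generations):
        offspring = [mutate(p, p_m, rng) for p in parents]
        population = parents + offspring
        # Every strategy plays every strategy in the population, including itself.
        fitness = [sum(play_ipd(s, opponent) for opponent in population)
                   for s in population]
        ranked = sorted(range(len(population)), key=lambda k: fitness[k], reverse=True)
        parents = [population[k] for k in ranked[: popsize // 2]]
    return parents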

All IPD games involve a fixed game length of 150 iterations. A fixed and sufficiently long duration for the evolutionary process is used. Note that, unlike studies such as [25] that consider whether the population persistently evolves to a particular behavior (e.g., mutual cooperation), here we observe the generalization performance of coevolutionary learning during the evolutionary process. Further note that, unlike in [25], we do not consider noisy IPD games. All experiments are repeated for 30 independent runs.

B. Measuring Generalization Performance of Coevolutionary Learning

The generalization framework was earlier presented in the context of a complete solution. However, coevolutionary learning can be broadly classified either as cooperative coevolutionary learning [59], where solutions represent different


Fig. 14. Comparison of the probability that the absolute difference between the estimated and true generalization performance exceeds the precision value, for precision values in [0.01, 0.1], where a test sample of a given size [(a)-(d)] is used to compute the estimate for the different IPD games. Each graph is obtained by averaging the results of using 50 independent samples, with error bars representing the 95% confidence interval. For each subfigure, the Chebyshev's bound for the corresponding sample size is also plotted. (a)-(d) use increasing sample sizes.

components that are combined to produce the complete solution, or competitive coevolutionary learning, whereby a solution is complete [38]. Here, we investigate the generalization performance of competitive coevolutionary learning systems such as CCL for simplicity, since we can use a strategy to represent the population in terms of generalization performance. However, there is no prerequisite to choose the best evolved strategy of the population as the representative. The best evolved strategies from the population (ordered according to their internal fitness values) can be considered, which allows the following measurements of generalization performance of coevolutionary learning.

• Best: the generalization performance of the single best evolved strategy in the population. (10)
• Average: the average generalization performance of the best evolved strategies. (11)
• Ensemble: the generalization performance of the best evolved strategies treated as an ensemble with a gating mechanism. (12)

In particular, "ensemble" allows the measurement of the generalization performance of the coevolved population rather than the best strategy. That is, it can be used to determine whether the population as a whole generalizes better (i.e., attains a higher level of competence for the game) compared with an individual strategy. For simplicity, we assume an ensemble system with a perfect gating mechanism [36] for the "ensemble" measurement, i.e., one that perfectly chooses, for each test strategy, the evolved strategy that performs best against it. For example, for five test strategies, if the first evolved strategy outperforms the first two test strategies, and the second evolved strategy outperforms the last three test strategies, the "ensemble" of the two evolved strategies outperforms all test


Fig. 15. (a)-(d) show, for four different strategies, the distribution of the game outcomes for the IPD1, IPD2, and IPD3 games. Actual calculations of the variance show that the values decrease from IPD1 to IPD2 to IPD3. For example, for strategy #1, the variances for IPD1, IPD2, and IPD3 are 1.56 (1.59), 0.92 (0.95), and 0.47 (0.48), respectively. (a) Strategy #1. (b) Strategy #2. (c) Strategy #3. (d) Strategy #4.

strategies, i.e., it will choose the first evolved strategy for the first two test strategies, and the second evolved strategy for the others. Note that this measurement is only relevant for the definition of generalization performance based on the number of wins.
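A sketch of the three measurements is given below, assuming an outcome matrix in which outcomes[k][j] is the game outcome of the k-th best evolved strategy against the j-th test strategy; this is only an interpretation consistent with the description above, not a reproduction of (10)-(12), and all names are illustrative.

def best_measurement(outcomes):
    # Generalization performance of the single best evolved strategy (row 0).
    return sum(outcomes[0]) / len(outcomes[0])

def average_measurement(outcomes):
    # Mean generalization performance over the top evolved strategies.
    per_strategy = [sum(row) / len(row) for row in outcomes]
    return sum(per_strategy) / len(per_strategy)

def ensemble_measurement(outcomes):
    # Perfect gating: for each test strategy, credit the best outcome achieved
    # by any of the evolved strategies, then average over the test strategies.
    num_tests = len(outcomes[0])
    best_per_test = [max(row[j] for row in outcomes) for j in range(num_tests)]
    return sum(best_per_test) / num_tests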

C. Generalization Performance of Coevolutionary Learning for Problems With Small Strategy Space

For problems with a small strategy space,9 it is possible to determine the true generalization performance of the coevolutionary learning system using (2). For example, the true generalization performance of CCL for the two-choice IPD and three-choice IPD games can be computed for both definitions of generalization performance.

For the following experiments, we used a fixed population size and 300 generations of the evolutionary process for the CCL when applied to the two-choice and three-choice IPD games (the same settings are used for both IPD games). Instead of taking the

9 Again, this should not be confused with the search space, which depends on the strategy representation. A strategy space can be small and discrete, but the search space can be infinitely large and continuous, e.g., a real-valued neural network representation of a deterministic and reactive memory-one IPD strategy.

entire population, we considered only the top best strategies of the population when taking the "Average" and "Ensemble" measurements of generalization performance.

1) True Generalization Performance in the Coevolutionary

Learning of the IPD With Two and Three Choices: Fig. 19 and Tables V and VI summarize the generalization performance of CCL under the win-based definition. Fig. 20 and Tables VII and VIII summarize it under the average-payoff definition. In general, the generalization performance of CCL during the evolutionary process differs for the two-choice and three-choice IPD games. On the one hand, for CCL of the two-choice IPD, the generalization performance decreases as the evolutionary process continues [Figs. 19(a) and 20(a), and Tables V and VII]. On the other hand, for CCL of the three-choice IPD, the generalization performance remains around similar levels as the evolutionary process continues [Figs. 19(b) and 20(b), and Tables VI and VIII].

The poor generalization performance for the simpler two-choice IPD case is due to the population of CCL overspecializing to more naive cooperators, e.g., "all cooperate." In particular, under the win-based definition, naive cooperators do not generalize well


Fig. 16. The general framework of coevolutionary learning.

Fig. 17. The direct lookup table representation for the deterministic and reactive memory-one IPD strategy with two choices (it also includes an element for the first move, which is not shown in the figure).

Fig. 18. The direct lookup table representation for the deterministic and reactive memory-one IPD strategy with three choices (it also includes an element for the first move, which is not shown in the figure).

against all strategies in the strategy space because they are easily exploited. That is, for a naive cooperator playing against other strategies, its average payoff per move is lower than or the same as that of its opponents, for every opponent in the strategy space. Since, under the win-based definition, the game outcome only registers a "win" for a strategy if its payoff is strictly higher than its opponent's, a naive cooperator will have poor generalization performance under this definition. This explains why CCL generalizes poorly under the win-based definition [Fig. 19(a)] rather than under the average-payoff definition [Fig. 20(a)].

The higher tendency of CCL to overspecialize to naive cooperators for IPD games with fewer choices can be explained by considering that, in the search space, the proportion of naive cooperators is higher when there are fewer choices to play. For example, for a strategy to play "all cooperate" in the two-choice IPD, it must play full cooperation for the first move, and play full cooperation regardless of the choices for its previous move and the opponent's previous move, i.e., all the entries in the first row of the direct lookup table (those for which the player's own previous choice is full cooperation) must be full cooperation, while the remaining entries are unconstrained and can be either choice in the two-choice IPD. If the first move is full cooperation and the choices in the first row are all full cooperation, then, regardless of the choices in the remaining elements of the table, this strategy

Fig. 19. Generalization performance of CCL under the win-based definition. The graph "Best" plots the generalization performance measurement given by (10). The graph "Average" plots the measurement given by (11). The graph "Ensemble" plots the measurement given by (12). All graphs are averaged from measurements over 30 independent runs. (a) Two-choice IPD. (b) Three-choice IPD.

will always play full cooperation, since it will never access elements other than those in the first row. Given that there are four combinations of such tables, one can easily calculate the proportion of "all cooperate" strategies in the search space as 4/32 = 12.5% (since the total number of unique combinations of the table is 2^5 = 32).


Fig. 20. Generalization performance of CCL under the average-payoff definition. The graph "Best" plots the generalization performance measurement given by (10). The graph "Average" plots the measurement given by (11). All graphs are averaged from measurements over 30 independent runs. (a) Two-choice IPD. (b) Three-choice IPD.

TABLE V. MEAN AND ERROR (95% CONFIDENCE INTERVAL) OVER 30 INDEPENDENT RUNS FOR EACH MEASUREMENT OF GENERALIZATION PERFORMANCE UNDER THE WIN-BASED DEFINITION AT THE START AND AT THE END OF CCL FOR THE TWO-CHOICE IPD

For the three-choice IPD case, a strategy plays "all cooperate" if its first move is full cooperation and all the entries in the first row of its table (those for which the player's own previous choice is full cooperation) are full cooperation, while the remaining entries are unconstrained and can be any of the three choices. The proportion of "all cooperate" strategies in the space is 3^6/3^10 = 1/81 (since the total number of unique combinations of the table is 3^10). As such, one can

TABLE VI. MEAN AND ERROR (95% CONFIDENCE INTERVAL) OVER 30 INDEPENDENT RUNS FOR EACH MEASUREMENT OF GENERALIZATION PERFORMANCE UNDER THE WIN-BASED DEFINITION AT THE START AND AT THE END OF CCL FOR THE THREE-CHOICE IPD

TABLE VII. MEAN AND ERROR (95% CONFIDENCE INTERVAL) OVER 30 INDEPENDENT RUNS FOR EACH MEASUREMENT OF GENERALIZATION PERFORMANCE UNDER THE AVERAGE-PAYOFF DEFINITION AT THE START AND AT THE END OF CCL FOR THE TWO-CHOICE IPD

TABLE VIII. MEAN AND ERROR (95% CONFIDENCE INTERVAL) OVER 30 INDEPENDENT RUNS FOR EACH MEASUREMENT OF GENERALIZATION PERFORMANCE UNDER THE AVERAGE-PAYOFF DEFINITION AT THE START AND AT THE END OF CCL FOR THE THREE-CHOICE IPD

easily obtain the proportion of "all cooperate" strategies for the n-choice IPD, which is given by n^(n^2-n)/n^(n^2+1) = 1/n^(n+1). It can be observed that the proportion of "all cooperate" decreases exponentially with n. For the two-choice IPD, the proportion is 12.5%. For the three-choice IPD, the proportion is approximately 1%.
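This proportion can be checked numerically; the sketch below simply counts the free table entries under the representation assumed earlier (n^2 table entries plus a first move, with the first move and one row of n entries fixed to full cooperation).

def all_cooperate_proportion(n):
    # n**(n*n + 1) unique tables in total; "all cooperate" fixes the first move
    # and the n entries of one row, leaving n*n - n entries free.
    free_entries = n * n - n
    return n ** free_entries / n ** (n * n + 1)   # equals 1 / n**(n + 1)

print(all_cooperate_proportion(2))   # 0.125, i.e., 12.5% for the two-choice IPD
print(all_cooperate_proportion(3))   # ~0.0123, i.e., about 1% for the three-choice IPD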

It is known from our previous study of CCL for the n-choice IPD games using the same direct lookup table representation [25] that the population is more likely to evolve to play mutual cooperation. However, for a population of mutual cooperators, CCL cannot distinguish between the different mutually cooperating strategies because they have similar fitness values calculated from the average payoffs of round-robin tournaments. For the two-choice IPD, naive cooperators constitute a large 12.5% of the search space, and as such, it is more likely for the population to drift to naive cooperators when CCL fails to distinguish between mutually cooperating strategies. This is reflected in our results, which show that, on average, around 12% of the population is "all cooperate" for the two-choice IPD case compared with around 0.2% for the three-choice IPD case.

This result also highlights why a simple coevolutionary learning system does not guarantee finding a solution that generalizes well, even for simple problems. This is despite the possibility that the total number of strategies searched during the entire evolutionary process exceeds the size of the strategy space, e.g., CCL for the two-choice IPD with a strategy space of only 32 unique strategies. What is important to note is that, in a simple coevolutionary learning system, the selection process used is not necessarily biased towards the generalization performance considered, e.g., it does not require solutions to


Fig. 21. Statistics on the generalization performance of CCL under the win-based definition. (a) Plots the estimated generalization performance against the test sample size. (b) Plots the actual absolute difference between the estimated and true values against the test sample size. All graphs are averaged from 30 independent runs of the "Best" measurement, with error bars representing the 95% confidence interval.

be tested against all test cases to rank them properly at every generation.

2) Estimated Generalization Performance in the Coevolutionary Learning of the IPD With Three Choices: Here, we investigate the estimated generalization performance of coevolutionary learning using a sample of random test strategies drawn from the strategy space. We consider the three-choice IPD with a strategy space of 3^10 = 59 049 unique strategies (it is trivial to investigate the two-choice IPD with only 32 unique strategies). We use test samples of varying sizes (with the rough percentage of the strategy space in parentheses): 10 (0.02%), 50 (0.1%), 100 (0.2%), 500 (1%), 1000 (2%), 5000 (10%), and 10 000 (20%), drawn uniformly and independently (and we have verified that no strategy is included more than once).

Figs. 21 and 22 give various statistics of the generalization performance of the CCL at the end of the evolutionary process for the win-based and average-payoff definitions, respectively, where the different sample sizes were used, for the three-choice IPD. There are two graphs in each

Fig. 22. Statistics on the generalization performance of CCL under the average-payoff definition. (a) Plots the estimated generalization performance against the test sample size. (b) Plots the actual absolute difference between the estimated and true values against the test sample size. All graphs are averaged from 30 independent runs of the "Best" measurement, with error bars representing the 95% confidence interval.

figure. Graphs (a) and (b) plot the estimated generalization performance and the actual absolute difference between the estimated and true values, respectively, for the different sample sizes, averaged over 30 independent runs with 95% confidence intervals.

We first consider graphs (a) in Figs. 21 and 22. It was mathematically shown earlier that the probability of the estimation error exceeding the precision value is bounded by a value that is directly dependent on the variance and reciprocally dependent on the sample size and the square of the precision value. Here, we observed that, for the evolved strategies, the estimate remains similar (with some fluctuations), staying below the maximum possible value, as the sample size increases. Results from Figs. 21(b) and 22(b) show that the actual absolute difference decreases as the sample size increases. More importantly, the actual absolute difference obtained from using a small test sample is lower than the precision value obtained from the Chebyshev's bound for a much larger sample size and a high probability.10

10 Further note that, for this particular empirical result, a lower absolute difference can be obtained for a smaller sample size [e.g., Fig. 22(b) for sample sizes of 5000 and 10 000]. This is due to the particular samples used to compute the estimates over the 30 independent runs (i.e., the result is averaged over 30 estimates). For other samples, the results may differ.


Earlier, we showed empirically that the Chebyshev's bound is a loose upper bound on the probability that the estimation error exceeds the precision value, and that, in general, one would expect to do better, e.g., obtain a more accurate estimate using a sample of test strategies. Here, results from the actual application of the generalization framework to coevolutionary learning further suggest that the estimated value is closer to the true value than what one can claim from Chebyshev's bound when a sample of test strategies is used. In particular, for the IPD games, a small sample of random test strategies is good enough to estimate the generalization performance.

V. ESTIMATING GENERALIZATION PERFORMANCE IN THE CASE OF A BIASED TEST SAMPLE

A. Motivation

The generalization framework provides a means to estimate the generalization performance using a random test sample. However, it is noted that the generalization performance of a solution is conditioned on the underlying distribution from which the random test sample is drawn. For different underlying distributions, the generalization performance of a solution may differ, and the ranking between solutions in terms of their generalization performances may change. This raises the issue of determining which underlying distributions, over which generalization performance is measured, are more important and useful for a specific problem.

For game-playing, one is more interested in estimating the generalization performance with respect to a biased test sample, e.g., one biased towards test strategies that are more likely to be encountered in a real scenario such as tournaments, rather than an unbiased sample where every test strategy is equally likely to be encountered. Here, one may assume that the biased test sample consists of "good" strategies based on some performance criteria, e.g., strategies that have outperformed a large number of other strategies.

B. Generalization Performance of Coevolutionary Learning Based on a Biased Sample of Test Strategies

1) Obtaining a Biased Sample of Test Strategies: Many past studies have introduced and used ad hoc methods to obtain a biased sample. One approach is to use expert human knowledge to obtain biased test samples [2], [58]. However, there are practical limitations to this approach. First, the expert human knowledge required to obtain the biased test sample may not be readily available. Second, there is the problem of scalability in terms of competence levels that limits the utility of the biased test sample as a means from which the generalization performance is estimated, e.g., evolved strategies losing to all test strategies.

Here, we investigate an alternative to using expert-designed test strategies: a procedure that samples the strategy space to obtain the biased test sample. It is noted that sampling for a biased test sample from the strategy space is very difficult in practice because there is no notion of a "goodness" metric in the space from which one can sample "good" strategies. Furthermore, specific properties of games such as intransitivity can present significant challenges to obtaining a useful biased test sample for the estimation of generalization performance, e.g., a strategy may be observed to generalize well simply because it exploits a specific vulnerability that all the test strategies happen to possess.

In this study, we consider a notion whereby a "goodness" metric can be induced on the strategy space if one can enumeratively identify and order each point (strategy) with respect to the entire space based on a performance criterion. We expect that a strategy obtained from this approach will be appropriate in cases where the game allows for some structure in the space, e.g., transitivity, where a strategy outperforms other strategies with lower performance values. A biased test sample can be obtained with a procedure that samples points of higher performance with higher probability.

However, given that this is very difficult to perform in practice most of the time (e.g., the strategy space is very large), one would have to rely on some heuristics to obtain the biased test sample. Here, we introduce a simple procedure based on the multiple partial enumerative search that extends our earlier approach in [35] to obtain a biased sample of test strategies. The procedure is described as follows.

1) Partial enumerative search run r = 1: Sample PS strategies randomly.
2) Every strategy competes with all other strategies in the sample, including its twin (i.e., each strategy competes in a total of PS games).
3) Detect the strategy that yields the highest total payoff and add it to the biased test sample. Increment the run, r = r + 1.
4) Repeat steps one to three PE times to obtain a PE-sized biased sample of test strategies.
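A minimal sketch of this multiple partial enumerative search is given below (illustrative only); total_payoff, which would play the 150-round game and return the first player's total payoff, is an assumed placeholder.

import random

def multiple_partial_enumerative_search(strategy_space, ps, pe, total_payoff, rng=random):
    # Each of the PE independent runs draws PS strategies at random, plays a
    # round-robin tournament (including self-play), and keeps the strategy
    # with the highest total payoff as one "good" test strategy.
    biased_sample = []
    for _ in range(pe):
        candidates = rng.sample(strategy_space, ps)
        scores = [sum(total_payoff(s, opponent) for opponent in candidates)
                  for s in candidates]
        best_index = max(range(ps), key=lambda k: scores[k])
        biased_sample.append(candidates[best_index])
    return biased_sample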

Note that the selection procedure used to obtain the best test strategy is the same as the fitness evaluation used in CCL, for a more direct comparison. We also note that there is intransitivity in the IPD game.

We require the population size for each partial enumerative search, PS, to be larger than the maximum number of unique strategies that can be obtained from an evolutionary run, for two reasons. First, the test strategy obtained from the partial enumerative search is likely not to have been encountered (trained with) by the evolved strategies. Second, the test strategy has competed against and outperformed a significantly larger number of opponents compared with the evolved strategies, and as such, is likely to be more challenging than the opponents encountered by the evolved strategies.

The procedure of repeating the partial enumerative search PE independent times is aimed at obtaining a more diverse sample of test strategies compared with the earlier approach in [35]. A sample of PE test strategies obtained from a single partial enumerative search may be less diverse, as these strategies can be behaviorally similar (i.e., having the same weakness that can be exploited). Furthermore, repeating the partial enumerative search may address the problem of obtaining lower-performing test strategies due to a particularly poor sample of the population in a single partial enumerative search.

2) Generalization Performance of Coevolutionary Learning Based on Biased Versus Unbiased Samples for Problems With


Fig. 23. Comparison between the generalization performance against the unbiased sample and against the biased samples for CCL across the evolutionary process (win-based definition), averaged over 30 independent runs ("Best" measurements only). (a) Multiple partial enumerative search with population sizes of one, four, and ten. (b) Multiple partial enumerative search with population sizes of 50, 100, and 10 000.

Small Strategy Space: We first compare the generalization performance based on biased and unbiased samples of test strategies in the case of the three-choice IPD game. For the case of generalization performance with respect to biased samples of test strategies, we obtain the biased strategies from 20 partial enumerative searches, which results in a sample of 20 test strategies. We investigate different samples of biased test strategies obtained from the multiple partial enumerative search with different population sizes: one, four, ten, 50, 100, and 10 000. Note that only when a population size of 10 000 is used does the procedure satisfy the requirement of having searched more strategies compared with CCL. Despite this, we can investigate the impact of the population size used in the multiple partial enumerative search procedure on producing a biased and diverse sample of "good" test strategies that is challenging for the evolved strategies.

For simplicity of notation, we refer to the estimated generalization performance using a biased sample of test strategies as the biased generalization performance, with the two variants corresponding to the definitions

Fig. 24. Comparison between the generalization performance against the unbiased sample and against the biased samples for CCL across the evolutionary process (average-payoff definition), averaged over 30 independent runs ("Best" measurements only). (a) Multiple partial enumerative search with population sizes of one, four, and ten. (b) Multiple partial enumerative search with population sizes of 50, 100, and 10 000.

given by the number of wins and the average payoff, respectively. Results from experiments comparing the unbiased and biased generalization performances under the two definitions are summarized in Figs. 23 and 24, respectively. In general, increasing the population size for the multiple partial enumerative search leads to a sample of test strategies that is "harder to defeat", e.g., evolved strategies have a lower generalization performance when they compete against the sample of biased test strategies compared with the case of generalization performance against the unbiased test strategies. The difference between the two generalization performances, unbiased and biased, is more pronounced especially when the biased sample of test strategies is produced using the multiple partial enumerative search with larger population sizes [e.g., Figs. 23(b) and 24(b)].

Note, however, that for the generalization performance defined in terms of average payoff (Fig. 24), generalization should be viewed in the context of maximizing one's own average payoff for any set of opponents. "Good" strategies that generalize well are those that maximize their average


payoff, which does not necessarily imply minimizing the opponent's average payoff. That is, obtaining good generalization performance can be a result of cooperating with opponents rather than attempting to exploit opponents that are not naive and may retaliate. For example, it is not unexpected that, for evolved strategies, the biased generalization performance in terms of average payoff is higher when biased samples of test strategies obtained from the multiple partial enumerative search with increasingly higher population sizes are used. It is higher for the case of using a population size of 10 000 for the multiple partial enumerative search compared with the case of using a smaller population size of 50 or 100 [Fig. 24(b)]. The biased test strategies obtained from the procedure with a population size of 10 000 are "better" compared with those obtained using smaller population sizes in the sense that they reciprocate cooperation, which results in all strategies having higher average payoffs, instead of attempting to exploit other strategies and risking retaliation, which would result in strategies having lower average payoffs.

3) Generalization Performance of Coevolutionary Learning Based on Biased Versus Unbiased Samples for Problems With Large Strategy Space: Most practical real-world problems in game-playing often involve a very large strategy space (e.g., chess). In this case, it is not possible to compute the true generalization performance. Instead, one will have to rely on the estimated generalization performance. However, it is also possible, or even more likely, that for problems with a very large strategy space, a large proportion of the strategies are mediocre. As such, for real-world games, one is more interested in the generalization performance of a strategy obtained from a learning system against a biased sample of "good" test strategies, because such a performance comparison is more important and these strategies are more likely to be encountered in practical situations such as game tournaments.

For simplicity, instead of using complex real-world games such as chess, we investigate the four-choice IPD game for two reasons. First, even assuming a strategy space restricted to deterministic and reactive memory-one strategies with a uniform strategy distribution in the space, it is not possible with our current computing resources to compute the true generalization performance. As such, we have to estimate the generalization performance. Second, we can easily extend the previous investigation on estimating the generalization performance of coevolutionary learning against a biased sample of test strategies obtained from the multiple partial enumerative search.

We first consider a sample of 50 000 random test strategies to estimate the generalization performance against an unbiased sample. For this sample size and the precision value considered, Chebyshev's bound gives a probability that is sufficient for our purpose here, given the high computation cost of using a larger test sample. However, as we have shown earlier for the three-choice IPD, the actual estimation may be closer to the true value compared with what one can claim with Chebyshev's bound. We conduct experiments with test samples of different sizes, e.g., 500, 10 000, and 50 000. Assuming that the estimated value obtained from using the sample of 50 000 test strategies is the true value, we

TABLE IX. COMPARISON OF THE ACTUAL ABSOLUTE DIFFERENCE FOR DIFFERENT DEFINITIONS OF GENERALIZATION PERFORMANCE WITH THE PRECISION VALUE GIVEN BY CHEBYSHEV'S BOUND FOR THE CORRESPONDING SAMPLE SIZE. VALUES ARE AVERAGED OVER 30 INDEPENDENT RUNS OF CCL WITH 95% CONFIDENCE INTERVALS

can compare the accuracy of the estimations obtained from using the smaller sample sizes. Note that, for the following experiments, the evolutionary process of the CCL lasts 600 generations (a moderate increase to allow CCL to learn strategies for the more complex four-choice IPD game).

Table IX summarizes the comparison between the actual absolute difference for the different definitions of generalization performance, taken at the end of the evolutionary run of CCL, and the precision value given by Chebyshev's bound for the considered probability and the corresponding sample sizes. Note that the precision value is not adjusted, because we did not calculate the true value but instead used an estimated value obtained with the largest sample size; the adjusted value could be higher than what is given in the table if we consider the worst case. Results from the table show that the actual absolute difference is significantly lower than the precision value given by Chebyshev's bound. As such, similar to the three-choice IPD case, the empirical results suggest that a smaller sample of test strategies can be used to estimate the generalization performance and still obtain a sufficiently accurate estimation for the coevolutionary learning of the four-choice IPD.

Here, we compare the generalization performance of CCL between the cases of using unbiased and biased samples of test strategies for the more complex four-choice IPD game. The biased sample of test strategies is obtained using the multiple partial enumerative search. Each partial enumerative search is applied to a population that is larger than the maximum number of unique strategies CCL will search during a run, which easily satisfies the requirement stated earlier. The procedure is repeated 20 times to obtain a biased sample of 20 test strategies. This allows us to investigate whether the multiple partial enumerative search procedure can produce a biased and diverse sample of test strategies that is challenging for the evolved strategies.

Results are summarized in Figs. 25 and 26 for the win-based and average-payoff definitions, respectively. The most important observation is that, in general, the comparison between the unbiased and biased generalization performances for CCL on the more complex four-choice IPD is similar to that of the three-choice IPD case investigated earlier, i.e., evolved strategies do not perform as well against a biased sample as against an unbiased sample. For example, evolved strategies that generalized well over the strategy space [Fig. 25(a)] in terms of winning individual games performed poorly against the biased sample of test strategies [Fig. 25(b)]. However, for the case of the average-payoff definition, unlike the three-choice IPD case, evolved strategies were unable to maximize their average payoff against the biased sample of test strategies [Fig. 26(a) and (b)]. This result is due in part to CCL producing strategies


Fig. 25. Comparison between the unbiased and biased generalization performance of CCL across the evolutionary process with the "Best," "Average," and "Ensemble" measurements, for the four-choice IPD (win-based definition). All graphs are averaged over 30 independent runs. (a) Unbiased test sample. (b) Biased test sample.

that are less cooperative when the number of choices for the IPD game is increased, since the proportion of naive cooperators in the strategy space is smaller.

VI. CONCLUSION

In this paper, we have presented a theoretical framework for measuring the generalization performance of coevolutionary learning. Although the generalization framework is presented in terms of game-playing, our result is more general and not restricted to problems of game-playing. In particular, a strategy's generalization performance is its average performance against all strategies. As such, the best generalization performance is the maximum average performance against all strategies. Although the definition is simple, it can be difficult to determine by solving analytically a closed-form formula and is computationally prohibitive.

To address this problem, we have presented a principled approach to estimate the generalization performance by computing the average performance against a sample of random test strategies. We have provided a mathematical analysis showing that

Fig. 26. Comparison between the unbiased and biased generalization performance of CCL across the evolutionary process with the "Best" and "Average" measurements, for the four-choice IPD (average-payoff definition). All graphs are averaged over 30 independent runs. (a) Unbiased test sample. (b) Biased test sample.

the probability that the absolute difference between the estimated and true value exceeds a given error (precision value) is bounded by a value that is reciprocally dependent only on the square of the error and the size of the test sample that is used. As such, regardless of the complexity of the game (or problem in general), one can obtain a better estimate by increasing the size of the test sample. However, a tighter probability bound can be obtained for a strategy if the variance of its performance with respect to the strategy space is known (which can be estimated as well). In addition, we showed that games also affect the accuracy of the estimation for some sizes of random test samples because they affect the variance of a strategy's performance with respect to the strategy space.

We have illustrated the generalization framework by determining the generalization performance of coevolutionary learning of the n-choice IPD games. For the empirical study, we have investigated two definitions of generalization performance based on different performance criteria, e.g., in terms of the number of wins based on individual outcomes and in terms of average payoff. In particular, it is known that the bounds in probability


(Chebyshev's bounds) that we used in our analysis are loose bounds and that, in general, the actual estimated value is closer to the true value than predicted by Chebyshev's bounds. We have shown experimentally that, for the IPD games, a small test sample can be used to provide a sufficiently accurate estimation of the generalization performance.

In the context of game-playing, the generalization in terms of average performance against "good" strategies (e.g., those found in tournaments) may be more important than that against all strategies. We have introduced and investigated a simple approach using the multiple partial enumerative search to obtain a diverse sample of challenging test strategies, to allow the estimation of generalization performance against a biased test sample. This approach does not require human expertise, and provides a more direct and meaningful comparison between the evolved strategies and the test strategies because the latter are obtained from an enumerative search over a larger population size compared with the maximum that coevolutionary learning can achieve. We have demonstrated the approach on the coevolutionary learning of IPD games. In particular, experiments show that the generalization performance of a coevolutionary learning system against a biased test sample is lower compared with that against an unbiased test sample for IPD games.

The theoretical framework introduced here is the first step towards applying a rigorous and quantitative approach to analyzing the performance of coevolutionary learning through the notion of generalization from machine learning. It is now possible to analyze whether coevolutionary learning leads to increasing generalization performance for a particular problem. For example, one can now investigate earlier claims about whether diversity maintenance techniques actually help to improve the generalization performance of coevolutionary learning. The generalization framework allows for a quantitative comparison of whether a particular coevolutionary learning system generalizes better compared with other systems. In addition, the generalization framework also provides tools for the analysis and further study of coevolutionary dynamics in the context of how generalization performance changes during an evolutionary process. For example, the impact of different selection procedures on how generalization performance changes during coevolution can be investigated.

APPENDIX

APPLYING CHEBYSHEV’S THEOREM

Chebyshev's Theorem [45]: Consider a random variable $x$ distributed according to the probability density $p(x)$, with mean $\mu$ with respect to $p(x)$. Given a positive number $\beta$, we can bound the probability that $x \leq \mu - \beta$ or $x \geq \mu + \beta$, i.e., the probability that $x$ falls outside $(\mu - \beta, \mu + \beta)$, by
$$P\big(|x - \mu| \geq \beta\big) \leq \frac{E\big[(x - \mu)^2\big]}{\beta^2}$$
where $E[(x - \mu)^2]$ is the mean of the new random variable $(x - \mu)^2$ with respect to $p(x)$.

First,
$$P\left(\left|\hat{G}_{S_N}(i) - G(i)\right| \geq \beta\right) \leq \frac{E\left[\left(\hat{G}_{S_N}(i) - G(i)\right)^2\right]}{\beta^2}$$
where $\hat{G}_{S_N}(i) = \frac{1}{N}\sum_{j=1}^{N} G_j(i)$ is the estimate obtained from the test sample $S_N$ and $G(i)$ is the true generalization performance. This follows from applying Chebyshev's Theorem with $\hat{G}_{S_N}(i)$ as the random variable, where the expectation is taken with respect to the distribution of the random test sample $S_N$, for a positive number $\beta$.

Next, we need to determine $E[(\hat{G}_{S_N}(i) - G(i))^2]$, which is given as follows:
$$E\left[\left(\hat{G}_{S_N}(i) - G(i)\right)^2\right] = \frac{1}{N^2}\sum_{j=1}^{N}\sum_{k=1}^{N} E\left[(G_j(i) - G(i))(G_k(i) - G(i))\right].$$

We note the following points.
1) We pick the test strategies $j$ and $k$ into the sample $S_N$ independently, which implies
$$E\left[(G_j(i) - G(i))(G_k(i) - G(i))\right] = E\left[G_j(i) - G(i)\right] E\left[G_k(i) - G(i)\right], \quad j \neq k.$$
2) We use the same rules to pick $j$ and $k$, which implies that $G_j(i)$ and $G_k(i)$ are identically distributed.
3) We can obtain the following:
$$E\left[G_j(i) - G(i)\right] = E\left[G_j(i)\right] - G(i) = 0$$
and the same applies to $G_k(i)$ as well.
4) As such,
$$E\left[(G_j(i) - G(i))(G_k(i) - G(i))\right] = 0, \quad j \neq k.$$

Given the above, we can now obtain $E[(\hat{G}_{S_N}(i) - G(i))^2]$ as follows:
$$E\left[\left(\hat{G}_{S_N}(i) - G(i)\right)^2\right] = \frac{1}{N^2}\sum_{j=1}^{N} E\left[(G_j(i) - G(i))^2\right] = \frac{\mathrm{Var}(i)}{N}$$
where $\mathrm{Var}(i) = E[(G_j(i) - G(i))^2]$ is the variance of the game outcome of strategy $i$ against a single random test strategy.
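The key step, that the variance of the sample-mean estimate is the per-outcome variance divided by the sample size when the test strategies are drawn independently, can be checked with a small simulation (an illustrative sketch, not part of the original analysis):

import random

def check_sample_mean_variance(num_trials=20000, n=50, p=0.3, rng=random):
    # Simulate estimating a mean from n independent draws of a bimodal {0, 1}
    # outcome (e.g., a win/lose game outcome with win probability p) and compare
    # the observed variance of the estimate with the predicted value var/n.
    per_draw_var = p * (1 - p)
    estimates = []
    for _ in range(num_trials):
        draws = [1 if rng.random() < p else 0 for _ in range(n)]
        estimates.append(sum(draws) / n)
    mean_est = sum(estimates) / num_trials
    observed_var = sum((e - mean_est) ** 2 for e in estimates) / num_trials
    return observed_var, per_draw_var / n   # the two values should be close

print(check_sample_mean_variance())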

ACKNOWLEDGMENT

The authors would like to thank Dr. S. Lucas for his role as the acting Editor-in-Chief for this paper. The authors would also like to thank the anonymous associate editor and the three reviewers for their constructive comments, which have helped to improve this paper significantly. The authors are grateful to the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), School of Computer Science, The University of Birmingham, U.K., for providing computing support in running the experiments, and to the School of Computer Science at the University of Birmingham for its studentship to the first author.

REFERENCES

[1] X. Yao, “Introduction,” Informatica (Special Issue on Evol. Comput.),vol. 18, pp. 375–376, 1994.

[2] R. Axelrod, “The evolution of strategies in the iterated prisoner’sdilemma,” in Genetic Algorithms and Simulated Annealing, L. D.Davis, Ed. New York: Morgan Kaufmann, 1987, ch. 3, pp. 32–41.

[3] K. Chellapilla and D. B. Fogel, “Evolution, neural networks, games,and intelligence,” Proc. IEEE, vol. 87, no. 9, pp. 1471–1496, Sept.1999.

[4] D. B. Fogel, “An introduction to simulated evolutionary optimization,”IEEE Trans. Neural Netw., vol. 5, no. 1, pp. 3–14, 1994.

[5] T. Back, U. Hammel, and H. P. Schwefel, “Evolutionary computa-tion: Comments on the history and current state,” IEEE Trans. Evol.Comput., vol. 1, no. 1, pp. 3–17, Apr. 1997.

[6] X. Yao, Evolutionary Computation: Theory and Applications. Singa-pore: World Scientific, 1999.

[7] S. Luke and R. P. Wiegand, “When coevolutionary algorithms ex-hibit evolutionary dynamics,” in Proc. Workshop on UnderstandingCoevolution: Theory and Analysis of Coevolutionary Algorithms(GECCO’02), A. Barry, Ed., 2002, pp. 236–241.

[8] E. Popovici and K. D. Jong, “Relationships between internal andexternal metrics in co-evolution,” in Proc. IEEE 2005 Congr. Evol.Comput. (CEC’05), Edinburgh, U.K., Sep. 2–5, 2005, pp. 2800–2807.

[9] S. Nolfi and D. Floreano, “Co-evolving predator and prey robots: Do“arms races” arise in artificial evolution?,” Artif. Life, vol. 4, no. 4, pp.311–335, 1998.

[10] S. G. Ficici and J. B. Pollack, “Challenges in coevolutionary learning:Arms-race dynamics, open-endedness, and mediocre stable states,” inProc. 6th Int. Conf. Artif. Life, C. Adami, R. K. Belew, H. Kitano, andC. Taylor, Eds., 1998, pp. 238–247.

[11] R. A. Watson and J. B. Pollack, “Coevolutionary dynamics in a minimalsubstrate,” in Proc. 2001 Genetic Evol. Comput. Conf. (GECCO’Ol),CA, 2001, pp. 702–709.

[12] E. D. de Jong and J. B. Pollack, “Ideal evaluation from coevolution,”Evol. Comput., vol. 12, no. 2, pp. 159–192, 2004.

[13] J. Cartlidge and S. Bullock, “Combating coevolutionary disengage-ment by reducing parasite virulence,” Evol. Comput., vol. 12, no. 2,pp. 193–222, 2004.

[14] J. Noble, “Finding robust texas hold’em poker strategies using Paretocoevolution and deterministic crowding,” in Proc. 2002 Int. Conf.Mach. Learn. Appl. (ICMLA’02), NV, 2002, pp. 233–239.

[15] K. O. Stanley and R. Miikkulainen, “Competitive coevolution throughevolutionary complexification,” J. Artif. Intell. Res., vol. 21, pp.63–100, 2004.

[16] S. Y. Chong, M. K. Tan, and J. D. White, “Observing the evolutionof neural networks learning to play the game of Othello,” IEEE Trans.Evol. Comput., vol. 9, no. 3, pp. 240–251, Jun. 2005.

[17] T. P. Runarsson and S. M. Lucas, “Coevolution versus self-play tem-poral difference learning for acquiring position evaluation in small-board go,” IEEE Trans. Evol. Comput., vol. 9, no. 6, pp. 540–551, Dec.2005.

[18] W. D. Hillis, “Co-evolving parasites improve simulated evolution as anoptimization procedure,” Physica D, vol. 42, pp. 228–234, 1990.

[19] H. Juille and J. B. Pollack, “Co-evolving the “ideal” trainer: Applica-tion to the discovery of cellular automata rules,” in Proc. 3rd Annu.Conf. Genetic Programming 1998, J. R. Koza, W. Banzhaf, K. Chel-lapilla, K. Deb, M. Dorigo, D. B. Fogel, M. H. Garzon, D. E. Goldberg,H. Iba, and R. Riolo, Eds., 1998, pp. 519–527.

[20] J. Werfel, M. Mitchell, and J. P. Crutchfield, “Resource sharing andcoevolution in evolving cellular automata,” IEEE Trans. Evol. Comput.,vol. 4, no. 4, pp. 388–393, Nov. 2000.

[21] P. J. Darwen and X. Yao, “Co-evolution in iterated prisoner’s dilemmawith intermediate levels of cooperation: Application to missile de-fense,” Int. J. Comput. Intell. Appl., vol. 2, no. 1, pp. 83–107, 2002.

[22] C. D. Rosin and R. K. Belew, “New methods for competitive coevolu-tion,” Evol. Comput., vol. 5, no. 1, pp. 1–29, 1997.

[23] S. G. Ficici, “Solution concepts in coevolutionary algorithms,” Ph.D.dissertation, Brandeis Univ., Waltham, MA, 2004.

[24] S. G. Ficici and J. B. Pollack, “Pareto optimality in coevolutionarylearning,” in Proc. 6th Eur. Conf. Advances in Artif. Life (ECAL’01), ,Prague, Czech Republic, Sep. 10–14, 2001, vol. 2159, Lecture Notesin Computer Science, pp. 316–325.

[25] S. Y. Chong and X. Yao, “Behavioral diversity, choices, and noise inthe iterated prisoner’s dilemma,” IEEE Trans. Evol. Comput., vol. 9,no. 6, pp. 540–551, Dec. 2005.

[26] P. J. Darwen and X. Yao, “Why more choices cause less coopera-tion in iterated prisoner’s dilemma,” in Proc. IEEE 2001 Congr. Evol.Comput. (CEC’01), Seoul, Korea, May 27–30, 2002, pp. 987–994.

[27] S. G. Ficici, O. Melnik, and J. B. Pollack, “A game-theoretic and dy-namical-systems analysis of selection methods in coevolution,” IEEETrans. Evol. Comput., vol. 9, no. 6, pp. 580–602, Dec. 2005.

[28] D. Cliff and G. F. Miller, “Tracking the red queen: Measurements ofadaptive progress in co-evolutionary simulations,” in Proc. 3rd Eur.Conf. Advances in Artif. Life, 1995, vol. 929, Lecture Notes in Com-puter Science, pp. 200–218, Springer-Verlag.

[29] D. Floreano and S. Nolfi, “God save the red queen! competition inco-evolutionary robotics,” in Proc. 2nd Annu. Conf. Genetic Program-ming, J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba,and R. L. Riolo, Eds., 1997, pp. 398–406.

[30] K. O. Stanley and R. Miikkulainen, “The dominance tournamentmethod of monitoring progress in coevolution,” in Proc. 2002 GeneticEvol. Comput. Conf. (GECCO’02) Bird of a Feather Workshop: , NewYork, 2002, pp. 242–248.

[31] A. Bader-Natal and J. B. Pollack, “Towards metrics and visualizationssensitive to coevolutionary failures,” AAAI Tech. Rep. FS-05–03, Co-evol. Coadaptive Syst., AAAI Fall Symp., 2005, Washington, DC.

[32] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford,U.K.: Oxford Univ. Press, 1995.

[33] X. Yao, Y. Liu, and P. J. Darwen, “How to make best use of evo-lutionary learning,” in Complex Systems—From Local Interactionsto Global Phenomena, R. Stacker, H. Jelinck, B. Burnota, and T.Bossomaier, Eds. Amsterdam, The Netherlands: IOS Press, 1996,pp. 229–242.

[34] P. J. Darwen and X. Yao, “On evolving robust strategies for iteratedprisoner’s dilemma,” in Progress in Evolutionary Computation, ser.Lecture Notes in Artificial Intelligence. New York: Springer, 1995,vol. 956, pp. 276–292.

[35] P. J. Darwen, “Co-evolutionary learning by automatic modularizationwith speciation,” Ph.D. dissertation, Univ. New South Wales, Sydney,Australia, Nov. 1996.

[36] P. J. Darwen and X. Yao, “Speciation as automatic categorical modularization,” IEEE Trans. Evol. Comput., vol. 1, no. 2, pp. 101–108, Jul. 1997.

[37] Y. G. Seo, S. B. Cho, and X. Yao, “Emergence of cooperative coalition in NIPD game with localization of interaction and learning,” in Proc. IEEE 1999 Congr. Evol. Comput. (CEC’99). Piscataway, NJ: IEEE Press, 1999, pp. 877–884.

[38] C. D. Rosin, “Coevolutionary search among adversaries,” Ph.D. dissertation, Univ. California, San Diego, CA, 1997.

[39] R. P. Wiegand and M. A. Potter, “Robustness in cooperative coevolution,” in Proc. 2006 Conf. Genetic Evol. Comput. (GECCO’06), Seattle, WA, 2006, pp. 369–376.

[40] M. Bowling, “Convergence and no-regret in multiagent learning,” in Advances in Neural Inf. Process. Syst., 2005, vol. 17, pp. 209–216.

[41] R. Powers and Y. Shoham, “Computing best-response strategies via sampling,” Comput. Sci. Dept., Stanford Univ., Stanford, CA, Tech. Rep., 2003.

[42] S. G. Ficici and J. B. Pollack, “A game-theoretic memory mechanism for coevolution,” in Proc. 2003 Conf. Genetic Evol. Comput. (GECCO’03), 2003, vol. 2723, Lecture Notes in Computer Science, pp. 286–297.

[43] E. D. de Jong, “The incremental Pareto-coevolution archive,” in Proc. 2004 Conf. Genetic Evol. Comput. (GECCO’04), 2004, pp. 525–536.

[44] E. D. de Jong, “The maxsolve algorithm for coevolution,” in Proc. 2005 Conf. Genetic Evol. Comput. (GECCO’05), New York, 2005, pp. 483–489.

[45] B. V. Gnedenko and G. V. Gnedenko, Theory of Probability. New York: Taylor & Francis, 1998.

[46] H. I. Jacobson, “The maximum variance of restricted unimodal distributions,” Ann. Math. Stat., vol. 40, no. 5, pp. 1746–1752, 1969.

[47] R. Axelrod, The Evolution of Cooperation. New York: Basic Books, 1984.

[48] R. Axelrod, “Effective choice in the prisoner’s dilemma,” J. Conflict Resolution, vol. 24, no. 1, pp. 3–25, Mar. 1980.

[49] R. Axelrod, “More effective choice in the prisoner’s dilemma,” J. Conflict Resolution, vol. 24, no. 3, pp. 379–403, Sep. 1980.

[50] X. Yao and P. J. Darwen, “How important is your reputation in a multi-agent environment,” in Proc. IEEE 1999 Conf. Syst., Man, Cybern. (SMC’99), Tokyo, Japan, Oct. 12–15, 1999, pp. 575–580.

[51] P. J. Darwen and X. Yao, “Does extra genetic diversity maintain escalation in a co-evolutionary arms race,” Int. J. Knowledge-Based Intell. Eng. Syst., vol. 4, no. 3, pp. 191–200, Jul. 2000.

[52] D. B. Fogel, “The evolution of intelligent decision making in gaming,” Int. J. Cybern. Syst., vol. 22, pp. 223–236, 1991.

[53] D. B. Fogel, “Evolving behaviors in the iterated prisoner’s dilemma,” Evol. Comput., vol. 1, no. 1, pp. 77–97, 1993.

[54] D. B. Fogel, “On the relationship between the duration of an encounter and the evolution of cooperation in the iterated prisoner’s dilemma,” Evol. Comput., vol. 3, no. 3, pp. 349–363, 1996.

[55] B. A. Julstrom, “Effects of contest length and noise on reciprocal altruism, cooperation, and payoffs in the iterated prisoner’s dilemma,” in Proc. 7th Int. Conf. Genetic Algorithms (ICGA’97), 1997, pp. 386–392.

[56] R. Axelrod and W. D. Hamilton, “The evolution of cooperation,” Science, vol. 211, pp. 1390–1396, 1981.

[57] P. G. Harrald and D. B. Fogel, “Evolving continuous behaviors in the iterated prisoner’s dilemma,” BioSystems: Special Issue on the Prisoner’s Dilemma, vol. 37, pp. 135–145, 1996.

[58] N. Franken and A. P. Engelbrecht, “Particle swarm optimization approaches to coevolve strategies for the iterated prisoner’s dilemma,” IEEE Trans. Evol. Comput., vol. 9, no. 6, pp. 562–579, Dec. 2005.

[59] M. A. Potter and K. A. De Jong, “Cooperative coevolution: An architecture for evolving coadapted subcomponents,” Evol. Comput., vol. 8, no. 1, pp. 1–29, 2000.

Siang Yew Chong (M’99) received the B.Eng. (Hon.) and M.Eng.Sc. degrees in electronics engineering from Multimedia University (MMU), Melaka, Malaysia, in 2002 and 2004, respectively, and the Ph.D. degree in computer science from the University of Birmingham, Birmingham, U.K., in 2007.

He was a Research Associate with the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA) and a Member of the Natural Computation Research Group, and is currently an Honorary Research Fellow with the School of Computer Science, University of Birmingham. His research interests include evolutionary computation, neural networks, coevolution, the iterated prisoner’s dilemma, and game theory.

Dr. Chong is the recipient of the Student Travel Grant for the 2003 Congress on Evolutionary Computation (CEC’03).

Peter Tiño received the M.Sc. degree from the Slovak University of Technology, Bratislava, Slovakia, in 1988 and the Ph.D. degree from the Slovak Academy of Sciences, Slovakia, in 1997.

He was a Fulbright Fellow at the NEC Research Institute, Princeton, NJ, from 1994 to 1995, a Postdoctoral Fellow at the Austrian Research Institute for AI, Vienna, Austria, from 1997 to 2000, and a Research Associate at Aston University, U.K., from 2000 to 2003. He has been with the School of Computer Science, University of Birmingham, U.K., since 2003, where he is currently a Senior Lecturer. He is on the editorial board of several journals. His main research interests include probabilistic modeling and visualization of structured data, statistical pattern recognition, dynamical systems, evolutionary computation, and fractal analysis.

Dr. Tiño was a recipient of the Fulbright Fellowship in 1994. He was awarded the Outstanding Paper of the Year for the IEEE TRANSACTIONS ON NEURAL NETWORKS, with T. Lin, B. G. Horne, and C. L. Giles, in 1998, for their work on recurrent neural networks. He won the 2002 Best Paper Award at the International Conference on Artificial Neural Networks with B. Hammer.

Xin Yao (M’91–SM’96–F’03) received the B.Sc. degree from the University of Science and Technology of China (USTC), Hefei, in 1982, the M.Sc. degree from the North China Institute of Computing Technology, Beijing, in 1985, and the Ph.D. degree from USTC in 1990.

He was an Associate Lecturer and Lecturer at USTC from 1985 to 1990, while working towards the Ph.D. degree. He took up a Postdoctoral Fellowship in the Computer Sciences Laboratory, Australian National University (ANU), Canberra, in 1990, and continued his work on simulated annealing and evolutionary algorithms. He joined the Knowledge-Based Systems Group, CSIRO Division of Building, Construction and Engineering, Melbourne, in 1991, working primarily on an industrial project on automatic inspection of sewage pipes. He returned to Canberra in 1992 to take up a lectureship in the School of Computer Science, University College, University of New South Wales (UNSW), Australian Defence Force Academy (ADFA), where he was later promoted to Senior Lecturer and Associate Professor. Attracted by the English weather, he moved to the University of Birmingham, Birmingham, U.K., as a Professor of Computer Science in 1999. Currently, he is the Director of the Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA), a Distinguished Visiting Professor of the University of Science and Technology of China, Hefei, and a visiting professor of three other universities. He has more than 200 publications. He is an associate editor or editorial board member of several journals and the Editor of the World Scientific Book Series on Advances in Natural Computation. He has given more than 45 invited keynote and plenary speeches at conferences and workshops worldwide. His major research interests include evolutionary artificial neural networks, automatic modularization of machine learning systems, evolutionary optimization, constraint handling techniques, computational time complexity of evolutionary algorithms, coevolution, the iterated prisoner’s dilemma, data mining, and real-world applications.

Dr. Yao was awarded the President’s Award for Outstanding Thesis by the Chinese Academy of Sciences for his Ph.D. work on simulated annealing and evolutionary algorithms. He won the 2001 IEEE Donald G. Fink Prize Paper Award for his work on evolutionary artificial neural networks. He is the Editor-in-Chief of the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION.
