
Complex dynamics in learning complicated games

Tobias Galla¹ and J. Doyne Farmer²,³

January 12, 2011

¹The University of Manchester, School of Physics and Astronomy, Schuster Building, Manchester M13 9PL, United Kingdom
²Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
³LUISS Guido Carli, Viale Pola 12, 00198 Roma, Italy

Game theory is the standard tool used to model strategic interactions in evolutionary biology and social science [1, 2]. Traditional game theory studies the equilibria of simple games [3, 4]. But is traditional game theory applicable if the game is complicated, and if not, what is? We investigate this question here, defining a complicated game as one with many possible actions, and therefore many possible payoffs conditional on those actions. We investigate two-person games in which the players learn based on experience [7-10]. By generating games at random [5, 6, 11, 12] we show that under some circumstances the strategies of the two players converge to fixed points, but under others they follow limit cycles or chaotic attractors. The key parameters are the memory loss in the players' learning algorithm and the correlation of the payoffs of the two players, which determines the extent to which the game is zero sum. The dimension of the chaotic attractors can be very high, implying that the dynamics of the strategies are effectively random. In the chaotic regime the payoffs fluctuate intermittently, showing bursts of rapid change punctuated by periods of quiescence, similar to the clustered volatility observed in financial markets [13] and fluid turbulence [14]. Our results suggest that for complicated strategic interactions there is a large parameter regime in which the tools of dynamical systems are more useful than those of standard equilibrium game theory.

INTRODUCTION

Traditional game theory gives a good understanding for simple games with a few players, or with only a few possible actions, characterizing the solutions in terms of their equilibria [3, 4]. The applicability of this approach is not clear when the game becomes more complicated, for example due to more players or a larger strategy space, which can cause an explosion in the number of possible equilibria [5, 6, 11, 12]. This is further complicated if the players are not rational and must learn their strategies [7-10]. In a few special cases it has been observed that the strategies display complex dynamics and fail to converge to equilibrium solutions [15]. Are such games special, or is this typical behavior? More generally, under what circumstances should we expect that games become so hard to learn that their dynamics fail to converge? What kind of behavior should we expect, and how should we characterize the solutions?

Here we show that for complicated games, under a wide variety of circumstances, one should expect complex dynamics in which the players never converge to a fixed strategy. Instead their strategies continually vary as each player responds to past conditions and attempts to do better than the other player. This corresponds to high-dimensional chaotic dynamics, suggesting that for most intents and purposes the behavior is essentially random.

Description of the model

We study 2-player games. For convenience call the two players Alice and Bob. At each time step t, player µ ∈ {Alice = A, Bob = B} chooses one of N possible actions, picking the i-th action with frequency x^µ_i(t), where i = 1, . . . , N. The frequency vector x^µ(t) = (x^µ_1, . . . , x^µ_N) is the strategy of player µ. If Alice plays i and Bob plays j, Alice receives payoff Π^A_ij and Bob receives payoff Π^B_ji. We assume that the players learn their strategies x^µ via a form of reinforcement learning called experience weighted attraction. This has been extensively studied by experimental economists, who have shown that it provides a reasonable approximation for how real people learn in games [7, 9]. Actions that have proved to be successful in the past are played more frequently, and actions that have been less successful are played less frequently. To be more specific, the probability of the actions is

x^\mu_i(t) = \frac{e^{\beta Q^\mu_i(t)}}{\sum_k e^{\beta Q^\mu_k(t)}},    (1)


Figure 1: An illustration of the complex game dynamics of the strategies x^µ_i(t) for different parameters. There are N = 50 possible actions for each player. The game dynamics range from (a) a limit cycle to (b and c) low- and intermediate-dimensional chaotic motion to (d) high-dimensional chaos. The upper panels show three-dimensional projections of the attractors in the 98-dimensional phase space, scaling each coordinate logarithmically. The lower panels depict the time series of the corresponding three coordinates. Note the scale: as the dimension of the attractor increases, so does the range of x^µ_i. For the highest-dimensional case a given action has occasional bursts during which it is highly probable, and long periods during which its probability is extremely small (as low as 10^{-24}).

where Q^µ_i(t) is called the attraction of player µ to action i. Alice's attractions are updated according to

Q^A_i(t+1) = (1 - \alpha) Q^A_i(t) + \alpha \sum_j \Pi^A_{ij} x^B_j(t),    (2)

and similarly for Bob.

The dynamics for updating the strategies x^µ_i of the two players are completely deterministic. This approximates the situation in which the players vary their strategies slowly in comparison to the timescale on which they play the game. The key parameters that characterize the learning are α and β. The parameter β is called the intensity of choice; when β is large a small historical advantage for a given action causes that action to be very probable, and when β = 0 all actions are equally likely. The parameter α specifies the memory in the learning; when α = 1 there is no memory of previous learning steps, and when α = 0 all learning steps are remembered and given equal weight, regardless of how far in the past they occurred. The case α = 0 corresponds to the well-known replicator dynamics used to describe evolutionary processes in population biology [1, 16].
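
To make the model concrete, Eqs. (1) and (2) define a deterministic map on the attractions. The following is a minimal sketch in Python/NumPy; the function names, the simultaneous update of both players, and the max-shift for numerical stability are our choices, not part of the original specification.

```python
import numpy as np

def logit_strategy(Q, beta):
    """Eq. (1): map a vector of attractions Q to a mixed strategy."""
    w = np.exp(beta * (Q - Q.max()))  # subtract max for numerical stability
    return w / w.sum()

def ewa_step(QA, QB, PA, PB, alpha, beta):
    """One step of the deterministic learning dynamics, Eqs. (1)-(2).

    PA[i, j]: Alice's payoff when Alice plays i and Bob plays j.
    PB[j, i]: Bob's payoff when Bob plays j and Alice plays i.
    """
    xA = logit_strategy(QA, beta)
    xB = logit_strategy(QB, beta)
    # Eq. (2): discount old attractions by (1 - alpha) and reinforce each
    # action by its expected payoff against the opponent's current strategy.
    QA = (1 - alpha) * QA + alpha * (PA @ xB)
    QB = (1 - alpha) * QB + alpha * (PB @ xA)
    return QA, QB
```

Iterating ewa_step from random initial attractions and recording the strategies x^µ(t) produces trajectories of the kind shown in Fig. 1.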

We choose games at random by drawing the elements of the payoff matrices Π^µ_ij from a normal distribution. The mean and the covariance are chosen so that

E[Π^µ_ij] = 0,   E[(Π^µ_ij)²] = 1/N,   E[Π^A_ij Π^B_ji] = Γ/N,

where E[x] denotes the average of x. The variable Γ is a crucial parameter which measures the deviation from a zero-sum game. When Γ = −1 the game is zero sum, i.e. the amount Alice wins is equal to the amount Bob loses, whereas when Γ = 0 their payoffs are uncorrelated.
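
A convenient way to sample from this ensemble (our construction; the text does not specify one) is to mix two independent Gaussian matrices so that the covariance Γ/N comes out by design:

```python
import numpy as np

def random_game(N, Gamma, rng=None):
    """Draw payoff matrices with E[P] = 0, E[P^2] = 1/N and
    E[PA[i, j] * PB[j, i]] = Gamma / N, for -1 <= Gamma <= 1."""
    rng = np.random.default_rng() if rng is None else rng
    z1 = rng.standard_normal((N, N))
    z2 = rng.standard_normal((N, N))
    PA = z1 / np.sqrt(N)
    # The transpose aligns PB[j, i] with PA[i, j]; Gamma = -1 gives
    # PB[j, i] = -PA[i, j], i.e. an exactly zero-sum game.
    PB = (Gamma * z1 + np.sqrt(1 - Gamma**2) * z2).T / np.sqrt(N)
    return PA, PB
```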

Results

When we simulate games with N = 50 pure strategies we observe very different behaviors depending on parameters. In many cases we see stable learning dynamics, in which the strategies of both players evolve to reach a fixed point. For a large section of the parameter space, however, the strategies x^µ(t) do not settle to a fixed point, but rather relax onto a more complicated orbit, either a limit cycle or a chaotic attractor. We characterize the attractors in the unstable regime by numerically computing the Lyapunov exponents λ_i, i = 1, . . . , 2N − 2, which characterize the rate of expansion or contraction of nearby points in the state space. The Lyapunov exponents also determine the Lyapunov dimension D, which characterizes the number of degrees of freedom of the motion in the 98-dimensional state space. We give several examples of the game dynamics x^µ_i(t) in Fig. 1, including a limit cycle and chaotic attractors of varying dimensionality. There can also be long transients, in which the trajectory follows a complicated orbit for a long time and then suddenly collapses onto a fixed point. Of course, the behavior that is observed depends on the random draw of the payoff matrices Π^µ_ij, but as we move away from the stability boundary we observe fairly consistent behavior.
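
The Lyapunov dimension referred to here is conventionally computed with the Kaplan-Yorke formula: order the exponents λ₁ ≥ λ₂ ≥ . . ., find the largest k with λ₁ + · · · + λ_k ≥ 0, and set D = k + (λ₁ + · · · + λ_k)/|λ_{k+1}|. A minimal sketch, assuming the spectrum has already been estimated numerically:

```python
import numpy as np

def kaplan_yorke_dimension(lyap):
    """Kaplan-Yorke (Lyapunov) dimension from a spectrum of exponents."""
    lyap = np.sort(np.asarray(lyap))[::-1]  # decreasing order
    sums = np.cumsum(lyap)
    if sums[0] < 0:
        return 0.0               # fully contracting: a fixed point
    if sums[-1] >= 0:
        return float(len(lyap))  # attractor fills the whole space
    k = int(np.max(np.where(sums >= 0)[0]))  # 0-based index
    return (k + 1) + sums[k] / abs(lyap[k + 1])
```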

Simulating games at many different parameter values reveals the stability diagram given in Fig. 2. Roughly speaking, we find that the dynamics are stable¹ when Γ ≈ −1 (zero-sum games) and α/β is large (short memory), i.e. in the lower right of the diagram, and unstable when Γ ≈ 0 (uncorrelated payoffs) and α/β is small (long memory), i.e. in the upper left. Interestingly, for reasons that we do not understand, the highest-dimensional behavior is observed when the payoffs are moderately anti-correlated (Γ ≈ −0.6) and when the players have good memories and do not discount past payoffs by much (α/β ≈ 0). In this case we often find that D = 2N − 2, i.e. the attractor fills all the dimensions of the phase space.

A good approximation of the boundary between the stable and unstable regions of the parameter space can be computed analytically using techniques from statistical physics. We use path-integral methods from the theory of disordered systems [17] to derive a stochastic process for an 'effective' strategy in the limit of infinite payoff matrices, N → ∞. The stability of the fixed points of this representative process can be computed in a continuous-time limit (see Supplementary Material). This also allows us to show that in this limit, at fixed Γ, stability depends only on the ratio α/β, not on α or β separately.

We have also simulated these games at various values of N. Not surprisingly, we find that if D > 0 at small N, the dimension D tends to increase with N. At this stage we have been unable to tell whether D reaches a finite limit as N → ∞.

¹ Note that the fixed point reached in the stable regime is a Nash equilibrium only at Γ = 0 and in the limit α → 0. When α > 0 the players are effectively assuming that their opponent's behavior is non-stationary, and that more recent moves are more useful than moves in the distant past.

Figure 2: Stability diagram showing the regions in parameter space where stable learning is found and where chaotic motion occurs. The solid line is obtained from the path-integral analysis (see Supplementary Material) and indicates the onset of chaos in the continuous system. The coloured squares are data from simulations of the dynamics of Eqs. (1, 2) and represent the typical dimension of the attractor (averaged over eight independent payoff matrices per data point).

Another interesting property of this system is the time dependence of the received payoffs. As shown in Fig. 3, when the dynamics are chaotic the payoff varies, with intermittent bursts of large fluctuations punctuated by relative quiescence. This is observed, although to varying degrees, throughout the chaotic part of the parameter space. There is a strong resemblance to the clustered volatility observed in financial markets (which in turn resembles fluctuations observed in fluid turbulence) [14]. We also observe heavy tails in the distribution of the fluctuations, as well as a concentration of power at low frequencies, as described in more detail in the Supplementary Material. This suggests that these properties, which have received a great deal of attention in studies of financial markets, may occur simply because they are generic properties of complicated games.²

² In contrast to financial markets, the correlation function of the clustered volatility and the distribution of heavy tails decay exponentially (as opposed to following a power law). We hypothesize that this is because the players in financial markets use a variety of different timescales α.
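
A simple diagnostic for the clustered volatility (ours, not one given in the text) is the autocorrelation function of the absolute payoff changes; clustering appears as positive autocorrelation over many lags:

```python
import numpy as np

def volatility_autocorrelation(payoffs, max_lag=100):
    """Autocorrelation of |p(t+1) - p(t)| up to max_lag.

    Assumes len(payoffs) is comfortably larger than max_lag.
    """
    v = np.abs(np.diff(np.asarray(payoffs)))
    v = v - v.mean()
    norm = np.dot(v, v) / len(v)  # lag-0 variance
    return np.array([np.dot(v[:-lag], v[lag:]) / (len(v) * norm)
                     for lag in range(1, max_lag + 1)])
```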


Figure 3: Chaotic dynamics displays clustered volatility.We plot the difference of payoffs on successive time stepsfor case (c) in Fig. 1. The amplitude of the fluctuationsincreases with the dimension of the attractor.

Why is dimensionality relevant?

The fact that the equilibria of a game are unlearnable with one particular learning algorithm, such as reinforcement learning, does not imply that learning is not possible with some other learning algorithm. For example, if the learning dynamics settle into a limit cycle or a low-dimensional attractor, a careful observer could collect data and make better predictions about the other player using the method of analogues [18], or refinements based on local approximation [19]. If the dimension of the chaotic attractor is too high, however, the curse of dimensionality makes this impossible with any reasonable amount of data [19]. In this case it is not clear that any algorithm can provide an improvement. The observation of high-dimensional dynamics here leads us to conjecture that some games are inherently unlearnable, in the sense that any learning algorithm the two players use will inevitably result in high-dimensional chaotic learning dynamics (see also Sato et al. [15]).
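
To illustrate what such an observer could do, here is a minimal sketch of the method of analogues [18]: find the past state closest to the present one and predict that its successor will repeat. The array layout is our assumption.

```python
import numpy as np

def predict_by_analogue(history, current):
    """Method of analogues: nearest-neighbour forecast of the next state.

    history: array of shape (T, d), past states in time order.
    current: array of shape (d,), the present state.
    """
    # Exclude the most recent state, which has no observed successor yet.
    dists = np.linalg.norm(history[:-1] - current, axis=1)
    return history[np.argmin(dists) + 1]
```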

Conclusions

The approach we have taken here makes it possible to estimate a priori the properties of the dynamics of any given complicated two-player game where the players use reinforcement learning. This is because the payoff matrix of any given game is a possible draw from an ensemble of random games. One can then guess at the behavior of that game under the players' learning algorithm by locating it in the stability diagram of Fig. 2. We have shown that a key property of a game is its 'zero-sumness', characterized by Γ. Games become harder to learn (in the sense that the strategies do not converge) when they are non-zero-sum, particularly if the players use learning algorithms with long memory. Our approach gives a methodology for classifying the learnability of games by extending this type of analysis to multiplayer games, games on networks, alternative learning algorithms, etc. It suggests that under many circumstances it is more useful to abandon the tools of classic game theory in favor of those of dynamical systems. It also suggests that many behaviors that have attracted considerable interest, such as clustered volatility in financial markets, may simply be examples of a highly generic phenomenon.

References

1. M. A. Nowak, Evolutionary Dynamics, Harvard University Press, Cambridge MA (2006).

2. J. Hofbauer, K. Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge (1998).

3. J. Nash, Equilibrium points in n-person games, Proceedings of the National Academy of Sciences 36 (1), 48-49 (1950).

4. J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, Princeton NJ (2007).

5. A. McLennan, J. Berg, The asymptotic expected number of Nash equilibria of two player normal form games, Games and Economic Behavior 51 (2), 264-295 (2005).

6. J. Berg, M. Weigt, Entropy and typical properties of Nash equilibria in two-player games, Europhys. Lett. 48 (2), 129-135 (1999).

7. T. H. Ho, C. F. Camerer, J.-K. Chong, Self-tuning experience weighted attraction learning in games, J. Econ. Theor. 133, 177-198 (2007).

8. C. Camerer, T. H. Ho, Experience-weighted attraction learning in normal form games, Econometrica 67, 827 (1999).

9. C. Camerer, Behavioral Game Theory: Experiments in Strategic Interaction (The Roundtable Series in Behavioral Economics), Princeton University Press, Princeton NJ (2003).

10. D. Fudenberg, D. K. Levine, Theory of Learning in Games, MIT Press, Cambridge MA (1998).

11. M. Opper, S. Diederich, Phase transition and 1/f noise in a game dynamical model, Phys. Rev. Lett. 69, 1616-1619 (1992).

12. S. Diederich, M. Opper, Replicators with random interactions: A solvable model, Phys. Rev. A 39, 4333-4336 (1989).

13. R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (4), 987-1007 (1982).

14. S. Ghashghaie, W. Breymann, J. Peinke, P. Talkner, Y. Dodge, Turbulent cascades in foreign exchange markets, Nature 381, 767-770 (1996).

15. Y. Sato, E. Akiyama, J. D. Farmer, Chaos in learning a simple two-player game, Proc. Nat. Acad. Sci. USA 99, 4748-4751 (2002).

16. Y. Sato, J.-P. Crutchfield, Coupled replicator equations for the dynamics of learning in multiagent systems, Phys. Rev. E 67, 015206(R) (2003).

17. C. De Dominicis, Phys. Rev. B 18, 4913 (1978).

18. E. N. Lorenz, Atmospheric predictability as revealed by naturally occurring analogues, J. Atmos. Sci. 26, 636-646 (1969).

19. J. D. Farmer, J. J. Sidorowich, Predicting chaotic time series, Phys. Rev. Lett. 59, 845-848 (1987).

Acknowledgements This work was supported by National Science Foundation grant 0624351. We would also like to thank Yuzuru Sato and Nathan Collins for useful discussions.

Competing Interests The authors declare that they have nocompeting financial interests.

Author Contributions Both authors were involved in the de-sign and analysis of the model. TG ran the simulations and car-ried out the analytical calculations. JDF and TG wrote the paper.

Correspondence Correspondence and requests for materialsshould be addressed to JDF ([email protected]).
