
Backward Induction



By: Rickard Fors

Aumann

According to Aumann, common knowledge of rationality (CKR) implies backward induction in perfect information games. He presents a model by which he formalizes and proves this argument. He feels that his model does well in reflecting the usual meanings of both knowledge and rationality. Aumann also considers it highly intuitive that CKR is possible in perfect information games, and that it does indeed lead to backward induction. He does however say that his proof depends on the model being used, and that his paper in no way contradicts other papers using other models. Aumann lists six important features for his model:

* Each player has a strategy that tells him what to do at every node, even those that are never reached.

Aumann indeed proves by example that the backward induction solution is not always reached if this is not the case. At every vertex, the player making the decision asks himself what the other players would do in the future (even if they are never given the opportunity), given how he chooses there. A strategy is interpreted as something like a robot executing pre-programmed moves, with a move programmed for every vertex.

* At node v, a player only considers what happens from thereon forward.

Every choice at every vertex has to be rational in its own right; it cannot rely on previous choices to be justified.

* It is rational to maximize payoffs.

While CKR leads to the backward induction solution, other choices may be rational if one considers past play by opponents. For example, if opponents do not always choose down at every vertex in the Centipede game, one could obtain a higher payoff oneself by not choosing down at the start.

* CKR is possible in every game of perfect information.

So the result is never empty.

* Knowledge is certainty, not probability 1 belief.

Probabilities could be introduced, in which case the rational thing would be to maximize expected utility; even then, CKR would still imply backward induction.

* Time of knowledge is at the start of the game, before any action.

It is also argued that if at the start of the game a player knows a move to be irrational, he will still know this when the time comes for him to make the move. Conversely, if he does not know it to be irrational at that time, he could not have known it at the beginning of the game. It is thus concluded that ex post rationality implies ex ante rationality. This also means that common knowledge of ex post rationality implies backward induction.
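
To make the procedure concrete, here is a minimal sketch of backward induction on a perfect-information game tree, using an illustrative three-node Centipede-style game (the tree encoding, node names, and payoffs are my own assumptions, not taken from Aumann's paper). Note how the computed plan assigns a choice at every vertex, including vertices that are never reached, exactly as Aumann's first feature requires:

```python
def backward_induction(node):
    """Solve a perfect-information game tree by backward induction.

    A node is either ("leaf", payoffs) or (name, player, {action: subtree}).
    Returns (payoffs, strategy), where strategy maps *every* decision node
    to the action chosen there -- a full plan, not just the played path.
    """
    if node[0] == "leaf":
        return node[1], {}
    name, player, children = node
    strategy, best = {}, None
    for action, child in children.items():
        payoffs, sub = backward_induction(child)
        strategy.update(sub)
        if best is None or payoffs[player] > best[1][player]:
            best = (action, payoffs)
    strategy[name] = best[0]
    return best[1], strategy

# A short Centipede-style game: at each vertex the mover can stop ("down")
# or continue ("across"); the pot grows as play continues.
centipede = ("v1", 0, {
    "down": ("leaf", (1, 0)),
    "across": ("v2", 1, {
        "down": ("leaf", (0, 2)),
        "across": ("v3", 0, {
            "down": ("leaf", (3, 1)),
            "across": ("leaf", (2, 4)),
        }),
    }),
})

payoffs, plan = backward_induction(centipede)
print(payoffs, plan)  # (1, 0) {'v3': 'down', 'v2': 'down', 'v1': 'down'}
```

At v3 player 0 takes (3 > 2); anticipating that, at v2 player 1 takes (2 > 1); so at v1 player 0 stops immediately with payoffs (1, 0), even though both players would prefer later outcomes.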

Considering the case when vertices that are not on the backward induction path are reached, there is no CKR, and thus the results of the paper do not apply.

Aumann concludes by saying that if there is no CKR, the inductive choice may very well be irrational. He even says that he makes no recommendation as to whether there should be CKR at all.

Bicchieri

Bicchieri states that game theory is often inconsistent with human behavior, for example in the Prisoner's Dilemma and the chain store paradox. The game-theoretic solution relies on the players' knowledge of the theory of the game.

For the backward induction solution to obtain, players must have some knowledge of the theory's assumptions, but cannot have any common knowledge of them. In other words, Bicchieri's conclusion is "common knowledge of the theory of the game makes the theory inconsistent".

At every point in the game, the theory a player uses has to be free from contradictions, else behavior cannot be predicted. Since the player facing a contradiction cannot make a rational decision, the other player cannot predict his move, and so the entire theory becomes useless. Once common knowledge of beliefs is assumed, a player can try to manipulate future choices by his choice at the current vertex, or rather in this way manipulate the knowledge of beliefs by communicating his own beliefs.

Bicchieri looks at the assumptions that are needed for backward induction:

* In a game with k+1 levels, kth-level knowledge of the respective strategies and payoffs is needed by every player.

* It is rational to maximize utility, and the players are all rational.

* Players have iterated beliefs of degree k of each other's rationality.

A kind of consistency is also required: a player cannot believe in p at a vertex v, if being at v contradicts p. So if P1 believes that P2 is rational, P1 expects P2 to choose rationally. Thus if a vertex is reached by a choice from P2 that is irrational, P1 can no longer think that P2 is rational.

Bicchieri says that the more the players know about each other's beliefs, the better they will be at deducing how the other is thinking. If one assumes that there is common knowledge of beliefs, then everyone will know that everyone else considers themselves rational. According to game theory, it is enough for one to think that he is rational for him to be rational. Bicchieri argues that this need not always hold, however, as one can believe something without knowing it.

Now if everyone knows everyone else's expected utility, and P1 makes an irrational choice, should P2 deduce that P1 is irrational, or that P1 is trying to trick him? There is no way for P1 to deduce how P2 will think in this situation. This is considered strategic use of common knowledge. Now since choosing to deviate from the equilibrium path cannot be distinguished from irrational behavior, there are more possible solutions to a game than the one given by backward induction.

Now Bicchieri goes on to ask whether the players actually want common knowledge of beliefs. She introduces an action of "communicating a belief". Merely saying "p" is not enough for P1 to make it common knowledge; there also has to exist a consistent set of beliefs supporting p that the players may hold. Such a set, however, is not the only set of beliefs consistent with what P1 says.

In Prisoner's Dilemma for example, a player may communicate that he is playing tit-for-tat rather than going for the backward induction solution, by cooperating in a round. By allowing such communication of beliefs, there are alternate solutions to games that also can be considered rational. Such solutions may end up having a higher expected utility for both players compared to the one reached by backward induction.

Pettit

Pettit begins by noting that if the backward induction argument is sound in the Prisoner's Dilemma, a rational player will always defect. Still, both players would reach a higher payoff if they cooperated, so this should be in their best interests. If the Prisoner's Dilemma is only played once, then there is no way to enforce cooperation. In repeated games, however, it is intuitive according to Pettit that tit-for-tat has a higher payoff than always defecting, since cooperating in a round signals willingness to cooperate to the other player.
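
Pettit's intuition about repeated play is easy to check numerically. The sketch below simulates a 10-round Prisoner's Dilemma with standard stage-game payoffs (3 for mutual cooperation, 1 for mutual defection, 5/0 for unilateral defection); the numbers and round count are assumed for illustration, not taken from Pettit:

```python
# Illustrative simulation: tit-for-tat vs. always-defect in a finitely
# repeated Prisoner's Dilemma. Payoff numbers are assumed.

PAYOFFS = {  # (my move, opponent's move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(opp_history):
    # Cooperate first, then copy the opponent's last move.
    return "C" if not opp_history else opp_history[-1]

def always_defect(opp_history):
    return "D"

def play(strat1, strat2, rounds=10):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strat1(h2), strat2(h1)
        s1 += PAYOFFS[(m1, m2)]
        s2 += PAYOFFS[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return s1, s2

print(play(tit_for_tat, tit_for_tat))      # (30, 30): sustained cooperation
print(play(always_defect, always_defect))  # (10, 10): the backward induction outcome
print(play(tit_for_tat, always_defect))    # (9, 14): TFT loses this pairing
```

Two tit-for-tat players each earn 30 while two always-defectors earn only 10, matching Pettit's point; against a pure defector, tit-for-tat gives up only the first round.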

Pettit offers a solution to the paradox by stating that at the beginning of the game, neither player is in a position to run the backward induction arguments. He states the arguments as follows:

Argument 1: At node n my partner will defect since he is rational, therefore I should also defect.

Argument 2: At node n-1 my partner acts rationally, and he expects me to act rationally in the following round and thus defect, therefore he will defect in round n-1.

These arguments continue backward until the first round is reached. Now the mistake, according to Pettit, is to assume that a player can run these arguments before the game has started. Both players may believe that the other player believes he is rational, but these beliefs are not guaranteed to hold throughout the entire game.

Indeed, if either player cooperates in the first round, then this common belief of rationality (CBR) will fail, since the other player can no longer assume that the one who cooperated is rational. So neither player can believe that CBR will endure, and thus neither of them can run the backward induction arguments at the start.

One can see that argument 1 fails if P1 cooperates in round n, since P2 can no longer consider P1 rational, and so CBR does not endure.

Similarly, were P1 to cooperate in round n-1, it would only be rational if he believed P2 would cooperate in round n, but this is irrational for P2 to do, so argument 2 fails.

It follows that neither player can believe that CBR survives if either of them cooperates at any time.

So can cooperation in the first round be rational? Indeed, if P1 cooperates he thereby communicates to P2 that he may be playing tit-for-tat, thus making P2 likely to also cooperate in the next round. Now if P2 believes that P1 is playing tit-for-tat, he will naturally play tit-for-tat -1 (defecting in the last round). And if P1 believes that P2 will play tit-for-tat -1, he will then play tit-for-tat -2. Thus there are beliefs that P1 may hold that make cooperating in the first round rational.

Indeed, since CBR no longer applies, P1 can rationally believe that tit-for-tat -2, -4, -6, et cetera are best-response strategies to what P2 will play. So defecting is not uniquely rational in the first round, but then is cooperating? No, since if it were, the player who cooperated in round 1 would expect the other player to defect in round 2 in response, so defecting in round 1 would also be rational for him.
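
The tit-for-tat -x regress can be sketched directly. In this hedged example (the same assumed 3/1/5/0 stage payoffs and a 10-round horizon as above, all my own choices for illustration), each strategy defects unconditionally in its final x rounds, and each tit-for-tat -(x+1) exploits tit-for-tat -x:

```python
# Pettit's regress: tit-for-tat -x defects for the last x rounds.
# Payoffs and the 10-round horizon are assumed for illustration.

PAYOFFS = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ROUNDS = 10

def tft_minus(x):
    """Tit-for-tat that switches to unconditional defection for the last x rounds."""
    def strat(round_no, opp_history):
        if round_no >= ROUNDS - x:
            return "D"
        return "C" if not opp_history else opp_history[-1]
    return strat

def play(s1, s2):
    h1, h2, a, b = [], [], 0, 0
    for r in range(ROUNDS):
        m1, m2 = s1(r, h2), s2(r, h1)
        a += PAYOFFS[(m1, m2)]
        b += PAYOFFS[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return a, b

print(play(tft_minus(1), tft_minus(0)))  # (32, 27): -1 exploits plain tit-for-tat
print(play(tft_minus(2), tft_minus(1)))  # (30, 25): ...and -2 exploits -1
```

Each level of the regress gains by defecting one round earlier, which is exactly why, under common knowledge of rationality, the argument unravels all the way back to the first round.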

Pettit also says that no player can outguess the other by playing tit-for-tat -x, for example. For say that P1 plays tit-for-tat -2, believing that P2 will play tit-for-tat -1 if he initially cooperates; how is it that P2 cannot follow this reasoning by P1 and predict this? P1 has no reason to believe that P2 will be able to follow his reasoning, since P1 believes that P2 believes that P1 is not rational, given that P1 cooperated in round 1.

Pettit finally states that CKR does imply backward induction, since it is rational to believe that CBR will endure, because every player knows that CKR will endure. He also thinks that the paradox itself fails with CKR, since it does not allow for intuitive ways of thinking, which are natural. With CKR players have "no choice", and thus he thinks it should be replaced with CBR, since the latter allows for strategic thinking.

Comparisons and own thoughts

That CKR leads to backward induction seems to be agreed upon; Aumann in particular finds this intuitive, even though he admits it is arguable whether CKR is desirable to begin with. He also states that it is an ideal condition which is rarely met in practice. Pettit thinks that with CKR there is no strategic thinking left to be done, since there really isn't any "choice" to be made at any point anymore. Hence Pettit thinks CKR should be replaced with CBR, as this actually allows for other solutions, which can be proven rational under CBR. Bicchieri mostly agrees with Pettit, and argues that one can communicate beliefs through actions. Pettit shows that CBR will not endure if an action is taken that strays from the path of the backward induction solution; Bicchieri sees this as manipulation of knowledge. As for me, before reading these papers I mostly agreed with Aumann. Now I think Pettit actually has a nice "solution" to the paradox, namely that the assumption of CKR itself is at fault, since it disallows any strategic thinking in a sense. Indeed, we are free to play in ways that stray from the backward induction path, and such play has now become intuitive to me.

Continuing on, if one were to reach a vertex that is off the path of the backward induction solution, Aumann has no real results, although his proof that CKR implies backward induction at least demands that a player has a set strategy at every node. Indeed, Aumann says very little about what happens at nodes that are not reached in his solution model, and his idea of ex post rationality implying ex ante rationality could also be examined further. Once CKR is not assumed, optimal choices at future nodes may certainly change depending on previous choices by the other player. So even if a choice seems irrational at the start of the game, it may very well become rational given a rational set of beliefs that a player may hold once that choice is to be made. Here Pettit and Bicchieri both reason more about how this affects the players' view of the game, and how the players can (rationally) try to manipulate one another into deviating from the backward induction strategy.

Consider for example the Centipede game, where the backward induction solution is for the first player to choose stop immediately. Depending on how long the game is and by how much the payoffs increase in the later stages, the choice to stop at the first node often seems almost irrational in itself. Indeed, if CKR is replaced by CBR, which I would argue is very reasonable in the case of longer games, continuing to let player 2 make a choice is going to be better against almost anyone in my mind. By continuing we destroy CBR, according to Pettit, and in the same manner as he reasons that tit-for-tat -x strategies are viable for rational players with rational beliefs in Prisoner's Dilemma-type games, continuing to a certain node x here before playing stop can be similarly rational. Basically, since player 2 can no longer know whether player 1 is rational once player 1 has continued, player 2 cannot make any assumptions about player 1's behavior at nodes further down the tree, and thus player 2's response cannot be predicted by player 1 either. Bicchieri would see this as strategic use of common knowledge. To me this is appealing, as the game becomes much more complex, and indeed allows for strategic "outside the box" thinking.

Taking it a bit further, one can categorize players into different types depending on which strategies they are likely to play. Indeed, if a player of type A always plays the hawk strategy regardless, he can in these types of games be viewed as a kind of robot rational player. Players of type B may be the tit-for-tat -x type, where different x are chosen with different probabilities. Players of type C may sometimes mimic type A and sometimes type B, et cetera. Regardless of how one views the entire population of players, even if one does not know beforehand which type of player one is facing, one can still make a rough (educated or not) guess about the likelihood of facing a certain strategy.

Now I would argue that the likelihood of facing a player of type A is almost always sufficiently small that an initially cooperative strategy in both Centipede and Prisoner's Dilemma will have a higher expected utility on average. Even if we somehow fail to outguess our opponent, this will in general happen so far down the tree that the utility we have gained along the way by far outweighs the utility we would have gotten by playing as type A. Of course, our opponent then ends up with higher utility than we do, but we should only be concerned with our own relative increase in utility compared to other strategies! The fact that our opponent gained more utility than we did does not make the utility we gained worth less (since such considerations should be part of the utility of the game at each node from the start, if any). Basically, since the games are not zero-sum, the losses in utility that we suffer should we not play A versus type A are typically small compared to the gains we may have. Indeed, if someone played hawk for the first few rounds regardless, the probability of him being a type A player would be high, and our response would be to turn into type A ourselves.
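
As a back-of-the-envelope check on this claim, consider a 10-round Prisoner's Dilemma with the standard (assumed) 3/1/5/0 stage payoffs, under which tit-for-tat earns 30 against itself and 9 against always-defect, while always-defect (type A) earns 14 against tit-for-tat and 10 against itself. All numbers here are illustrative assumptions, not results from the papers:

```python
# Expected utility of an initially cooperative strategy vs. playing hawk,
# as a function of the (guessed) fraction of type-A opponents. The
# per-pairing payoffs below are assumed 10-round Prisoner's Dilemma totals.

def expected_utility(payoff_vs_a, payoff_vs_coop, p_type_a):
    """Expected payoff given probability p_type_a of facing a type-A player."""
    return p_type_a * payoff_vs_a + (1 - p_type_a) * payoff_vs_coop

for p in (0.25, 0.5, 0.9, 0.95):
    eu_tft = expected_utility(9, 30, p)    # tit-for-tat's totals
    eu_hawk = expected_utility(10, 14, p)  # always-defect's totals
    print(f"p(type A)={p}: TFT={eu_tft:.2f}, hawk={eu_hawk:.2f}")

# Solving 9p + 30(1-p) > 10p + 14(1-p) gives p < 16/17: cooperation has the
# higher expectation unless type-A players exceed roughly 94% of opponents.
```

Under these assumed numbers, the cooperative strategy dominates unless type-A players make up over roughly 94% of the population, which supports the intuition that the required likelihood of facing type A is indeed almost always small enough.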

This is in itself interesting, as one can see how a belief is communicated to us in the same way. As soon as the other player has made his first move, the relative probabilities of him playing a certain strategy will change for us, and so may our best-response strategy. This is of course true for our opponent as well, and we should have some idea of how he might view our choices, so as not to be accidentally predictable and exploited. Based on our beliefs about our opponent's set of strategies, and the probability of each strategy, we should ourselves have a set of strategies that we play with different probabilities. To see this: if we don't, then after a certain number of moves our opponent may be fairly sure that, given that he has played in a certain way, our strategy is a typical best response to his perceived strategy (which he has intentionally communicated), and since he knows this, we are bound for exploitation. Unless it costs our opponent too much utility to manipulate us into a set strategy to begin with, it would only be rational for him to try to do this.

So at no point in time (except possibly the very last node, though even there it may be interesting to randomize cooperation, given that our opponent somehow knows we may do this) can we allow ourselves to be too predictable. The number of possible strategies, the number of nodes, the payoffs, and possibly other factors all contribute to how two rational players would play each other. What we need is a meta-strategy, that is, a set of strategies that we play with different probabilities which in turn depend on our opponent's previous actions. So how far can one take this? How much out-guessing and probability adjustment will happen in these cases? Since these games are long enough that the backward induction solution is rarely optimal, they are also complex enough that such strategy sets of n-level back-and-forth guessing with probability adjustments are really hard to "solve". And even if we do find some mixed-strategy (over strategies) Nash equilibrium, there is no guarantee that it will be the best response to what our opponent is actually playing. I do however hold it probable that there is a mixed-strategy Nash equilibrium over strategies for two perfect robots with a set of (dynamic) beliefs and a set of possible strategies, which would maximize their expected utility in these types of games, and under which both would almost always be better off than the type A robot.

Another example of strategic use of knowledge is sacrificing a piece (meaning something stronger than a pawn) in Chess. Since your opponent assumes that you are rational, if you suddenly seem to give up a piece, it is not without hesitation that he will take it (assuming of course that he too is rational). There are indeed cases in Chess where the optimal play is to give up a piece for a better overall position, and once you make such a move, your opponent is likely to analyze several nodes down the tree before deciding whether or not to take it.

In games of imperfect information, like Texas Hold'em (a popular variant of poker), one can play around with destroying CBR and using common knowledge as a strategic advantage even more. It is not that hard to calculate Nash equilibria for certain strategies preflop (when everyone only has two cards in their hand), and it is a zero-sum game with mixed strategies almost always being optimal. It gets more complex as community cards come into play, but there are still game-theory-optimal strategies. Imagine you do something perceived as really crazy only once, like a big bluff with a bad hand; your opponents are likely to remember this for a long time. Now you know this, so you start firing away big with good hands instead, and are more likely to get paid since you are known as this crazy bluffer. Of course, since we don't have perfect information, this is a bit of a sidetrack, as some of the solution concepts are not as applicable.

So do we want CKR? Aumann questions this and makes no recommendation, whereas Pettit clearly thinks of it as a flawed criterion when looking for solutions to games, and Bicchieri is seemingly on the same track. If we consider the rather recently solved game of Checkers, how fun would it be to watch two robot-like players playing each other perfectly and always drawing? The game itself seems to "die" in a way, since there doesn't seem to be anything to play for anymore, and at some point in the future it might only be used as an example to illustrate game theory. It is also hard to apply CBR to this game now, since if one player deviates from the equilibrium strategy, the other player will only also deviate if he knows he is more likely to win this way, making any kind of communication of actions (or manipulation of knowledge) useless. After all, if you have a strategy that is at worst a draw, why would you ever risk playing in a way that may lose the game? Basically, since the game is solved, you would already know the perfect strategy against any play, not just against the equilibrium strategy.

What would happen if, for example, Chess were solved in the same way? (It has been argued, however, that due to physical limitations it may be impossible to create computers fast enough for this.) Part of the beauty of such games is the possibility of tricking your opponents, of out-thinking them in some way; this would all be gone in an instant. So again, perfect players would always play a draw against each other (or maybe white does have a decisive advantage by moving first), and it would be fun neither to play nor to watch. Indeed, I agree that CKR is not something we generally want, neither in semi-cooperative games like the Prisoner's Dilemma nor in zero-sum games like Chess. It is even easier to see this if you consider how you would play Tic-tac-toe against pretty much anyone rational: you would always draw, right? Then it can easily be argued that there is no point in playing Tic-tac-toe, since you both know beforehand that neither of you is going to win. There simply is no fun left in games if nothing is left unknown, be it by roll of dice or by incomplete knowledge. My conclusion is that we certainly do not want CKR for entertainment, and rarely want it for maximizing our own utility in any game. It is of course possible (and easy) to create games where CKR maximizes both players' utility, especially if the game is cooperative.

Is it always rational to maximize utility, then? All three papers seem to agree on this at least, and I see no reason to think otherwise. Say you would choose a worse outcome for yourself (or rather, the action leading to that outcome) so that your friend would be better off instead. This should already be reflected in the utility of these outcomes, giving the altruistic choice a higher score and thereby making it the more rational choice as well. Utility is in this way a really good way of modeling rationality, since it is so flexible and can account for anything you want it to. In a sense, to me it is rational to think of utility as a good way of modeling rational choice.

By: Rickard Fors, March 2012