In this video, we're going to look at the equilibria of infinitely repeated games. To talk about equilibria, we need to begin by asking what we mean by a pure strategy in an infinitely repeated game. You may like to pause the video at this point and see if you can answer that question before I go on and tell you the answer. If you need a hint, remember the rule of thumb we've used to define strategies as we've changed game representations: a pure strategy should be whatever you would need to tell another person so that they could play the game on your behalf and end up doing exactly what you would have done. So a pure strategy has always been the policy you would follow, making all of the decisions in the same way you would have actually made them; it's whatever you would need to communicate about that policy. Well, in an infinitely repeated game, your strategy is going to be a choice of action at every decision point you would have, which means an action you would take in every stage game. And bear in mind that when you take those actions, you get to reason about everything you've seen before in the game, which is to say, you can remember all of your own previous actions, and you can also remember all of the actions taken by the other players in previous stage games. So really, your pure strategy is a mapping from every history to an action choice. Clearly, there is an infinite number of histories. So, unlike the previous games we've looked at, extensive-form games and normal-form games, you're not even going to have a finite set of pure strategies in an infinitely repeated game. Let me nevertheless give you some examples of famous pure strategies in infinitely repeated games, to give you the sense that we can say interesting things about pure strategies even though the set of possible pure strategies is infinite. So think about playing repeated prisoner's dilemma: I'm going to play the prisoner's dilemma game infinitely. One famous strategy is called tit-for-tat. It turns out there were famous competitions where people submitted programs that played the prisoner's dilemma repeatedly, and then they looked at how these programs did against each other. Tit-for-tat was a strategy that did famously well in those competitions. The way it works is that it starts out cooperating; then, in any round after the opponent has defected, tit-for-tat defects, and afterwards it goes back to cooperation. So if it defected but its opponent cooperated, it'll respond to that by cooperating. If tit-for-tat plays itself, it'll cooperate forever; but if it plays against somebody who defects, it's going to, intuitively, punish that defection by defecting itself in the next round. But it's very forgiving: it's going to go back to cooperation after one punishment. In contrast, the trigger strategy is really a mean-spirited version of tit-for-tat. It starts out by cooperating, but if its opponent ever defects, that's it: it's just going to defect forever. So it's going to pull a trigger and say, I'm never going to forgive you. You've wronged me once; I'm going to wrong you back until the end of time. That's why we call it trigger.
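As a concrete illustration (this is my own sketch, not from the lecture), here is one way to write these two strategies as history-to-action mappings in Python, where `opponent_history` is the list of the opponent's past actions and the labels 'C' and 'D' for cooperate and defect are my own convention:

```python
# Minimal sketch: each strategy maps the opponent's observed history
# to an action in the next stage game. 'C' = cooperate, 'D' = defect.

def tit_for_tat(opponent_history):
    """Cooperate first; afterwards, copy the opponent's last action."""
    if not opponent_history:
        return 'C'
    return opponent_history[-1]

def grim_trigger(opponent_history):
    """Cooperate until the opponent defects once; then defect forever."""
    return 'D' if 'D' in opponent_history else 'C'

# Example: play the two strategies against each other for a few rounds.
h1, h2 = [], []            # actions played so far by players 1 and 2
for _ in range(5):
    a1 = tit_for_tat(h2)   # player 1 observes player 2's past actions
    a2 = grim_trigger(h1)  # player 2 observes player 1's past actions
    h1.append(a1)
    h2.append(a2)
print(h1, h2)  # both cooperate forever: ['C', 'C', ...] ['C', 'C', ...]
```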
So, you can see that in both of these cases, I've been able to describe how these strategies would respond to anything from the infinite history they might see, in algorithmic terms, without actually writing out a strategy in a formal language. So you can see that it is possible to think coherently about interesting strategies in infinitely repeated games. Now, of course, what we would really like to do as game theorists is describe the Nash equilibria of infinitely repeated games. The approach we've taken in the past has been to first show how to build the induced normal form of a game: we figure out what a strategy is in the game, we show how to make the induced normal form, and then we just appeal to Nash's theorem and say, because we've got a normal-form game, we know there is a Nash equilibrium, and everything goes through the way it did before. Well, that worked well for us in the past because we always ended up with finite-sized induced normal forms. Unfortunately, because we have an infinite number of pure strategies, we're going to get an infinity-by-infinity matrix even in the two-player case. And so we're not going to have something to which Nash's theorem applies anymore, because now we have an infinite-sized game, and Nash's theorem only works for finite games, which means games whose matrices are finite and contain a finite number of numbers. And that means that, based on what we know so far, we don't have any reason even to be sure that equilibria exist in these games. On the other hand, because there is an infinite number of strategies, there might even be an infinite number of pure-strategy equilibria. We've seen in the past that it's possible to have an infinite number of mixed-strategy equilibria. For example, if in a normal-form game I have two pure strategies that I'm indifferent between, then any mixture between them can also be a best response for me, so there's a sense in which I can have an infinite number of mixed-strategy equilibria. But here, I can potentially have an infinite number of qualitatively different pure-strategy equilibria, and that seems like a problem. So we're not necessarily going to be able to somehow list off the equilibria of a repeated game. Interestingly, there is still something we can coherently do to give a satisfying answer about the Nash equilibria of infinitely repeated games. In this video, I'm going to tell you what it is, and because it's so satisfying, I'm even going to prove this theorem to you, so you really will understand how this one works. Here is the idea of the theorem that we will eventually prove: we can characterize what payoffs can be achieved in equilibrium, and we can give an example of equilibrium strategies that result in those payoffs, without giving every equilibrium that results in those payoffs. So we're going to characterize which strategies are in equilibrium in infinitely repeated games via their payoffs: I'm going to be able to tell you precisely which payoff vectors are achievable in equilibrium. To do that, I need to give you a little bit of notation so we can talk about these payoff vectors. This is our slide of notation; once we get through it, we're going to have all the building blocks we need to prove our theorem.
So, we're going to start with some n-player game; this is just our stage game, some normal-form game. And we're also going to start with some payoff vector, and by this, what I mean is the utilities that the players get in the game. Now, in this video I'm going to talk about the average reward case, where each of these numbers encodes the utility I get on average by following my strategy in the infinitely repeated game. That's going to say that I care in just the same way about payoffs I get now and payoffs I get a million iterations into the future. We can do something similar in the discounted reward case, and that's going to be the topic of a different video. But for this video, we're going to talk only about the average reward case, and the reason is that the proof is a little bit easier to think about in the average reward case, even though the discounted reward case is maybe a bit more of a practical setting. We can get across the key ideas of both proofs with this proof, and this one's a bit easier. So, to do that, I need to remind you of the concept of the minmax value, which is something we introduced in the context of zero-sum games, but it turns out to be very important for repeated games, so I'm going to remind you of what it is here. This is the mathematical definition of the minmax value, but let me start by saying in words what it means, because the nested min and max operators are a little bit hard to think about. Essentially, player i's minmax value is the amount of utility that player i is able to obtain for himself if all of the other players, who we call minus i, are completely unconcerned about their own utility and instead are just trying to hurt i as much as possible. So they all play the strategy from their joint mixed-strategy space such that, if i responds to it by doing the best thing he can for himself, then i's utility is as low as possible. They're trying to minimize i's utility, given that i is trying to maximize in response to their minimization, and the number that comes out at the end is the amount of utility that i can get by doing as well as he can against everybody trying to hurt him as much as possible. That's i's minmax value. Intuitively, if I want to punish you as much as possible in a game, and you know that I'm trying to punish you, the amount of utility that you get is your minmax value. So I will say that a payoff profile — remember, a payoff profile is a payoff amount for everybody — is enforceable if everybody's payoff in that payoff profile is at least their minmax value. If anybody is getting less than their minmax value in a given payoff profile, I will say that it's an unenforceable payoff profile. And I will say that a payoff profile is feasible — I'll use this technical term, feasible — if the following condition is true. Intuitively, what I want to say about feasibility is that it's possible to actually get that payoff profile by combining together payoffs in the actual game that's being played. Notice that enforceability doesn't actually require that it's possible.
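The slide's formula isn't reproduced in the transcript; for reference, here is the standard statement of the minmax value being described, where S_i is player i's mixed-strategy space, S_{-i} is the joint mixed-strategy space of the others, and u_i is player i's utility:

```latex
% Player i's minmax value: the others jointly minimize, i best-responds.
v_i \;=\; \min_{s_{-i} \in S_{-i}} \; \max_{s_i \in S_i} \; u_i(s_i, s_{-i})

% A payoff profile r = (r_1, \dots, r_n) is enforceable iff
% r_i \ge v_i for every player i.
```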
You know, I could say the payoff of $1 million for me and a million for you in prisoner's dilemma is an enforceable payoff profile, because it's above both of our minmax values, but there's no way we can actually each get a million in prisoner's dilemma, because the numbers don't go that high. So feasibility is going to talk about what it's actually possible to do. And the way we're going to say this is, restricting myself to rational numbers, I'm going to say that there must exist rational and non-negative values alpha-a such that, for every player i, I can express player i's element of this payoff vector as a sum, over all action profiles a in the normal-form game, of alpha-a times i's utility for that a; and then, of course, the last part of the condition: these alphas all sum to one across the action profiles. So let me explain what this means. I have a normal-form game, and what I want to say is that I have some weight on each cell of the game. These are the alpha-a's, and they all sum to one. What I want to do is take my payoff from this cell weighted by the alpha for that cell, my payoff from the next cell weighted by the alpha for that cell, and sum all of those weighted utilities up, and then I get ri. So what I want to say is that there exist alphas I could use to blend together the payoffs in the actual game in such a way that I get ri. And furthermore, that has to simultaneously be true for everybody else, and in particular, it has to be true with the same alpha-a's for everybody else. So I can't just come up with some alphas that get me my ri; I have to use those same alphas to get rj for player j. I have to have the same weights on all of the cells so that I get the right numbers for everybody. So let's think about the following game, which will give you an example of how feasibility works. Say I have a game that looks like this. In this game, I claim the payoff profile (1,1) is feasible, and the reason is that I can put a weight of 50% on this cell, a weight of 50% on this cell, and weights of zero on these cells. And you can see my payoff in this game is 50% times 2, plus 0 times 0, plus 0 times 0, plus 50% times 0, which is 1. The other player's payoff in this game is 50% times 0, plus 0 times 0, plus 0 times 0, plus 50% times 2, which is also 1. So that is a feasible payoff in this game. On the other hand, the payoff (2,2) is not feasible in this game, and the reason is that there's no way I can have weights on these cells that sum up to one and that give both of us 2: for me to get a payoff of 2, this weight would have to be one — that's the only cell where I get a payoff at all — and for my opponent to get 2, this one would have to be one, and then they would sum up to more than one. The condition over here would be violated, so we just can't do it; that's not a feasible payoff. Okay. Now, we're ready to prove the folk theorem. And it's kind of funny that it's called the folk theorem; it's kind of like a folk song for mathematicians. A folk song is a song that everybody knows, but nobody really knows who wrote it first. And a folk theorem is a theorem that people all kind of knew and talked about before they got around to writing it down, and they're not really sure quite where it came from.
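To make the arithmetic concrete — this is my own illustration, not from the lecture — here is a short Python check of the weighted-sum condition for the example game. I'm assuming a 2x2 game whose two payoff-bearing cells pay (2,0) and (0,2) and whose other cells pay (0,0), which matches the numbers read out above; the action labels are hypothetical:

```python
# Sketch of the feasibility condition: r_i = sum_a alpha_a * u_i(a),
# with alpha_a >= 0 summing to 1 and the same alphas shared by all players.
# Assumed example game: cell payoffs (u1, u2).

game = {
    ('A', 'C'): (2, 0), ('A', 'D'): (0, 0),
    ('B', 'C'): (0, 0), ('B', 'D'): (0, 2),
}

def blended_payoffs(game, alphas):
    """Blend each player's cell payoffs using the shared weights alpha_a."""
    assert abs(sum(alphas.values()) - 1) < 1e-9 and min(alphas.values()) >= 0
    n = len(next(iter(game.values())))  # number of players
    return tuple(sum(alphas[a] * u[i] for a, u in game.items())
                 for i in range(n))

# Weight 1/2 on each payoff-bearing cell: the profile (1, 1) is feasible.
alphas = {('A', 'C'): 0.5, ('A', 'D'): 0.0, ('B', 'C'): 0.0, ('B', 'D'): 0.5}
print(blended_payoffs(game, alphas))  # (1.0, 1.0)
# No alphas can produce (2, 2): each player's 2 requires full weight on a
# different single cell, and the shared weights must sum to one.
```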
So, this is the folk theorem of game theory, and despite having uncertain origins, it's very important. The folk theorem basically tells us what the Nash equilibria of an infinitely repeated game are, in terms of these payoffs and in terms of the definitions I just told you. There are different folk theorems for different settings; this is the folk theorem for infinitely repeated games with average rewards. The folk theorem has two parts, which basically stems from the fact that I've made a restriction here to rational alphas. It turns out we don't actually need that restriction, but the math gets more annoying if we have to deal with real values, so to keep things simple, I'm doing this just for rational numbers. The folk theorem says that in any n-player game, which we're going to repeat infinitely, and for any payoff vector r1 to rn, just like we've been talking about: first of all, we're going to talk about the case where r is the payoff in a Nash equilibrium of the infinitely repeated game with average rewards. What can we conclude about the payoff vector? And what we conclude is that r has to be enforceable. Remember, enforceable means every player's payoff is greater than or equal to that player's minmax value. So I can conclude that if r is the payoff in an equilibrium of the infinitely repeated game with average rewards, then r must be enforceable. That seems like kind of a weak thing to say. The second part of the folk theorem is more surprising. It says, basically, that's the only restriction we need: if r is enforceable, and furthermore if it's feasible — if it's possible to make it out of the payoffs of the game — then r is the payoff in some Nash equilibrium. So together, what this says is that, basically, feasibility and enforceability are the whole story of Nash equilibrium. Enforceability seems like this very small thing: it says you're getting no less than the payoff you would get if everyone punished you as much as it's possible for them to punish you. And feasibility says it's possible to get these payoffs in the game at all. And the folk theorem says: that's basically it. As long as you meet those two conditions, you've got a Nash equilibrium; that's basically what there is to say about Nash equilibrium. You'll notice there's a little bit of wiggle room between these two statements — these two parts of the theorem statement — which has to do with the fact that we're talking about feasibility here and we're not talking about it here. So we can't conclude that every Nash equilibrium is feasible. You might wonder why that is; that's just because some of them aren't expressible as rationals — for some of them, the alphas wouldn't be rational. So that's the part I'm leaving out of this proof, but that's a technical point. The broad thing you should remember about the folk theorem is that, basically, feasibility and enforceability together are equivalent to Nash equilibrium in repeated games. So if you wanted to stop here, you now know what the folk theorem is, but I encourage you to keep watching the rest of the video, and you'll see how we actually prove it. Let's begin by proving the first part: why is it that if the payoff is achievable in a Nash equilibrium, I can conclude that it must be enforceable? Well, I'm going to prove this by contradiction. So I'm going to begin by supposing the payoff vector is not enforceable.
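For reference, here is the two-part statement just described, written out in my own rendering (following the transcript's definitions of enforceable and feasible):

```latex
% Folk theorem (infinitely repeated games, average rewards).
% Let G be an n-player stage game and r = (r_1, \dots, r_n) a payoff vector.
%
% (1) If r is the payoff profile of some Nash equilibrium of the
%     infinitely repeated G with average rewards, then r is enforceable:
%     r_i \ge v_i for every player i.
%
% (2) If r is both feasible and enforceable, then r is the payoff profile
%     of some Nash equilibrium of the infinitely repeated G with average
%     rewards.
```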
And if not, then that means there must be some player i for whom his payoff, ri, is strictly less than his minmax value. Now let me consider a situation in which this player i changes his strategy. Let's imagine that he instead deviates to playing bi of s-minus-i given h — I'll tell you in a second what that means — any time he's in a history h of the repeated game. So, what is this thing? s-minus-i is the strategy profile of the other players; let's assume he gets to know what that is, because we're talking about a Nash equilibrium. And bi is just his best response every time the other players play their strategy profile s-minus-i given h, for every history h. So let's just consider the case where he best responds to whatever the other players are doing. Well, by the definition of the minmax value, you have to get a payoff of at least your minmax value in every stage game if you follow the strategy we just defined. Because, remember, the definition of the minmax value is that it's what you get when everyone is trying to hurt you as much as possible and you best respond to that. So intuitively, if I'm getting less than my minmax value, that means I can't be best responding to everybody else, because even if they were trying to hurt me as much as possible — which is the worst case for me — best responding to them would get me my minmax value. So intuitively, I could change my strategy to best responding, and that would have to improve things for me: it would have to bring me up to at least my minmax value. And because that would improve my utility, it's a better response for me than what I was doing before. So basically, we've just constructed a profitable deviation for player i, and because a profitable deviation exists for him, the strategy profile we started with couldn't possibly have been a Nash equilibrium. And so we derive a contradiction from the assumption we made at the beginning, that r was not enforceable. So you can see that if r is not enforceable, we can't have a Nash equilibrium, and that proves what we wanted to prove: being the payoff of a Nash equilibrium implies enforceability. So that's part one; that's the easy direction and the less interesting direction. Let's now do the second and more interesting part, which says that all I need to assume is that the payoff profile is both feasible and enforceable, and that means I've found a Nash equilibrium. What's interesting about this part is that we're going to do it by construction. I'm going to show you how, if you're given any feasible and enforceable payoff profile, you can build a set of strategies for the players which are in equilibrium with respect to each other, and which obtain exactly that average payoff for every player. So first of all, let's introduce some bookkeeping notation that we'll use here. Since r is feasible, and since we've assumed that these alphas are rational, we can write each ri as follows. Basically, we can make up new variables, beta-a and gamma, where we've replaced each alpha-a by beta-a divided by gamma, and that's not hard to do: because we know the alphas are rational, it's possible to write each of them as a fraction, and then we can make gamma their common denominator and set the betas appropriately.
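Concretely (my rendering of the expression the slide presumably shows), feasibility with rational weights lets us write:

```latex
% Feasibility with rational weights, rewritten over a common denominator:
r_i \;=\; \sum_{a \in A} \alpha_a \, u_i(a)
    \;=\; \sum_{a \in A} \frac{\beta_a}{\gamma} \, u_i(a),
\qquad \beta_a \in \mathbb{N}, \quad
\gamma \;=\; \sum_{a \in A} \beta_a .
```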
And then it's possible to come up with betas and a gamma that make this expression true. That shouldn't be too surprising; it just follows from the fact that these alphas are rational, and from feasibility. Now I'm going to construct a strategy profile, and then, on the next slide, I'm going to argue that that strategy profile is in equilibrium — or that we can turn it into an equilibrium. So let's make a strategy profile that cycles through all of the outcomes, using cycles of length gamma, and the way it's going to work is that each cycle repeats each action profile a exactly beta-a times. So we're going to have our game matrix here, and remember, the betas and gamma are all integers. For example, let's say gamma is 7, and say in this cell we have beta equal to 2, so here alpha-a is 2 over 7; let's say here it's 0 over 7, and 0 over 7 — I'll just write the betas from this point on — and the remaining cells are 0, 1, 2, 0, 2, 0. So let's say that's what we had in this particular game. Then what the strategy would do is just cycle through. Let's give the actions names: player 1 has actions A, B, C, and player 2 has actions D, E, F. Player 1's strategy would be to play A, A, B, B, B, C, C, and then go back and keep doing that forever. And player 2's strategy would be to play D, D, E, F, F, E, E, and then go back. So if the two players played those two strategies together in a coordinated way forever, they would cycle through, hitting every action profile in this game exactly as often as the betas say. Let's denote such a sequence a-t. Now let me define the real strategies for the players, which I'm going to claim are in equilibrium here. Let's define the strategy si of player i to be a trigger version of this strategy. If nobody deviates, then si tells the player to play just what a-t would tell them to play; so the players are going to begin by cycling through these outcomes in just the way I told you. But if one player notices that the other player didn't do what they were supposed to do according to a-t, then they're going to hit the trigger, and from that point on they're going to minmax the other player — and that's just what this says here. If there's ever a period in which somebody does the wrong thing, then everybody is going to gang up on that person forever and play the strategy against that person that causes them to get their minmax value. So let me just say that one more time, to make sure we've all got it: a-t is a sequence constructed so as to hit every action profile exactly the number of times given by these beta integers, and we cycle through that forever. And the strategy we're interested in is a trigger version of that, which says everyone tries to do that, but if somebody does the wrong thing, everybody punishes them forever and gives them their minmax value. Okay. So I want to claim that this is a Nash equilibrium of the infinitely repeated game with average rewards. First, notice that if everyone does play according to this strategy, then every player i will get an average payoff of ri, just as we wanted. Now, you might be thrown off by the fact that sometimes they're not quite getting ri, because they're only part-way through the sequence. But we're only interested in the limit.
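Here is a small sketch (my own, with hypothetical action labels matching the example above) of how the cycling sequence a-t can be built from the betas, so that each action profile appears exactly beta-a times per cycle of length gamma:

```python
# Sketch: build one cycle of the sequence a^t from the beta counts.
# Each action profile appears exactly beta_a times; the cycle has
# length gamma = sum of the betas. Counts match the 3x3 example above.

betas = {
    ('A', 'D'): 2, ('A', 'E'): 0, ('A', 'F'): 0,
    ('B', 'D'): 0, ('B', 'E'): 1, ('B', 'F'): 2,
    ('C', 'D'): 0, ('C', 'E'): 2, ('C', 'F'): 0,
}

cycle = [a for a, beta in betas.items() for _ in range(beta)]
gamma = len(cycle)  # 7

# Each player's part of the (repeating) joint sequence:
p1_actions = [a[0] for a in cycle]  # ['A','A','B','B','B','C','C']
p2_actions = [a[1] for a in cycle]  # ['D','D','E','F','F','E','E']
print(gamma, p1_actions, p2_actions)

# Repeating this cycle forever gives player i an average payoff of
# sum_a beta_a * u_i(a) / gamma, which is exactly r_i by construction.
```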
And so, if you look at averages over periods of time of length gamma — what happens after gamma, what happens after two gamma, what happens after three gamma — you'll be able to convince yourself that they really do get the right payoffs, because after every period of length gamma, they get exactly the payoff ri by construction. So indeed, this does lead to the payoff they were trying to get. What remains to show is that this is a Nash equilibrium, that nobody can gain by deviating. And indeed, I claim that it is. To show that, let's imagine that everybody else is playing according to this strategy and some player j deviates at some point. Well, if that happens, then for all time after that point, player j is going to get his minmax payoff, and that's going to render the deviation unprofitable. Because we've assumed that this payoff profile is feasible and enforceable, he was already getting at least his minmax value. And because the punishment is going to happen for all time — it's going to happen forever afterwards — it's just going to end up dominating the average: everything that happened in the finite amount of time beforehand is going to be washed out of the average, and his average payoff is going to turn out to be his minmax payoff. But by enforceability, we know that that's less than or equal to rj. And so there's nothing he can gain by such a deviation. And that suffices to prove the folk theorem. So what we've seen now is that, essentially, feasibility and enforceability correspond to the payoff profiles that are achievable in Nash equilibrium of an infinitely repeated game with average rewards.
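The averaging argument can be written out as follows (my own rendering; u_i^(t) denotes player i's stage-game payoff in round t, and the deviation bound holds in expectation):

```latex
% On the equilibrium path, averages over whole cycles are exact:
\frac{1}{k\gamma} \sum_{t=1}^{k\gamma} u_i^{(t)}
  \;=\; \frac{1}{\gamma} \sum_{a \in A} \beta_a \, u_i(a)
  \;=\; r_i \quad \text{for every } k .

% After a deviation by player j at round T, the punishment caps each later
% stage payoff at v_j, and the finite prefix washes out of the average:
\lim_{k \to \infty} \frac{1}{k} \sum_{t=1}^{k} u_j^{(t)}
  \;\le\; \lim_{k \to \infty}
     \frac{\sum_{t=1}^{T} u_j^{(t)} + (k - T)\, v_j}{k}
  \;=\; v_j \;\le\; r_j .
```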