Folks, welcome back. This is Matt again, and I'm going to talk a little bit now about repeated games when players discount future payoffs. Let's talk about what that means. When we're looking at discounted repeated games, the idea is that players are playing the same game over and over and over again. But instead of looking at the limit of the means, some limit of the average of the payoffs out into the distant future, people value today and tomorrow differently. The idea behind discounted repeated games is that the future is uncertain: you're motivated mostly by what happens today, and you trade off today versus the future. It's not the infinite future that you care about; you say, I really care about today, and a little bit less about tomorrow. So maybe tomorrow's value is, say, 80 or 90% of today's value. If today is worth 1, tomorrow is worth 0.9, the next day 0.81, then 0.729, and so forth. Things decay exponentially under discounting. So the idea here is, if I misbehave today, I have to think about how people are going to react to that. If we're trying to support cooperative behavior in a prisoner's dilemma, I can cooperate today, or I can cheat and deviate, defect. If I do that, I get a temporary gain, and then I'm possibly going to be punished in the future. So the important questions are: will people want to punish me in the future? Is it going to be in their interest? How much do I care about that? What's my discount? Do I care a lot about the future or just a little bit? So we're looking at a stage game: just take a normal-form game, and we're going to play it repeatedly over time.
And now each player i has a discount factor, beta i, taken to be in [0, 1]. Generally we'll take beta i to be strictly less than one, since that's the case of more interest. If it's equal to zero, the player doesn't care about the future at all, and it's basically just a one-stage game. So the interesting case is when players care somewhat about the future, but care more about today than tomorrow, and so forth. Often in these games people look at situations with a common discount factor, where everybody has the same beta, which makes things easier in some cases. The idea of discounting, then, applies to the path you get from a whole sequence of actions: a profile of actions, a1 played in the first period, at in the t-th period, and so forth. You just sum up the payoffs, but now you weight them by an exponentially decreasing function, the discount factor raised to the power t. So if this payoff were 1 every day, I'd be getting 1 today, plus 0.9, plus 0.81, plus 0.729, and so on. Okay. When we look at these games, players can condition their play on past history. A finite history of some length t is just a list of everything that's happened at every date. Here a1 is the profile of what every player did in period 1: the first time we played this game, what did everyone do? And generally, at is what everybody did at time t, so we've got at1 through atn. These things are vectors, and they tell us what everybody did in the first period, what everybody did in the second period, and so forth. And then we can talk about all finite histories: all possible histories I could be faced with when playing this game, all the kinds of things I'm going to have to think about.
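The discounted sum just described can be sketched in a few lines of Python (a minimal illustration; the function name and the 100-period truncation of "forever" are my own choices, not from the lecture):

```python
# A sketch of discounted payoffs: weight the payoff in period t by beta**t,
# with t = 0 being "today".
def discounted_value(payoffs, beta):
    """Sum of payoff_t * beta**t over a (finite) stream of payoffs."""
    return sum(p * beta ** t for t, p in enumerate(payoffs))

# A constant payoff of 1 per period with beta = 0.9: the weights are
# 1, 0.9, 0.81, 0.729, ... and the sum approaches 1 / (1 - 0.9) = 10.
stream = [1.0] * 100  # a long finite approximation of "forever"
print(discounted_value(stream, 0.9))
```

With 100 periods the truncated sum is already within a few ten-thousandths of the infinite-horizon value 1/(1 - beta) = 10.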
What am I going to do if this happens? What am I going to do if that happens? In an infinitely repeated game, I've got all these histories, and I have to say what I'm going to do in each circumstance. So a strategy is a map from every possible history into a possibly mixed strategy over what I can do in the given period, facing the given history. If we're looking at a prisoner's dilemma, people can either cooperate or defect in a given period. So thinking about a history of length 3, one possibility would be the following: we both cooperated in the first period, then Player 2 defected in the second period, and then both of us defected in the third period. That would be a possible history. And then we could ask, okay, now what are we going to do in the fourth period? Maybe we'll let bygones be bygones and try to get back to cooperation; maybe we'll just defect because we're angry at each other; who knows. So a strategy for the fourth period specifies what you do after each possible history of the first 3 periods. Subgame perfection is the same as usual: a profile of strategies that is Nash in every subgame. What's a subgame here? A subgame just starts at some period and consists of everything that remains. So the strategies have to form a Nash equilibrium following every possible history: take some history, start at that point, and it has to be Nash forever on. Strategies are now specifications of what we would do in every situation, and we need Nash play following every history. One thing to check, and it's important here: repeatedly playing a Nash equilibrium of the stage game works. Just find a static Nash equilibrium of whatever game it is, for instance defect, defect in the prisoner's dilemma, and play that forever. No matter what's happened in the past, that's always going to be subgame perfect.
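As a concrete illustration of a strategy as a map from histories to actions, here is the grim trigger strategy in Python (a minimal sketch; representing a history as a list of action-profile tuples is my own encoding, not the lecture's):

```python
C, D = "C", "D"

def grim_trigger(history):
    """Cooperate as long as everyone always has; once anyone has
    defected in any past period, defect forever after."""
    for profile in history:
        if D in profile:
            return D
    return C

# The length-3 history from the lecture: both cooperate, then Player 2
# defects, then both defect.
history = [(C, C), (C, D), (D, D)]
print(grim_trigger(history))   # the trigger has been pulled: play D
print(grim_trigger([(C, C)]))  # no defections yet: keep playing C
```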
So, for every possible history, everybody says they're going to play that Nash equilibrium forever going forward. You can check that this is a subgame perfect equilibrium: it's Nash in every possible subgame, because if everyone else is doing that, I wouldn't want to deviate. Think a little bit about the logic of that. There are a lot of possible subgames to consider, but you can convince yourself it's true. Okay. So, solving the repeated prisoner's dilemma, let's think about it in the context of discounting now. Suppose what we want is to sustain cooperation. We've got our standard prisoner's dilemma, with payoffs of 3,3 for both cooperating, 5,0 from you defecting while the other person cooperates, and 1,1 if you both defect. The only Nash equilibrium of the static game is defect, defect, with payoff 1. We want to support 3,3 if we can. So: cooperate as long as everyone has in the past, and defect forever after if anyone deviates. When is this an equilibrium? Clearly, if we set beta i equal to 0 for both players, we can't make this work, because nobody cares about the future. Then we'd end up with defect, defect in every period being the only subgame perfect equilibrium: players only care about the present, so they're always just going to myopically defect. So the question is, for which betas can we sustain this kind of strategy, which is cooperate as long as everyone has, and if cooperation ever breaks down, say forget it, we're going to defect forever after? Okay, let's have a peek. If you cooperate, the other player is cooperating, and no one has failed to cooperate in the past, what do we get? We get 3 in perpetuity, right?
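The stage game just described can be written down directly (a small sketch; the dictionary encoding is mine), which also makes it easy to check that defect strictly dominates cooperate:

```python
# Payoffs from the lecture: (row action, column action) -> (row, column).
C, D = "C", "D"
PAYOFFS = {
    (C, C): (3, 3),
    (C, D): (0, 5),
    (D, C): (5, 0),
    (D, D): (1, 1),
}

# Whatever the other player does, defecting pays the row player strictly
# more, so (D, D) is the unique static Nash equilibrium.
for other in (C, D):
    assert PAYOFFS[(D, other)][0] > PAYOFFS[(C, other)][0]
```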
So we get 3, plus beta times 3 (take a common discount factor for now), plus beta squared times 3, then beta cubed times 3, and so forth. In perpetuity, if you remember your sums of series, the value of that is just 3 over 1 minus beta. Okay. What happens if I defect, while people are playing this grim trigger strategy? The other person is cooperating in the first period, so by switching from cooperate to defect, I get a 5 in the first period. But then they see that, and in the next period they react: they defect, and everybody defects forever after. So from then on, in perpetuity, we get a bunch of ones. What do we get in total? 5, then beta times 1, then beta squared times 1, and so forth. If you remember your sums of series, that continuation is beta times (1 plus beta plus beta squared plus ...), which is beta times 1 over 1 minus beta. So if I deviate, I gain in the first period, but then I lose in the subsequent periods. There's a trade-off, and how big that trade-off is depends on the size of the discount factor. So we've got these two payoffs, and we can look at the difference between them. If I keep cooperating instead of defecting, I'm giving up 2 today, which I could gain by defecting. But then I keep the benefits of cooperation in the future, and that means I'm getting an extra 2 in every future period. So the value of the difference is beta times 2 over 1 minus beta, minus the 2 I'm forgoing today. When do I want to keep cooperating? As long as this difference is non-negative. If it becomes negative, I'm worse off cooperating today and might as well just defect. The difference is non-negative when beta is greater than or equal to 1 minus beta, or basically when beta is at least one half.
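The two payoff streams just compared can be checked numerically (a sketch; the function names are mine):

```python
# Grim trigger in the 3/5/1 prisoner's dilemma: cooperating forever is
# worth 3/(1 - beta); deviating is worth 5 today plus 1 per period after.
def cooperate_value(beta):
    return 3 / (1 - beta)

def deviate_value(beta):
    return 5 + beta * 1 / (1 - beta)

# The difference changes sign exactly at beta = 1/2: negative below it,
# zero at it, positive above it.
for beta in (0.4, 0.5, 0.6):
    print(beta, cooperate_value(beta) - deviate_value(beta))
```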
If you go through the algebra of solving that inequality, you get beta greater than or equal to one half. So as long as people care about tomorrow at least half as much as today, they're going to be willing to cooperate in the repeated prisoner's dilemma, with these particular payoffs. With this payoff structure, if each beta i is at least one half, the players can sustain cooperation in the infinitely repeated prisoner's dilemma. Okay. So let's change the numbers a little bit and see what happens. Let's make defection more attractive: instead of 5, we'll make it worth 10 to defect. Now defection looks really attractive. What has to happen? We can go through exactly the same calculations we just did, only with new numbers. Cooperating in perpetuity is still worth 3 over 1 minus beta. The only difference is that we're getting a higher number from deviating today, and then we still go back to defecting. So there's a bit more temptation today. When you take the difference now, you get the same kind of thing, except instead of minus 2 you've got minus 7: you're forgoing 7 units by not defecting today. So when you go through and solve that, beta now has to be at least 7/9 before players are willing to cooperate. You have to care about tomorrow at least 7/9 as much as today, okay? And so you can see the basic logic here. There's a trade-off between punishments tomorrow and a good payoff today. Whether or not something can hold together as an equilibrium is determined by how much the future matters versus the present, and how tempting the deviation is versus what we're doing in the current period.
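Both thresholds, 1/2 and 7/9, are instances of one formula. If the temptation payoff is T, mutual cooperation pays R, and mutual defection pays P, the one-period gain from deviating is T - R and the per-period future loss is R - P, so grim trigger sustains cooperation exactly when beta is at least (T - R)/(T - P). This generalization is mine, not stated in the lecture, but it reproduces both numbers:

```python
from fractions import Fraction

def critical_beta(R, T, P):
    """Smallest beta at which grim trigger sustains cooperation, from
    (T - R) <= (beta / (1 - beta)) * (R - P)."""
    return Fraction(T - R, T - P)

print(critical_beta(R=3, T=5, P=1))   # 1/2 for the original payoffs
print(critical_beta(R=3, T=10, P=1))  # 7/9 when defecting pays 10
```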
How big is the threat? How bad is the thing we resort to in the future, in terms of the trade-off? All of these things matter for holding cooperation together in these kinds of settings. And that gets back to the discussion we had a little earlier about, say, OPEC. There's a temptation to pump more oil today. How much do you care about the future? What's your beta? What's the reaction going to be? If I start pumping more oil, how are the others going to react? Are they going to start pumping more oil and drive the price down? How much is that going to hurt me? All of those things matter, and they determine whether an equilibrium can hang together or not. Okay. So, the basic logic: play something with relatively high payoffs. Even if it's not an equilibrium of the static game, you can sustain it, and you sustain it by having punishments. If anyone deviates, you resort to something that has lower payoffs, at least for that player. The important thing is that it all has to be credible: the punishment has to be an equilibrium in the subgame going forward in order for this to work. And the lower payoffs in the future have to be enough to deter people from deviating in the present.