In this video, we're going to look at how to define utility in infinitely repeated games. Remember that the way an infinitely repeated game works is that we have some stage game, which is a normal-form game, and the players repeatedly play that same game over and over again. What that means is that each player gets a sequence of payoffs: say, player i gets the payoff r1 in the first repetition, r2 in the second repetition, r3 in the third, and so on infinitely. So we have an infinite sequence of real values, which are the payoffs that this player has received. But if we want to reason about this game, we can't really apply utility theory to an infinite sequence. Instead, we have to take this sequence and turn it into a single number that represents the utility the player has for having played this sequence. So how do we do that? What's the right way of thinking about it? The first thing to notice is that the tools we've learned so far in game theory aren't going to be sufficient to answer this question. You might wonder if we can take this infinitely repeated game and just write it in extensive form. Of course, we can't, and the reason is that the extensive form would be infinitely deep: we would never get to a leaf node where we could write a payoff. So that won't help us. You might also wonder: can we just sum up the sequence of payoffs and say that my utility is the sum of these values? The problem is that this sum can be unbounded. If, for example, every payoff I get is positive, then I'm going to have an unbounded amount of utility at the end. That's not going to work; I want finite utilities. So, instead, there are two canonical ways this gets defined, and I'll tell you about both of them in this video. Here's the first one: intuitively, let me look at my average payoff over the sequence.
Now, the average payoff of a sequence is also not well defined because, of course, the way I take an average is to sum everything up and then divide by the number of things. We've already seen that the sum can be unbounded, and the number of things is unbounded as well, so I would have infinity divided by infinity, which wouldn't help me here. What I can do instead is look at the limit of finite averages as the averages get longer and longer. That is, let me look at an average over the first k entries in my sequence, and then take the limit of this average as k goes to infinity. Technically, it turns out this isn't always well defined. It's almost always well defined, and there's an easy fix to this definition that I've left out here to keep things from getting too technical. In cases where it is well defined, this is the right thing to do, and everything we'll talk about in this course will be well defined. So this is defined as the average reward that the player gets over the infinite sequence, and it gives us one number. The reason we have a second definition is that there's something kind of counterintuitive about the average reward. Let me put it back up. The reason it's counterintuitive is this: suppose I get some bad payoff for a finite amount of time, say, for the first 100,000 iterations I get a payoff of negative a million, and then, for the rest of time after that, I get some good payoff, say one unit of utility. Then the limit of the means would be one, because the negative payoff that I got at the beginning lasts only a finite amount of time, and it washes out in the average if I go far enough into the future. Well, that's what the math says.
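To make the washing-out effect concrete, here is a small numerical sketch (not from the lecture; the particular payoff values are hypothetical, chosen smaller than the lecture's example so the code runs quickly). The running average over the first k payoffs approaches 1 even though the sequence starts very badly.

```python
# Sketch: the limit-of-means (average reward) definition.
# Hypothetical payoff stream: -100 for the first 10 rounds, then 1 forever.
# The finite average (1/k) * sum of the first k payoffs approaches 1,
# so the bad finite prefix washes out in the limit.

def payoff(t):
    # Payoff received in round t (0-indexed here for simplicity)
    return -100 if t < 10 else 1

def average_reward(k):
    # Average of the first k payoffs; the average reward is its limit
    # as k goes to infinity
    return sum(payoff(t) for t in range(k)) / k

for k in (10, 1_000, 1_000_000):
    print(k, average_reward(k))
```

As k grows, the printed averages climb from -100 toward 1, matching the intuition that only the long-run behavior of the sequence matters under this definition.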
But that doesn't always model what we want to reason about, because we have an intuition that payoffs you get early on are somehow more important than payoffs you get really far into the future. If we want a model of utility that has that property, we need to say that different payoffs matter differently: it's more important to me to get a good payoff in the first iteration than to get one in the millionth iteration. The way I can model that is by saying my payoffs are multiplied by some discount factor, which describes my value for payoffs at different times. My discount factor, beta, is some value strictly between 0 and 1, and you can think of it a bit like an interest rate. With money, if I told somebody I'm going to pay them $100 in a year, they would value that at less than $100 today, and the amount by which they would value it less today roughly corresponds to the interest rate. That's essentially what's going on in the math here. What I'm saying is that my utility for this stream of payoffs, this stream of r's, weights each payoff by the discount factor raised to the power of that payoff's position in the sequence. So I'm going to discount each payoff successively: the first one has the discount factor applied once, the second has it applied twice, so I get the discount factor squared, and so on all the way through the sequence. Each payoff is diminished, but each of them still matters. There are two ways we can think about what the discount factor means. The first is the interpretation I've been giving you so far: the agent just cares more about the near term than the long term. There's another interpretation which is different but mathematically equivalent, so it's interesting to think about.
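The discounted-utility definition can be sketched in a few lines of code (the constant payoff stream and the value beta = 0.9 are illustrative assumptions, not from the lecture). Following the lecture's convention, the first payoff is discounted once, the second twice, and so on.

```python
# Sketch: discounted utility u = sum over t >= 1 of beta**t * r_t,
# truncated at a finite horizon (the tail beta**t shrinks geometrically,
# so a long-enough prefix approximates the infinite sum well).

def discounted_utility(payoffs, beta):
    # payoffs: a finite prefix of the (potentially infinite) payoff stream
    # beta: discount factor, strictly between 0 and 1
    return sum(beta ** t * r for t, r in enumerate(payoffs, start=1))

beta = 0.9
payoffs = [1] * 500  # hypothetical: a constant payoff of 1 per round
u = discounted_utility(payoffs, beta)
# For r_t = 1, the geometric series gives beta / (1 - beta) = 9
print(u)
```

Because beta is strictly less than 1, this sum is always bounded for bounded payoffs, which is exactly the finiteness that the plain infinite sum lacked.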
And that is that the agent really does care equally about every payoff, just like the agent we talked about in the average-reward case, but with some probability, specifically 1 minus beta, the game will end in any given round. So our game is not necessarily infinitely repeated; it's potentially infinitely repeated. Every time we play the game, we're going to flip a coin: with probability 1 minus beta, the game simply ends, and with probability beta, the game continues. What that means is that here we'd be talking about my expected reward in the game, because there's a beta chance that I'll go to the next round, a beta-squared chance that I'll go two rounds forward, a beta-cubed chance that I'll go three rounds forward, and so on. So my expected utility in this game would be given by the same formula. And that's it for defining utility in these games. Thanks very much.
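The equivalence of the two interpretations can be checked with a quick Monte Carlo sketch (the simulation setup and the constant payoff stream are my assumptions for illustration; in particular I flip the continuation coin before every round, including the first, so that round t is reached with probability beta**t, matching the formula above).

```python
import random

def simulate_total_payoff(beta, payoff, rng):
    # One play of the potentially infinitely repeated game:
    # before each round the game continues with probability beta
    # and ends with probability 1 - beta. Payoffs are NOT discounted;
    # the agent cares equally about every round it actually reaches.
    total, t = 0.0, 1
    while rng.random() < beta:
        total += payoff(t)
        t += 1
    return total

rng = random.Random(0)
beta = 0.9
trials = 200_000
estimate = sum(simulate_total_payoff(beta, lambda t: 1, rng)
               for _ in range(trials)) / trials
# The discounted sum for r_t = 1 is beta / (1 - beta) = 9;
# the Monte Carlo average of undiscounted totals should be close to it
print(estimate)
```

The simulated expected total payoff comes out near 9, the same number the discounted-utility formula gives, illustrating why the two definitions are mathematically the same.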