The topic of learning is a very rich and fascinating one in game theory, and we will only be able to get a taste of it here. We will focus on two examples of learning in games: one method called fictitious play, and another called no-regret learning, in particular an algorithm called regret matching. But the topic really is vast, so let me say a few words before we look at the specific methods. First of all, we should recognize that learning in game theory is fairly different from learning in certain other disciplines, for example as it is done in machine learning, in AI, in computer science, or in statistics and optimization. In those disciplines one usually has in mind a single actor acting in an environment. The environment is unknown to the agent and may be only partially observable, so it can be very difficult to figure out what an optimal strategy is; but there is a well-defined notion of an optimal strategy, and the goal of learning is to learn something about the environment and how best to act in it. In game theory, the environment consists entirely of other agents. So even as you are trying to learn and adapt, so are they. What ends up happening is that you cannot really divorce the notion of learning from the notion of teaching, because as you adapt, you are influencing the behavior of the other agents. Imagine, informally, a situation in which two agents repeatedly drive towards each other, playing the game of chicken, as in drag racing by adolescents. What each of them wants, of course, is to zoom straight ahead and have the other swerve to the side of the road and yield the right of way. Of course, if they both do that, they collide, and that is a bad idea. So they test each other, and over time dare more or less.
So imagine that driver 1 is an extremely good modeler of driver 2, and learns very well: whatever driver 2's strategy is, driver 1 will figure it out over time and best respond to it. It would seem that you cannot do any better than that. But now imagine that driver 2 is a sort of bully, who does not really model the first driver at all but just barrels ahead without regard to the circumstances, perhaps willing to take a few hits here and there to scare off the first driver. What happens is that the second driver, a terrible learner and a very bad modeler of the first driver, keeps going straight ahead, and the first driver, the wonderful learner and best responder, learns to accommodate. The second driver is perhaps a bad learner, but a very good teacher. That is one thing to keep in mind when you think about learning in games. The other is that learning is an overloaded term, and there are many things you might learn in the context of games; we will be looking at a very particular slice. Specifically, the context will be repeated games, and when we speak about learning in a repeated game, we will really be speaking about strategies that, as play unfolds, draw interesting inferences or use accumulated experience in an interesting way. That is the notion of learning that we, and in fact most of the literature in game theory, consider. With that in mind, here are two examples. The first is perhaps the granddaddy of learning regimes in game theory, called fictitious play. In fact, it was not initially conceived, nor is it viewed today, as a realistic or effective learning method, but it does contain many of the elements that you see in more involved versions of learning.
It was first presented as a heuristic for computing the Nash equilibrium of a game. It turns out not to be a very effective procedure for that, but it is an interesting, basic learning procedure. The way it works is simple: each agent starts with some belief about the strategy of the other agent; each agent best responds to its beliefs; each agent then updates its beliefs based on what it observed in this iteration of the game; and the game repeats. As stated, this is a very general regime; in fact, it is the general regime of model-based learning: we maintain a model of the other agent, best respond to it, and update the model over time. Fictitious play is the special case in which the model is simply a count of the other agent's actions so far, and you take those counts as their current mixed strategy. A little more formally, let w(a) be the number of times the opponent played action a in the past (with some nonzero initial values), and model the opponent as playing each action a with probability proportional to w(a). A very straightforward, simple procedure. Note that there is something a little paradoxical going on: both agents (we will speak about two agents here) are always playing a pure strategy, and yet each is modeling the other as playing a mixed strategy. Be that as it may, note also that you need to worry about edge cases such as tie breaking: what happens if two actions were played an equal number of times in the past, or are equally good responses. Here is an example of how it might work, in the context of matching pennies. In matching pennies, again, two players each choose heads or tails; if they both choose the same, the first player wins, and if they choose differently, the second player wins. Let us assume the players start with the following initial frequencies in mind.
Player 1's beliefs about player 2 are that player 2 has played heads with a frequency of 1.5 and tails with a frequency of 2; player 2's beliefs about player 1 are the mirror image, heads with a frequency of 2 and tails with a frequency of 1.5. Now it is round 1; what should they do? Player 1 wants to best respond to his beliefs: he believes the counts above describe player 2's mixed strategy, and he wants to match, so he plays tails, the best response to the mixed strategy he ascribes to player 2. What about player 2? Player 2 wants to mismatch, and since he believes that player 1 will play heads with greater probability than tails, he too plays tails. That ends the first stage. Moving to the next stage, the players update their beliefs. Player 1 observed player 2 playing tails, so he increases the tails count from 2 to 3, and player 2 likewise increases the tails count in his beliefs about player 1 from 1.5 to 2.5. What do they do now? Player 1 still wants to match player 2, and still believes, in fact even more strongly, that player 2 will play tails with greater probability, so he plays tails again. Player 2, on the other hand, now believes that player 1 will play tails with greater probability; player 2 wants to mismatch, so player 2 now plays heads. If you continue this calculation, you can persuade yourself that play proceeds in this fashion. This is how fictitious play unfolds. Now, notice something interesting: the strategies do not converge. If you were to continue playing this out, you would see the T's and H's on both sides ebb and flow, but with a certain balance emerging over time.
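To make the procedure concrete, here is a minimal sketch of fictitious play on matching pennies in Python. This is not code from the lecture; the payoff encoding, the tie-breaking rule (argmax favors heads), and the number of rounds are my own choices, but the initial counts are the ones from the example above.

```python
import numpy as np

# Matching pennies payoff matrix for player 1 (player 2 gets the negation).
# Rows index player 1's action, columns player 2's: 0 = heads, 1 = tails.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def best_response(payoff, counts):
    # Treat the opponent's action counts as a mixed strategy and best respond.
    # np.argmax breaks ties toward heads; tie breaking is one of the edge
    # cases the procedure has to settle somehow.
    return int(np.argmax(payoff @ (counts / counts.sum())))

belief1 = np.array([1.5, 2.0])  # player 1's counts of player 2's heads/tails
belief2 = np.array([2.0, 1.5])  # player 2's counts of player 1's heads/tails

history = []
for t in range(10000):
    a1 = best_response(A, belief1)      # player 1 tries to match
    a2 = best_response(-A.T, belief2)   # player 2 tries to mismatch
    history.append((a1, a2))
    belief1[a2] += 1                    # update counts with what was observed
    belief2[a1] += 1

print(history[:2])              # first two rounds: (T, T) then (T, H)
print(belief1 / belief1.sum())  # empirical frequency of player 2's play
print(belief2 / belief2.sum())  # both approach (0.5, 0.5)
```

The first two rounds reproduce the play traced above, tails-tails and then tails-heads, and running the loop long enough shows the balance discussed next: the empirical frequencies approach one half for each action.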
And in fact, in this game they do converge on average: if you look at the long-term average of the strategies, each of the agents plays tails and heads with equal probability, 0.5. We call this the empirical frequency of play. Now notice that in matching pennies, 0.5 is also the unique Nash equilibrium. Is this an accident? The answer is no. Here is the theorem: if the empirical frequencies of the players' play converge in fictitious play, then they converge to a Nash equilibrium of the game. Now, they may not converge in general; that is why fictitious play is not an effective learning procedure in general. But there is a host of conditions under which, even if the play itself does not converge, the empirical frequencies do. Here are some conditions that are sufficient: the game is zero-sum; or it is solvable by iterated elimination of strictly dominated strategies; or it is what is called a potential game, which we will not define here; or it is a 2-by-n game, in which one player has only two strategies and the other may have more, with what are called generic payoffs, which we also will not define here. The main thing to take away is that there are sufficient conditions guaranteeing that the empirical frequencies of fictitious play converge even when the play itself does not. Let us now switch to a very different form of learning, a whole class of learning rules known as no-regret learning. It differs in two fundamental ways. First, the methods themselves are not model-based: they do not explicitly model what the other agent is doing, but rather adapt the strategy directly. But perhaps more fundamentally, rather than starting with a learning method, we start with a criterion that we want the method to satisfy, namely the no-regret criterion.
So what does this say? The regret of an agent at time t for not having played some pure strategy s is a difference: the payoff the agent would have gotten had it played s, minus the payoff it actually got at that time. That is a natural enough notion. We now say that a learning rule exhibits no regret if, in the limit, the agent's average regret relative to any fixed pure strategy vanishes: as t goes to infinity, the probability that the regret tends to 0 is 1. Rules with this property are called no-regret learning rules. Here is one such rule, which is surprisingly simple, called regret matching. It works as follows: simply look at the regret you have experienced so far for each of your pure strategies, and pick each pure strategy with probability proportional to its regret. That is, if R^t(s) is the cumulative regret for strategy s at time t, then the probability of playing s at the next time step is R^t(s) divided by the sum of the regrets across all pure strategies. A very simple rule, and it has surprisingly strong properties. First of all, it provably exhibits no regret; furthermore, when the players use regret matching, the empirical distribution of play converges to a correlated equilibrium, at least for finite games that are repeated. So those are two examples of learning rules: one model-based, namely fictitious play, and one model-free, regret matching, which is one of the family of learning methods that exhibit no regret. As we said in the beginning, the topic of learning in games is a very rich one, but at least we have a taste of it now.
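As a concrete illustration, here is a minimal sketch of regret matching in Python, again on matching pennies; this is not code from the lecture. One detail the verbal description glosses over: cumulative regrets can be negative, so the usual formulation takes probabilities proportional to the positive part of the regret, and plays uniformly when no regret is positive. The game, the random seed, and the number of rounds are my own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Matching pennies payoff for player 1; player 2's payoff is the negation.
# Action 0 = heads, action 1 = tails.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def regret_matching_action(cum_regret):
    # Play each pure strategy with probability proportional to its positive
    # cumulative regret; play uniformly if no strategy is regretted.
    pos = np.maximum(cum_regret, 0.0)
    if pos.sum() == 0.0:
        return int(rng.integers(len(cum_regret)))
    return int(rng.choice(len(cum_regret), p=pos / pos.sum()))

R1 = np.zeros(2)            # player 1's cumulative regrets for heads/tails
R2 = np.zeros(2)
joint = np.zeros((2, 2))    # empirical distribution of the joint play

for t in range(50000):
    a1 = regret_matching_action(R1)
    a2 = regret_matching_action(R2)
    joint[a1, a2] += 1
    # Regret for s: the payoff had I played s, minus what I actually got.
    R1 += A[:, a2] - A[a1, a2]
    R2 += (-A[a1, :]) - (-A[a1, a2])

print(joint / joint.sum())  # approaches the uniform distribution over the
                            # four outcomes, the unique correlated
                            # equilibrium of matching pennies
```

Running this, the empirical joint distribution of play approaches the uniform distribution, which in matching pennies is the unique correlated equilibrium, illustrating the convergence property stated above.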