This lecture is going to introduce the idea of mixed strategies and extend our previous concept of Nash equilibrium to this new definition. Let's begin by looking at the matching pennies game. Recall that it would be a pretty bad idea to play any deterministic strategy in this game. For example, if player two were to play heads, then player one would want to respond by playing heads to get a payoff of 1. Meaning that player two would prefer to change to tails so that he can get this payoff of 1. Meaning that player one would prefer to change to tails so he can get this payoff of 1. Meaning that player two would prefer to change back to heads to get this payoff of 1. Meaning that player one would prefer to change back to heads as well, to get this payoff of 1 where we started. And so you can see there's kind of a cycle where we just bounce around between the different cells of this game matrix, and we've essentially argued that no pair of deterministic strategies works for both players. So what does work for both players? Well essentially, it does make sense for players to confuse each other by choosing to play randomly. So intuitively, instead of saying, I'm going to commit to playing heads or I'm going to commit to playing tails, I can say, I'm going to commit to flipping this coin and playing whatever side comes up. So let's try to make this idea formal. Before, we talked about the idea of pure strategies, which we simply equated with playing actions. Now let's think of things in terms of probability distributions. So let's say that a strategy for an agent is any probability distribution over the actions that the player has available to him. A pure strategy then is the special case where I play only one action with positive probability. A mixed strategy says, I'm going to play more than one action with positive probability. There might be a couple of different actions that I assign positive probability to, like in my example with matching pennies.
I'm going to call the support of my mixed strategy the set of actions that get positive probability. So for example, when I flip a coin when I'm playing matching pennies, both heads and tails are in the support of my mixed strategy. My support is the set {heads, tails}. I'm going to define the set of all strategies for an agent i to be capital S sub i. And I'm going to define the set of all strategy profiles, capital S, to be the Cartesian product of these strategy sets for the different agents. Now I have the problem that I've elaborated my definition of strategies in the game to include not just this finite set of things players can do, but this infinite set of all of the probability distributions over these finite sets. The reason this is a problem is I only have a utility definition for action profiles, and now I'm allowing things to happen that I don't have utilities for. That is to say, I can't just read a number out of the game matrix to figure out how happy the players are when something happens, because under a mixed strategy with a support of size greater than 1, I won't always end up in the same cell of the matrix. So I can extend my definition of utility here by leveraging the idea of expected utility from decision theory. These equations explain what this means; it looks a lot more complicated than it is. What I'm saying here is that i's utility under mixed strategy profile s, where little s is some element of big S, the set of all possible mixed strategy profiles, is equal to the sum over all action profiles in the game. You can kind of think of this intuitively as the sum over all of the cells in the normal form of the game, where I take the utility of each cell and I multiply it by the probability that that cell would be reached under the given mixed strategy: the probability of reaching that cell, action profile a, given strategy profile s. And then of course I need to define what this probability actually is, and that's given here.
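To make these definitions concrete, here is a minimal sketch in Python; the representation and names are my own choices, not from the lecture, which defines these objects abstractly. A strategy is stored as a mapping from actions to probabilities, and the support is just the set of actions that get positive probability.

```python
# Illustrative sketch: a strategy as a dict mapping actions to probabilities.
# (This representation is my own choice; the lecture defines these abstractly.)

def support(strategy):
    """The set of actions that receive positive probability."""
    return {action for action, prob in strategy.items() if prob > 0}

# A pure strategy is the special case with exactly one action in its support.
pure_heads = {"H": 1.0, "T": 0.0}

# The coin flip from matching pennies is a mixed strategy with support {H, T}.
coin_flip = {"H": 0.5, "T": 0.5}

print(support(pure_heads))          # {'H'}
print(sorted(support(coin_flip)))   # ['H', 'T']
```

The set capital S sub i would then be all such dictionaries whose probabilities are nonnegative and sum to one.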
The probability that I'll get to a given action profile, given a strategy profile s, is just the product of the probability of each player playing his part of that action profile. So, for example, if this player is playing each action with probability 0.5 and this player is also playing with probability 0.5, then the probability that I will get to this cell is 0.25. This action profile arises with probability 0.25, because this thing happens half the time and this thing happens half the time, and so we have to multiply those two probabilities together to get the joint probability of this action profile; that's what the definition here is saying. So in total, my utility for a strategy profile is my expected utility, taking an expectation over all of the action profiles in the support of that strategy profile and weighting each of them by the probability that that action profile would actually arise. Well, now that I've defined what strategies are, I can go back to my definitions of best response and Nash equilibrium. And basically they work the way they did before, except I change all of the a's to s's. So that means I have to write these definitions again, and I'll go through them again. But conceptually, if you understood what best response and Nash equilibrium meant in the case of actions, then everything will work again. So I will say that a strategy s*i is an element of the set of best responses to strategy profile s-i when the following condition is true: for all other strategies si that player i could take, all of the strategies in the set of possible strategies for that player (notice that this is an infinite set, but that's okay, the definition still works), the utility that the player would get for playing s*i, when everybody else plays the strategy profile s-i, is at least as big as when he plays this other strategy si.
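The two definitions above can be sketched directly in code. This is an illustration under my own naming choices: Pr(a | s) is the product over players of the probability each assigns to his part of a, and u_i(s) sums u_i(a) times Pr(a | s) over all cells of the matrix.

```python
from itertools import product

def profile_probability(action_profile, strategy_profile):
    """Pr(a | s): the product over players j of s_j(a_j)."""
    prob = 1.0
    for action, strategy in zip(action_profile, strategy_profile):
        prob *= strategy.get(action, 0.0)
    return prob

def expected_utility(i, strategy_profile, utilities, actions):
    """u_i(s): the sum over all action profiles a of u_i(a) * Pr(a | s)."""
    return sum(utilities[a][i] * profile_probability(a, strategy_profile)
               for a in product(*actions))

# Matching pennies: player 0 wins when the pennies match, player 1 otherwise.
actions = [("H", "T"), ("H", "T")]
u = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
     ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}

both_mix = [{"H": 0.5, "T": 0.5}, {"H": 0.5, "T": 0.5}]
print(profile_probability(("H", "H"), both_mix))  # 0.25
print(expected_utility(0, both_mix, u, actions))  # 0.0
```

Notice that the 0.25 here is exactly the worked example from the lecture: two independent 50-50 choices multiply to a joint probability of one quarter.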
So let me say that again: s*i is a best response to strategy profile s-i if it's at least as good as anything else, given that everybody else is playing s-i. Now, we can say that a strategy profile s is a Nash equilibrium if it's the case that for all agents, everybody is playing a best response. Incidentally, you might notice that I'm using a set membership operator here rather than an equals sign, which you might have expected to see. Well, the reason I don't use an equals sign is because the set of best responses might have more than one thing in it. There might not be only one best response; sometimes there'll be multiple best responses. And so here what I'm saying is, a strategy is one of the best responses if this condition is true. And I'm saying a strategy profile is a Nash equilibrium if everybody is playing one of their best responses. Well, this might seem like much ado about nothing. I've introduced this idea of randomizing as a strategy. I've redefined utility, and then I've leveraged this redefined notion of utility, which is incidentally what I'm using here when I talk about the utility of a strategy profile, to define best response. I've then leveraged that definition of best response here to talk about Nash equilibrium. And in total I just ended up kind of saying everything that you've already heard us say. But what does matter is that now that we have this definition of Nash equilibrium, we're able to state a theorem that we didn't have before. This is Nash's famous theorem; this is the reason why the Nash equilibrium is named after him, and this is one of the main reasons why Nash got the Nobel Prize. This theorem actually didn't take very long to prove, but it's a really important thing for game theory. And the theorem is that every finite game has a Nash equilibrium. First of all, what is a finite game? This sounds like I'm hedging here, but it's not much of a hedge.
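Although the set of strategies S_i is infinite, expected utility is linear in player i's own probabilities, so a mixed strategy is a best response exactly when no pure deviation does better. That standard fact, which the lecture doesn't spell out, makes the best-response and equilibrium checks finite. A self-contained sketch, with function names of my own choosing:

```python
from itertools import product

def expected_utility(i, strategies, utilities, actions):
    """u_i(s): the sum over action profiles a of u_i(a) times Pr(a | s)."""
    total = 0.0
    for a in product(*actions):
        prob = 1.0
        for action, strategy in zip(a, strategies):
            prob *= strategy.get(action, 0.0)
        total += utilities[a][i] * prob
    return total

def is_best_response(i, strategies, utilities, actions, tol=1e-9):
    """True iff no pure deviation beats player i's current strategy.
    (Linearity of expected utility means pure deviations suffice.)"""
    current = expected_utility(i, strategies, utilities, actions)
    return all(
        expected_utility(i, strategies[:i] + [{a: 1.0}] + strategies[i + 1:],
                         utilities, actions) <= current + tol
        for a in actions[i])

def is_nash_equilibrium(strategies, utilities, actions):
    """True iff every agent is playing one of his best responses."""
    return all(is_best_response(i, strategies, utilities, actions)
               for i in range(len(strategies)))

# Matching pennies: 50-50 for both players passes the check; a pure
# strategy for player one against a 50-50 player two does not.
actions = [("H", "T"), ("H", "T")]
u = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
     ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}
mix = {"H": 0.5, "T": 0.5}
print(is_nash_equilibrium([mix, mix], u, actions))         # True
print(is_nash_equilibrium([{"H": 1.0}, mix], u, actions))  # False
```

The second profile fails because when player one commits to heads, player two strictly prefers the pure deviation of always playing tails over his 50-50 mix.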
A finite game just means that the game takes a finite amount of space to write down: it has a finite number of players, and it has a finite number of actions for every player. And that means it has a finite number of utility values in the game, because the number of utility values is determined by the number of players and the number of actions for each player. So, as long as the game has a finite number of players, not just two players but any finite number of players, and a finite number of actions for each player, not just two actions but maybe a very big game, then no matter what the payoff values are, no matter what strategic situation we're talking about here, no matter what real-world interaction this is modelling, there's going to be at least one Nash equilibrium in this game. That's a pretty deep thing. That's saying there will always be some stable thing that all of the players can do, which has the property that even if they knew what everyone was doing, none of them would want to change their strategy. That's basically one of the main reasons why we care about this idea of Nash equilibrium: because we know that no matter what the game is, we can find such a Nash equilibrium and reason about it. That's why Nash equilibrium is such a powerful thing. And that's only true when we have this fuller definition of Nash equilibrium that we've just defined in terms of strategies. When we talked about Nash equilibrium in terms of just actions before, that's what we'll from this point on refer to as pure strategy Nash equilibrium. So pure strategy Nash equilibrium is when we do all of this with a's instead of s's. And the sad thing about that is that we don't get a theorem that says that every finite game has one of those. But a mixed strategy Nash equilibrium always exists. Let's do some examples. So remember matching pennies?
Well, we just argued at the beginning of this video, effectively, that matching pennies doesn't have a pure strategy Nash equilibrium. But it does have a mixed strategy Nash equilibrium; it has one, and that is, as I suggested before, for both players to randomize 50-50. And that doesn't mean that it always has to be 50-50; that just happens to be the natural equilibrium here, which comes from the symmetry of the payoffs. Let's come back to the coordination game, where we previously saw that these two strategy profiles are equilibria. I'm circling outcomes, but remember an outcome isn't an equilibrium. You know, 1, 1 isn't the equilibrium; that would be wrong to say. The right thing to say is that left, left is an equilibrium and right, right is an equilibrium. But it turns out there's another equilibrium here: again, 0.5, 0.5 is a Nash equilibrium here as well. And that's kind of funny, because it doesn't seem like 0.5, 0.5 is such a good thing to play in this game. But you can confirm for yourself that if player one is randomizing by playing 50-50, then player two can do no better than to randomize 50-50. Now, you'll notice that player two could do just as well by playing something else: if player one is playing 50-50, player two is just as happy to go left all the time. But in particular, if player one is going 50-50, player two can do no better than to go 50-50 himself. The reverse is also true, and that makes 50-50, 50-50 a Nash equilibrium of the coordination game as well. And let's look at the prisoner's dilemma. In the prisoner's dilemma we've previously seen that this is an equilibrium, and it's an equilibrium in strictly dominant strategies. And we argued before that equilibria in strictly dominant strategies are unique. And so what that means is that indeed there aren't any mixed strategy Nash equilibria of the prisoner's dilemma; this is in fact the only Nash equilibrium.
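The indifference argument for the coordination game can be checked numerically. In this sketch I assume the standard coordination payoffs of (1, 1) for matching and (0, 0) for mismatching, consistent with the 1, 1 outcomes circled in the lecture; the names are my own. If player one mixes 50-50, every strategy of player two earns the same expected utility of 0.5, so no strategy, in particular 50-50 itself, can do strictly better:

```python
from itertools import product

def expected_utility(i, strategies, utilities, actions):
    """u_i(s): the sum over action profiles a of u_i(a) times Pr(a | s)."""
    total = 0.0
    for a in product(*actions):
        prob = 1.0
        for action, strategy in zip(a, strategies):
            prob *= strategy.get(action, 0.0)
        total += utilities[a][i] * prob
    return total

# Coordination game: both players want to choose the same side.
actions = [("L", "R"), ("L", "R")]
u = {("L", "L"): (1, 1), ("L", "R"): (0, 0),
     ("R", "L"): (0, 0), ("R", "R"): (1, 1)}

mix = {"L": 0.5, "R": 0.5}
# Against a 50-50 player one, player two gets 0.5 from every strategy:
# going left all the time, right all the time, or mixing 50-50 himself.
for response in ({"L": 1.0}, {"R": 1.0}, mix):
    print(expected_utility(1, [mix, response], u, actions))  # 0.5 each time
```

This is exactly the sense in which player two "can do no better" than 50-50: he is indifferent among all his strategies, and by symmetry the same holds for player one, which is what makes 50-50, 50-50 an equilibrium.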