In the next three segments we will change gears a little bit and talk about the definition of a PRG. This definition is a really good way to think about PRGs, and we will see many applications of it. So consider a PRG with key space K that outputs n-bit strings. Our goal is to define what it means for the output of the generator to be indistinguishable from random. In other words, we're gonna define a distribution that is obtained by choosing a random key in the key space (remember that an arrow with an R above it means choosing uniformly from the set script K) and then outputting, basically, the output of the generator. And what we'd like to say is that this distribution of pseudorandom strings is indistinguishable from a truly uniform distribution. In other words, if we just choose a truly uniform string in {0,1}^n and simply output this string, we'd like to say that these two distributions are indistinguishable from one another. Now if you think about it, this sounds really surprising, because if we draw a circle here of all possible strings in {0,1}^n, then the uniform distribution can output any of these strings with equal probability; that's the definition of the uniform distribution. However, for a pseudorandom distribution generated by this generator G, because the seed space is so small, the set of possible outputs is really, really small; it's tiny inside of {0,1}^n, and this is really all that the generator can output. And yet, what we're arguing is that an adversary who looks at the output of the generator in this tiny set can't distinguish it from the output of the uniform distribution over the entire set. That's the property that we're actually shooting for. So to understand how to define this concept of indistinguishability from random, we need the concept of a statistical test. So let me define what a statistical test on {0,1}^n is. I'm gonna denote these statistical tests by the letter A. A statistical test is basically an algorithm that takes as input an n-bit string and simply outputs zero or one. We're gonna think of an output of zero as the statistical test saying the input you gave me is not random, and an output of one as it saying the input you gave me actually is random. Okay, so all the statistical test does is take the n-bit string x that was given to it and decide whether it looks random or it doesn't look random. Let's look at a couple of examples. The first example will use the fact that, for a random string, the number of zeros is roughly equal to the number of ones. In other words, the statistical test is going to output one if and only if the number of zeros in the given string x and the number of ones in the given string x are not too far apart; let's say the difference between the number of zeros and the number of ones is less than ten times the square root of n. If the difference is less than ten times the square root of n, the statistical test will say, hey, the string x looks random. If the difference happens to be much bigger than ten times the square root of n, that starts to look suspicious, and the test will say, hey, the string you gave me does not look random.
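If it helps to see this as concrete code, here's a minimal sketch of this first test in Python; the function name and the string encoding of bits are my own illustrative choices:

```python
import math

def balance_test(x: str) -> int:
    """Output 1 ("looks random") iff the number of zeros and the number
    of ones in the n-bit string x differ by less than 10 * sqrt(n)."""
    n = len(x)
    zeros = x.count('0')
    ones = n - zeros
    return 1 if abs(zeros - ones) < 10 * math.sqrt(n) else 0
```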
Let's look at another, similar example. We'll say here that the statistical test will output one if and only if the number of times we see two consecutive zeros inside of x is about what we'd expect. Let's think about this for a second. This test basically counts, in this string of n bits, the number of times we see the pattern zero, zero: two consecutive zeros. Well, in a random string we expect to see 0,0 with probability one-fourth, and therefore in a random string we expect about n/4 occurrences of 0,0. And so what the statistical test will do is say: if the number of 0,0's is roughly n/4, in other words, if the difference between that count and n/4 is, say, less than ten times the square root of n, then we will say that x looks random. And if the gap is much bigger than ten times the square root of n, we'll say, hey, this string doesn't really look random, and the statistical test will output zero, okay? So here are two examples of statistical tests that, for random strings, will output one with very high probability; but for strings that don't look random, for example the all-zero string, neither one of these tests will output one. And indeed, the all-zero string does not look random. Let's look at one more example of a statistical test, just to show you that statistical tests can pretty much do whatever they want. So here's the third example. Let's say the statistical test outputs one if and only if the biggest block of zeros, what we'll call the maximum run of zeros inside the string x, that is, the longest sequence of consecutive zeros inside x, is not too long. In a random string, you expect the longest sequence of zeros to be roughly of length log n. So we'll say that if the longest sequence of zeros happens to be less than ten times log n, then this test will say that x is random. But if, all of a sudden, we see a run of zeros that is much bigger than ten log n, then the statistical test will say the string is not random, okay? So this is another thing a statistical test might do. By the way, you'll notice that if you give this test the all-one string (one, one, one, one, one), this test will also output one. In other words, this test will think that the all-one string is random, even though the all-one string is not particularly random. Okay, so statistical tests don't have to get things right; they can do whatever they like. They can decide to output zero or one however they like. And similarly, there are many, many other statistical tests; there are literally hundreds of statistical tests one can think of. And I can tell you that in the old days, basically, the way you would define that something looks random is you would say: here's a battery of statistical tests, and all of them said that this string looks random; therefore, we say that the generator that generated the string is a good generator. This older definition, which uses a fixed set of statistical tests, is actually not a good definition for security, or more generally for crypto.
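Here are sketches of these second and third tests in the same style; again, the names and the exact thresholds are illustrative choices:

```python
import math

def pattern_00_test(x: str) -> int:
    """Output 1 iff the number of (overlapping) occurrences of '00' is
    within 10 * sqrt(n) of the roughly n/4 expected for a random string."""
    n = len(x)
    count = sum(1 for i in range(n - 1) if x[i:i+2] == '00')
    return 1 if abs(count - n / 4) < 10 * math.sqrt(n) else 0

def max_run_test(x: str) -> int:
    """Output 1 iff the longest run of zeros in x is at most 10 * log2(n);
    a random n-bit string has a longest zero-run of roughly log2(n)."""
    longest = current = 0
    for bit in x:
        current = current + 1 if bit == '0' else 0
        longest = max(longest, current)
    return 1 if longest <= 10 * math.log2(len(x)) else 0
```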
But before we talk about actually defining security, the next thing we'll talk about is how we evaluate whether a statistical test is good or not. To do that, we define the concept of advantage. So here we have a generator G that outputs n-bit strings, and we have a statistical test A on n-bit strings. We define the advantage of the statistical test A relative to the generator G, denoted advantage sub PRG of A and G, as follows: it's basically the difference between two quantities. For the first quantity, we ask how likely the statistical test is to output one when we give it a pseudorandom string; here k is chosen uniformly from the key space, and we ask how likely the statistical test is to output one when we give it a pseudorandom output generated by the generator. For the second quantity, we ask how likely the statistical test is to output one when we give it a truly random string; here r is truly random in {0,1}^n. And we look at the difference between these two quantities. Now, you realize that because this is a difference of probabilities, the advantage always lies in the interval [0,1]. So let's think a little bit about what this advantage actually means. First of all, suppose the advantage happens to be close to one. What does that mean? It means that, somehow, the statistical test A behaves differently when we give it pseudorandom inputs, that is, outputs of the generator, from when we give it truly random inputs, right? In one case it outputs one with a certain probability, and in the other case it outputs one with a very different probability, okay? So somehow it was able to behave differently, and what that really means is that the statistical test could distinguish the output of the generator from random. So, in some sense, we'll say that this statistical test broke the generator G, because it was able to distinguish the output from random. However, if the advantage is close to zero, what does that mean? It means that the statistical test behaves pretty much the same on pseudorandom inputs as it does on truly random inputs, and in that case we would say that A could not distinguish the generator from random. Okay, so this hopefully gives you a little bit of intuition about why this concept of advantage is important: it basically tells us whether A was able to break the generator, namely distinguish it from random, or was not able to break it. So let's look, first of all, at a very silly example. Suppose we have a statistical test A that simply ignores its input and always outputs zero. What do you think the advantage of this statistical test is relative to a generator G? I hope everybody said the advantage is zero. Let me just explain why that's the case. Well, if the statistical test always outputs zero, then on pseudorandom inputs it will never output one, so the probability that it outputs one is zero. Similarly, when we give it a truly random input, it still never outputs one, so that probability is also zero. And zero minus zero is zero, so its advantage is zero. Basically, a statistical test that ignores its input is not able to distinguish truly random inputs from pseudorandom inputs, obviously.
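Written out in symbols, the definition reads:

```latex
\mathrm{Adv}_{\mathrm{PRG}}[A, G] \;=\;
\Bigl|\;
\Pr_{k \xleftarrow{R} \mathcal{K}}\bigl[\, A(G(k)) = 1 \,\bigr]
\;-\;
\Pr_{r \xleftarrow{R} \{0,1\}^n}\bigl[\, A(r) = 1 \,\bigr]
\;\Bigr|
\;\in\; [0, 1].
```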
Okay, now let's look at a more interesting example. Suppose we have a generator G that satisfies a funny property: it so happens that for two-thirds of the keys, the first bit of the output of the generator is one. So if I choose a random key, then with probability two-thirds the generator will output one as its first bit, okay? That's the property of the generator we're looking at. Now, let's look at the following statistical test. The statistical test basically says: if the most significant bit of the string you gave me is one, I'm gonna output one, meaning I think it's random; if the most significant bit of the string you gave me is zero, I'm gonna output zero. Okay, so now my question to you is: what is the advantage of this statistical test relative to the generator G? Remember, I just wrote down the definition here again, and I'll let you think about this for a second. So let me explain. Suppose we give the statistical test pseudorandom inputs. By the definition of G, we know that with probability two-thirds the input will start with the bit one. But if it starts with the bit one, then the statistical test will output one. In other words, the probability that this statistical test outputs one is exactly two-thirds. Now let's look at the case of a random string. If I give you a random string, how likely is it that the most significant bit of the random string is one? Well, for a random string, that happens exactly half the time, and so in this case the statistical test outputs one with probability one-half. And so the overall advantage is one-sixth. Now, one-sixth is a non-negligible number; that's actually a fairly large advantage, which basically means that this A was able to distinguish the output. We'll say that A breaks the generator G with advantage 1/6, which basically means that this generator is no good; it's broken.
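As a sanity check, here's a small Monte Carlo sketch of this example. The generator below is a hypothetical stand-in that merely simulates the stated bias (a real PRG is a deterministic function of its seed), so the estimate should land near |2/3 - 1/2| = 1/6:

```python
import random

def msb_test(x: str) -> int:
    """The statistical test from the example: output 1 iff the most
    significant (first) bit of x is 1."""
    return 1 if x[0] == '1' else 0

def biased_output(n: int) -> str:
    """Hypothetical stand-in for G's output distribution: the first bit
    is 1 with probability 2/3, the remaining bits are uniform."""
    first = '1' if random.random() < 2 / 3 else '0'
    return first + ''.join(random.choice('01') for _ in range(n - 1))

def uniform_output(n: int) -> str:
    """A truly uniform n-bit string."""
    return ''.join(random.choice('01') for _ in range(n))

trials, n = 100_000, 16
p_pseudo = sum(msb_test(biased_output(n)) for _ in range(trials)) / trials
p_random = sum(msb_test(uniform_output(n)) for _ in range(trials)) / trials
print(abs(p_pseudo - p_random))   # expect roughly 1/6, about 0.167
```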
Okay, so now that we understand what statistical tests are, we can go ahead and define what a secure pseudorandom generator is. Basically, we say that a generator G is secure if essentially no efficient statistical test can distinguish its output from random. More precisely, what we'll say is that for all efficient statistical tests A, the advantage of the statistical test A relative to G is negligible. In other words, it's very close to zero, and as a result the statistical test was not able to distinguish the output from random; and that has to be true for all efficient statistical tests. So this is a very pretty and elegant definition: it says that a generator is secure not merely if a particular battery of statistical tests says that the output looks random, but if, in fact, all efficient statistical tests say the output looks random. Okay? One thing I'd like to point out is that the restriction to efficient statistical tests is actually necessary. If we asked that all statistical tests, regardless of whether they're efficient or not, be unable to distinguish the output from random, then in fact that requirement could not be satisfied. So in other words, if we took out the requirement that the test be efficient, then this definition would be unsatisfiable. I'll leave this as a simple puzzle for you to think about; but the fact is that restricting this definition to only efficient statistical tests is actually necessary for it to be satisfiable. So now that we have a definition, the next question is: can we actually construct a generator and then prove that it is in fact a secure PRG? In other words, prove that no efficient statistical test can distinguish its output from random. And it turns out that the answer is that we actually can't. In fact, it's not known if there are any provably secure PRGs. I'll just say very briefly that the reason is that if you could prove that a particular generator is secure, that would actually imply that P is not equal to NP. I don't want to dwell on this, because I don't want to assume that you know what P and NP are; but I'll tell you, as a simple fact, that if P is equal to NP, then it's very easy to show that there are no secure PRGs. And so if you could prove to me that a particular PRG is secure, that would imply that P is not equal to NP. Again, I'll leave this to you as a simple puzzle to think about. But even though we can't rigorously prove that a particular PRG is secure, we still have lots and lots of heuristic candidates, and we even saw some of those in the previous segments. Okay, now that we understand what a secure PRG is, I want to talk a little bit about some applications and implications of this definition. The first thing I want to show you is that, in fact, a secure PRG is necessarily unpredictable. In a previous segment, we talked about what it means for a generator to be unpredictable, and we said that, basically, what that means is that given a prefix of the generator's output, it's impossible to predict the next bit of the output. So we'd like to show that if a generator is secure, then necessarily it's unpredictable. And the way we're gonna do that is using the contrapositive: we're gonna show that if you give me a generator that is predictable, then necessarily it's insecure. In other words, necessarily I can distinguish it from random. This is actually a very simple fact, so let's see how we would do it. Suppose you give me a predictor. In other words, suppose you give me an efficient algorithm A such that, if I give this algorithm only the first i bits of the generator's output, it's able to predict the next bit of the output. That is, given the first i bits, it's able to predict the (i+1)st bit, and it does that with a certain probability, where k is chosen at random from the key space. Now, clearly, a dumb predictor would be able to predict the next bit with probability one-half: simply guess the bit, and you'll be right with probability one-half. However, this algorithm A is able to predict the next bit with probability one-half plus epsilon; it's bounded away from one-half. And, in fact, we require that this be true for some non-negligible epsilon. So, for example, epsilon = 1/1000 would already be a dangerous predictor, because it can predict the next bit, given a prefix, with non-negligible advantage. Okay, so suppose we have such an algorithm. Let's see how we can use this algorithm to break our generator; in other words, to show that the generator is distinguishable from random and therefore insecure. What we'll do is define a statistical test B as follows. Given a string x, B will simply run algorithm A on the first i bits of the string x that it was given. Then B simply asks: was A successful in predicting the (i+1)st bit of the string? If it was successful, then B outputs one, and if it wasn't successful, then B outputs zero. Okay, this is our statistical test. Let's put it in a box so we can take it wherever we like, and we can run this statistical test on any n-bit string that's given to us as input.
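Here's the reduction as a short sketch; `predictor` and `i` stand in for the assumed algorithm A and the position it predicts, both hypothetical names:

```python
def make_distinguisher(predictor, i):
    """Build the statistical test B from a next-bit predictor A:
    B(x) runs A on the first i bits of x and outputs 1 exactly when
    A correctly predicts bit i+1 (index i, zero-based)."""
    def B(x: str) -> int:
        guess = predictor(x[:i])           # run A on the first i bits
        return 1 if guess == x[i] else 0   # did A predict the next bit?
    return B
```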
So now let's look at what happens. Suppose we give the statistical test a truly random string r, and we ask: what is the probability that the statistical test outputs one? Well, for a truly random string, the (i+1)st bit is completely independent of the first i bits. So whatever algorithm A outputs is completely independent of what the (i+1)st bit of the string r is, and therefore the probability that A's output equals the truly random bit x_{i+1} is exactly one-half. In other words, algorithm A simply has no information about the bit x_{i+1}, so necessarily the probability that it predicts x_{i+1} is exactly one-half. On the other hand, let's look at what happens when we give our statistical test a pseudorandom string. So now we run the statistical test on the output of the generator, and we ask how likely it is to output one. Well, by the definition of A, we know that when we give it the first i bits of the output of the generator, it's able to predict the next bit with probability at least 1/2 + epsilon. So in this case, our statistical test B will output one with probability at least 1/2 + epsilon. And basically what this means is that if we look at the advantage of our statistical test against the generator G, it's the difference between this quantity and that quantity, and you can see that the difference is clearly at least epsilon. So what this means is that if algorithm A is able to predict the next bit with advantage epsilon, then algorithm B is able to distinguish the output of the generator with advantage epsilon. Okay? So if A is a good predictor, then B is a good statistical test that breaks the generator. And as we said, the contrapositive of that is: if G is a secure generator, then there are no good statistical tests, and as a result there are no predictors. Okay? Which means that the generator is, as we said, unpredictable. So far, what we've seen is that if the generator is secure, then necessarily it's impossible to predict the (i+1)st bit given the first i bits. Now, there's a very elegant and remarkable theorem, due to Yao back in 1982, that shows that, in fact, the converse is also true. In other words, if I give you a generator that's unpredictable, so that you cannot predict the (i+1)st bit from the first i bits, and that's true for all i, then that generator is, in fact, secure. So let me state the theorem a little more precisely. Here we have our generator G that produces n-bit outputs. The theorem says the following: if for all bit positions i, it's impossible to predict the (i+1)st bit of the output given the first i bits (in other words, the generator is unpredictable at every bit position), then the generator is a secure PRG. I want to paraphrase this in English. The way to interpret this result is to say that these next-bit predictors, the ones that try to predict the (i+1)st bit given the first i bits, are universal: if they're not able to distinguish G from random, then, in fact, no statistical test is going to distinguish G from random. So next-bit predictors are, in some sense, universal predictors when it comes to distinguishing things from random.
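Stated a little more formally (this is my own formalization of the informal statement above), the theorem reads:

```latex
\textbf{Theorem (Yao, 1982).}\;
Let $G : \mathcal{K} \to \{0,1\}^n$ be a PRG. If for every position
$0 \le i < n$ and every efficient adversary $A$,
\[
  \Pr_{k \xleftarrow{R} \mathcal{K}}
  \Bigl[\, A\bigl(G(k)\big|_{1,\dots,i}\bigr) = G(k)\big|_{i+1} \,\Bigr]
  \;\le\; \tfrac{1}{2} + \varepsilon
  \qquad \text{for some negligible } \varepsilon,
\]
then $G$ is a secure PRG.
```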
This theorem, by the way, is not too difficult to prove, but there's a very elegant idea behind its proof. I'm not gonna do the proof here, but I encourage you to think about it as a puzzle and try to prove the theorem yourself. Let me show you one cute implication of this theorem. Let me ask you the following question. Suppose I give you a generator, and I tell you that given the last n/2 bits of the output, it's easy to compute the first n/2 bits of the output, okay? That's kind of the opposite of predictability, right? Predictability means that given the first bits, you can predict the next bits; here, given the last bits, you can compute the first ones. And my question to you is: does that mean that the generator is predictable? Can you somehow, from this fact, still build a predictor for this generator? This is a simple application of Yao's theorem, and the answer is actually yes. Let me explain why. How do we build this predictor? Well, actually, we're not going to build it; I'm going to show you that the predictor exists. Because the first n/2 bits can be computed from the last n/2 bits, the output is clearly distinguishable from uniform; in other words, what it means is that G is not secure. Just as we did before, it's very easy to build a statistical test that will distinguish the output of G from uniform. So G is not secure. But if G is not secure, then by Yao's theorem, G is predictable. In other words, there exists some i for which, given the first i bits of the output, you can predict the (i+1)st bit of the output. Okay, so even though I can't quite point you to a predictor, we know that a predictor must exist. That's one cute, simple application of Yao's theorem. Now, before we end the segment, I want to generalize a little bit what we did, and introduce some important notation that's going to be useful throughout. We're gonna generalize the concept of indistinguishability from uniform to indistinguishability of two general distributions. So suppose I give you two distributions P1 and P2, and we ask: can these two distributions be distinguished? We'll say that the distributions are computationally indistinguishable, and we'll denote this by P1 ≈p P2 (a squiggly equals sign with a subscript p), meaning that in polynomial time P1 cannot be distinguished from P2. And we'll say that they're indistinguishable, basically just as before, if for all efficient statistical tests A, it so happens that if I sample from the distribution P1 and give the sample to A, versus if I sample from the distribution P2 and give the sample to A, then A behaves the same in both cases. In other words, the difference between these two probabilities is negligible, and this has to be true for all efficient statistical tests. Okay? So if this is the case, then we say that A couldn't distinguish: its advantage in distinguishing the two distributions is negligible. And if that's true for all efficient statistical tests, then we say that the distributions are computationally indistinguishable, because no efficient algorithm can distinguish them. And just to show you how useful this notation is: using this notation, the definition of security for a PRG just says that the pseudorandom distribution, in other words, the distribution obtained by choosing k at random and outputting G(k), is computationally indistinguishable from the uniform distribution.
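In symbols, the definition and the restated PRG security property read:

```latex
P_1 \approx_p P_2
\quad\Longleftrightarrow\quad
\forall\,\text{efficient } A:\;
\Bigl|\,\Pr_{x \leftarrow P_1}\bigl[A(x)=1\bigr]
      - \Pr_{x \leftarrow P_2}\bigl[A(x)=1\bigr]\,\Bigr|
\;<\; \text{negligible},
\]
and, with this notation, PRG security is simply
\[
\bigl\{\, k \xleftarrow{R} \mathcal{K} \,:\, G(k) \,\bigr\}
\;\approx_p\;
\mathrm{uniform}\bigl(\{0,1\}^n\bigr).
```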
So you can see that this very simple notation captures the whole definition of pseudorandom generators. Okay, so we're gonna make use of this notation in the next segment, when we define what it means for a cipher to be secure.