My goal for the next few segments is to show you that if we use a secure PRG We'll get a secure stream safer. The first thing we have to do is define is, what does it mean for a stream safer to be secure? So whenever we define security we always define it in terms of what can the attacker do? And what is the attacker trying to do? In the context of stream ciphers remember these are only used with a onetime key, and as a result the most the attacker is ever going to see is just one cypher text encrypted using the key that we're using. And so we're gonna limit the attackers? ability to basically obtain just one cypher text. And in fact later on we're going to allow the attacker to do much, much, much more, but for now we're just gonna give him one cypher text. And we wanted to find, what does it mean for the cypher to be secure? So the first definition that comes to mind is basically to say, well maybe we wanna require that the attacker can't actually recover the secret key. Okay so given ciphertext you shouldn't be able to recover the secret key. But that's a terrible definition because think about the falling brilliant cipher but the way we encrypt the message using a key K is basically we just output the message. Okay this is a brilliant cipher yeah of course it doesn't do anything given a message just output a message as the cipher text. This is not a particularly good encryption scheme; however, given the cipher text, mainly the message the poor attacker cannot recover the key because he doesn't know what the key is. And so as a result this cipher which clearly is insecure, would be considered secure under this requirement for security. So this definition will be no good. Okay so it's recovering the secret key is the wrong way to define security. So the next thing we can kinda attempt is basically just say, well maybe the attacker doesn't care about the secret key, what he really cares about are the plain text. So maybe it should be hard for the attacker to recover the entire plain text. But even that doesn't work because let's think about the following encryption scheme. So suppose what this encryption scheme does is it takes two messages, so I'm gonna use two lines to denote concatenation of two messages M0 line, line M1 means concatenate M0 and M1. And imagine what the scheme does is really it outputs M0 in the clear and concatenate to that the encryption of M1. Perhaps even using the One Time Pad okay? And so here the attacker is gonna be given one cipher text. And his goal would be to recover the entire plain texts. But the poor attacker can't do that because here maybe we've encrypted M1 using the One Time Pad so the attacker can't actually recover M1 because we know the One Time Pad is secure given just one cipher text. So this construction would satisfy the definition but unfortunately clearly this is not a secure encryption scheme because we just leaked half of the plain text. M0 is completely available to the attacker so even though he can't recover all of the plain text he might be able to recover most of the plain text, and that's clearly unsecured. So of course we already know the solution to this and we talked about Shanon definition of security perfect secrecy where Shannon's idea was that in fact when the attacker intercepts a cipher text he should learn absolutely no information about the plain texts. He shouldn't even learn one bit about the plain text or even he shouldn't learn, he shouldn't even be able to predict a little bit about a bid of the plain text. Absolutely no information about the plain text. So let's recall very briefly Shannon's concept of perfect secrecy basically we said that you know given a cipher We said the cipher has perfect secrecy if given two messages of the same length it so happens that the distribution of cipher texts. Yet if we pick a random key and we look into distribution of cipher texts we encrypt M0 we get exactly the same distribution as when we encrypt M1. The intuition here was that if the advisory observes the cipher texts then he doesn't know whether it came from the distribution the result of encrypting M0 or it came from the distribution as the result of encrypting M1. And as a result he can't tell whether we encrypted M0 or M1. And that's true for all messages of the same length and as a result a poor attacker doesn't really know what message was encrypted. Of course we already said that this definition is too strong in the sense that it requires really long keys if cipher has short keys can't possibly satisfy this definition in a particular stream ciphers can satisfy this definition. Okay so let's try to weaken the definition a little bit and let's think to the previous segment, and we can say that instead of requiring the two distributions to be absolutely identical what we can require is, is that two distributions just be computationally indistinguishable. In other words a poor, efficient attacker cannot distinguish the two distributions even though the distributions might be very, very, very different. That just given a sample for one distribution and a sample for another distribution the attacker can't tell which distribution he was given a sample from. It turns out this definition is actually almost right, but it's still a little too strong, that still cannot be satisfied, so we have to add one more constraint, and that is that instead of saying that this definition should have hold for all M0 and M1. It is to hold for only pairs M0 M1 that the attackers could actually exhibit. Okay so this actually leads us to the definition of semantics security. And so, again this is semantics security for one time key in other words when the attacker is only given one cipher text. And so the way we define semantic security is by defining two experiments, okay we'll define experiment 0 and experiment 1. And more generally we will think of these as experiments parentheses B, where B can be zero or one. Okay so the way the experiment is defined is as follows, we have an adversary that's trying to break the system. An adversary A, that's kinda the analog of statistical tests in the world of pseudo random generators. And then the challenger does the following, so really we have two challengers, but the challengers are so similar that we can just describe them as a single challenger that in one case takes his inputs bits set to zero, and the other case takes his inputs bits set to one. And let me show you what these challengers do. The first thing the challenger is gonna do is it's gonna pick a random key and then the adversary basically is going to output two messages M0 and M1. Okay so this is an explicit pair of messages that the attacker wants to be challenged on and as usual we're not trying to hide the length of the messages, we require that the messages be equal length. And then the challenger basically will output either the encryption of M0 or the encryption of M1, okay so in experiment 0 the challenger will output the encryption of M0. In experiment 1 the challenger will output the encryption of M1. Okay so that the difference between the two experiments. And then the adversary is trying to guess basically whether he was given the encryption of M0 or given the encryption of M1. Okay so here's a little bit of notation let's define the events Wb to be the events that an experiment B the adversary output one. Okay so that is just an event that basically in experiment zero W0 means that the adversary output one and in experiment one W1 means the adversary output one. And now we can define the advantage of this adversary, basically to say that this is called the semantics security advantage of the adversary A against the scheme E, to be the difference of the probability of these two events. In other words we are looking at whether the adversary behaves differently when he was given the encryption of M0 from when he was given the encryption of M1. And I wanna make sure this is clear so I'm gonna say it one more time. So in experiment zero he was given the encryption of M0 and in experiment one he was given the encryption of M1. Now we're just interested in whether the adversary output 1 or not. ... In these experiments. If in both experiments the adversary output 1 with the same probability that means the adversary wasn't able to distinguish the two experiments. Experiments zero basically looks to the adversary the same as experiment one because in both cases upload one with the same probability. However if the adversary is able to output 1 in one experiment with significantly different probability than in the other experiment, then the adversary was actually able to distinguish the two experiments. Okay so... To say this more formally, essentially the advantage again because it's the difference of two probabilities the advantage is a number between zero and one. If the advantage is close to zero that means that the adversary was not able to distinguish experiment zero from experiment one. However if the advantage is close to one, that means the adversary was very well able to distinguish experiment zero from experiment one and that really means that he was able to distinguish an encryption of M0 from an encryption of M1, okay?So that's out definition. Actually that is just the definition of the advantage and the definition is just what you would expect basically we'll say that a symmetric encryption scheme is semantically secure if for all efficient adversaries here I'll put these in quotes again, "For all efficient adversaries the advantage is negligible." In other words, no efficient adversary can distinguish the encryption of M0 from the encryption of M1 and basically this is what it says repeatedly that for these two messages that the adversary was able to exhibit he wasn't able to distinguish these two distributions. Now I want to show you this, this is actually a very elegant definition. It might not seem so right away, but I want to show you some implications of this definition and you'll see exactly why the definition is the way it is. Okay so let's look at some examples. So the first example is suppose we have a broken encryption scheme, in other words suppose we have an adversary A that somehow given the cipher text he is always able to deduce the least significant bit of the plain text. Okay so given the encryption of M0, this adversary is able to deduce the least significant bit of M0. So that is a terrible encryption scheme because it basically leaks the least significant bit of the plain text just given the cipher text. So I want to show you that this scheme is therefore semantically secure so that kind of shows that if a system is semantically secure than there is no attacker of this type. Okay so let's see why is the system not semantically secure well so what we are gonna do is we're gonna basically use our adversary who is able to learn our most insignificant bits, we're going to use him to break semantic security so we're going to use him to distinguish experiment zero from experiment one Okay so here is what we are going to do. We are algorithm B, we are going to be algorithm B and this algorithm B is going to use algorithm A in its belly. Okay so the first thing that is going to happen is of course the challenger is going to choose a random key. The first thing that is going to happen is that we need to output two messages. The messages that we are going to output basically are going to have differently significant bits. So one message is going to end with zero and one message is going to end with one. Now what is the challenger going to do? The challenger is going to give us the encryption of either M0 or M1, depending on whether in experiment 0 or in experiment 1. And then we just forward this cipher text to the adversary okay so the adversary A. Now what is the property of adversary A? Given the cipher text, adversary A can tell us what the least significant bits of the plain text is. In other words the adversary is going to output the least significant bits of M0 or M1 but low and behold that's basically the bit B. And then we're just going to output that as our guest so let?s call this thing B prime Okay so now this describes the semantic security adversary. And now you tell me what is the semantic security advantage of this adversary? Well so let's see. In experiment zero, what is the probability that adversary B outputs one? Well in experiment zero it is always given the encryption of M zero and therefore adversary A is always output the least significant bit of M zero which always happens to be zero. In experiment zero, B always outputs zero. So the probability that outputs one is zero. However in experiment one, we're given the encryption of M1. So how likely is adversary B to output one in experiment one well it always outputs one, again by the properties of algorithm A and therefore the advantage basically is one. So this is a huge advantage, it's as big as it's gonna get. Which means that this adversary completely broke the system. Okay so we consider, so under semantic security basically just deducing the least significant bits is enough to completely break the system under a semantic security definition. Okay, now the interesting thing here of course is that this is not just about the least significant bit, in fact of the message for example the most significant bit, you know bit number seven Maybe the XOR of all the bits in the message and so on and so forth any kind of information, any bit about the plain text they can be learned basically would mean that the system is not semantically secure. So basically all the adversary would have to do would be to come up with two messages M0 and M1 such that under one thing that they learned it's the value zero and then the other thing, the value, is one. So for example if A was able to learn the XOR bits of the message then M0 and M1 would just have different XOR for all the bits of their messages and then this adversary A would also be sufficient to break semantic security. Okay so, basically for cipher is semantically secure then no bit of information is revealed to an efficient adversary. Okay so this is exactly a concept of perfect secrecy only applied just efficient adversaries rather than all adversaries. So the next thing I wanna show you is that in fact the one time pad in fact is semantically secure, they better be semantically secure because it's in fact, it's more than that it's perfectly secure. So let's see why that's true. Well so again it's one of these experiments, so suppose we have an adversary that claims to break semantic security of the one time pad. The first the adversary is gonna do is he's gonna output two messages M0 and M1 of the same length. Now what does he get back as a challenge? As a challenge, he gets either the encryption of M0 or the encryption of M1 under the one time pad. And he's trying to distinguish between those two possible cipher texts that he gets, right? In experiment zero, he gets the encryption of M0 and in experiment one, he gets the encryption of M1. Well so let me ask you, so what is the advantage of adversary A against the one time patent? So I remember that the property of the ones I had is that, that K, XOR M0 is distributed identically to K, XOR M1. In other words, these distributions are absolutely identical distribution, distributions, identical. This is a property of XOR. If we XOR the random pad K with anything, either M0 or M1, we get uniform distribution. So in both cases, algorithm A is given as in input exactly the same distribution. Maybe the uniform distribution on cipher texts. And therefore it's gonna behave exactly the same in both cases because it was given the exact distribution as input. And as a result, its advantage is zero, which means that the onetime pad is semantically secure. Now the interesting thing here is not only is it semantically secure, it's semantically secure for all adversaries. We don't even have to restrict the adversaries to be efficient. No adversary, doesn't matter how smart you are, no adversary will be able to distinguish K XOR M0 from K XOR M1 because the two distributions are identical. Okay, so the one time pad is semantically secure. Okay, so that completes our definition of semantic security so the next thing we're going to do is prove to the secure PRG, in fact implies that the stream cipher is semantically secure.