So our goal for this segment is to build secure compression functions. So compression functions are collision resistant. So just to remind you where we are we looked at this Merkle Damguard construction which takes a little compression function and builds from it a hash function for much, much larger inputs. We proved this cute theorem that in fact if a little compression function h is collision resistant then plugging in into the Merkle Damguard construction gives us a collision resistant hash function for larger inputs. So now in this segment our goal is basically to build a compression function that is collision resistant. So we're going to see a couple of constructions. And so the first question that comes to mind is well, can we build compression functions from primitives that we already have? In particular, we spent a lot of work to build block cyphers and the question is can we build compression functions from block cyphers. And the answer is yes. And let me show you how to do it. So assume we have a certain block cypher here that operates on N bits blocks, so the input is an N bits, output is N bits. And then there's this classic construction called a Davies-Meyer construction which I wrote down in symbols here. Basically says that what you would do is, given the message block and the chaining variable, all we do is we encrypt the chaining variable using the message block as the key. And then we kind of do one more xor on the output. So this might seem a little bizarre, because remember the message block is something that's completely under the control of the adversary. He's trying to find the collision so he can choose the message blocks however he wants. And yet we're using this message block as a key into a block cypher. But nevertheless, we can argue that this construction, at least when E is what's called an ideal cypher, we can argue that this construction is in fact as collision resistant as possible. So let me state the theorem. The theorem states that, as I said, if E is an ideal block cypher, meaning that it's a random collection of K random permutations from 01 to the N to 01 to the N. Then under that assumption that E's an ideal block cypher, in fact, finding the collision for this compression function H takes time, two to the N over two. In particular, we can show that anyone who is finding collisions has to evaluate the encryption and decryption functions at least 2 to the N over 2 times. And if you think about what that means, since the output of this compression function is only N bytes long, we know that there's always a generic birthday attack that finds collisions in time 2 to the N over 2. So basically this theorem says that this collision resistant function is as collision resistant as possible. We can find the collision in time 2 to the N over 2 using the birthday attack but there is no algorithm that will do better than 2 to the n over 2. So this is actually a very common compression function used in collision resistance hashing in fact of a SHA functions all used Davies-Mayer. It turns out that there is something special about the Davies-Mayer construction that makes collision resistant. If you just try to guess the construction very likely you will end up with something that is not collision resistant. And so let me ask you the following puzzle. Suppose we actually define the compression function as follows, namely all we do is we encrypt the chaining variable H using the current message block as the key. So the difference is that we dropped this 'xor' H in Davies-Mayer, we simply ignored it. So it's not there. And the puzzle for you is show me that this compression function then is actually not collision resistant. So, let's see, so we're trying to build a collision, namely a distinct pair of HM and H' M' that happen to collide under this later function H. And my question to you is how would you do it? So I'm already going to tell you that you're going to choose H, M, and M' arbitrarily. The only requirement is that M and M' are distinct. And then my question is, how would you construct an H' that would cause a collision to pop out? So the answer is the first choice and an easy way to see it is if we apply the encryption function using M' to both sides. Then we get that this is basically E of M' applied to H', right. this is what we get by applying encryption using M' to the left hand side. And if we imply encryption using M' to the right hand side, the decryption operator cancels out and we simply are left with the encryption of M, H, which is exactly the collision that we wanted to find. So there. You can see that it's basically by using the decryption function D, it's very easy to build collisions for this compression function. Now I should tell you that in fact Davies-Meyer is not unique. There are other ways to build collision resistant compression functions from block ciphers. For example, here's a method called Miyaguchi Preneel. Miyaguchi Preneel actually is used in WHIRLPOOL hash function that we saw earlier. Here is another method that happens to result in a collision resistant compression function. And there are twelve variants like this playing with XORs and placing the variables in different slots that will actually give a collision resistant mechanism. But there are also many, many variants of this like we saw in the previous puzzle that are not collision resistant. So here's. Another example, that's not collision resistant. And I'm gonna leave it as a homework problem to figure out a collision for this construction. So now, basically, we have all the ingredients to describe the [inaudible] 256 hash function. I'll tell you that it's a Merkel-Damgard construction, exactly as the one that we saw before. It uses a Davies-Mayer compression function. And so the only question is, what's the underlying block cipher for Davies-Mayer? And that block cipher is called SHACAL-2. And I'll just tell you it's parameters. It uses a 512 bit key. And remember the key is taken from the message block. So, this is really what the message block is. And it so happens to be 512 bits. Which means the SHA-256 will process its input message 512 bits at a time. And in the block size, for this block cipher is 256 bits. And these are the chaining variable. So this would be H i-1. And this would be successive chaining variable. So now you have a complete understanding of how SHA-256 works. Module of this cipher SHACAL-2 I'm not going to describe here. So as I said, one class of compression functions is built from block cyphers. It turns out there's another class of compression functions that's built using hard problems from number theory, and I want to very briefly show you one example. And we call these compression functions provable because if you can find the collision on this compression function then you're going to be able to solve a very hard number theoretic problem which is believed to be intractable. And as a result, if the number theory problem is intractable, the resulting compression function is provably a collision resistant. So here's how this compression function works. Basically we're going to choose a large prime piece, so this is a gigantic prime, something like 700 digits, 2,000 bits. And then we're going to choose two random values, U and V, between one and P. And now let's define the compression function as follows. It takes two numbers between 0, and p-1, and it's gonna output one number between 0, and p-1. So it's compression ratio is 2 to 1. And takes two numbers. And outputs one number. In the range 0 to p-1. And it does it basically by computing double exponentiation. It computes u to the H times v to the n. And the theorem, which, right now, I'm just gonna state as a fact. This fact actually turns out to be fairly straightforward to prove, and we're gonna do it later on when we get to the number theoretic part of the course. The fact is, that if you can find a collision for this compression function, then you can solve a standard heart problem in number theory called a discreet log problem. Everyone believes discrete log is hard, and as a result, this compression function is provably collision resistant. So you might ask me why do people not use this compression function in practice. Why would we not use this for SHA-256? And the answer is that this gives very slow performance in comparison to what you get from a block cipher. So this is not really used for any compression functions. However, if for some reason you really only want to, say, MAC or sign. Just one long message, and you have a whole day to do it, then certainly you can use this type of compression function. And even though it's slow, you'll get something that's provably collision resistant. Okay, so that's the end of this segment. And now we're finally ready to talk about HMAC, which we're gonna do in the next segments.