Over the years it became clear that DES and triple DES are simply not designed for modern hardware and are too slow. As a result, NIST started a new process to standardize a new block cipher called the Advanced Encryption Standard, or AES for short. NIST started its effort in 1997, when it requested proposals for a new block cipher. It received fifteen submissions a year later, and finally, in the year 2000, it adopted the cipher called Rijndael as the Advanced Encryption Standard. This was a cipher designed in Belgium. We already said that its block size is 128 bits and that it has three possible key sizes: 128, 192, and 256 bits. Now, the assumption is that the larger the key size, the more secure the block cipher is as a pseudorandom permutation. But because a larger key also means more rounds in its operation, the cipher becomes slower. So the larger the key, supposedly the more secure the cipher, but also the slower it becomes. For example, AES-128 is the fastest of these ciphers and AES-256 is the slowest. Now, AES is built as what's called a substitution-permutation network. It's not a Feistel network. Remember that in a Feistel network, half the bits were unchanged from round to round. In a substitution-permutation network, all the bits are changed in every round. And the network works as follows. Here we have the first round of the substitution-permutation network, where the first thing we do is XOR the current state with the round key, in this case the first round key. Then we go through a substitution layer, where blocks of data are replaced with other blocks based on what the substitution table says. And then we go through a permutation layer, where bits are permuted and shuffled around. And then we do this again: we XOR with the next round key, we go through a substitution phase, and we permute the bits around. And so on, and so forth, until we reach the final round, where we XOR with the very last round key, and then out comes the output. Now, an important point about this design is that, because of how it's built, every step in this network needs to be reversible, so that the whole thing is reversible. And so the way we would decrypt, essentially, is to take the output and simply apply each step of the network in reverse order. So we start with the permutation step, and we have to make sure that step is reversible. Then we look at the substitution layer, and we have to make sure that step is reversible. And this is very different from DES. In DES, if you remember, the substitution tables were not reversible at all; in fact, they map six bits to four bits. Whereas here, everything has to be reversible, otherwise it would be impossible to decrypt. And of course the XOR with the round key is reversible as well. Okay? So inversion of a substitution-permutation network is simply done by applying all of the steps in reverse order. So now that we understand the generic construction, let's look at the specifics of AES. AES operates on a 128-bit block, which is sixteen bytes. What we do with AES is write those sixteen bytes as a four-by-four matrix, where each cell in the matrix contains one byte. And then we start with the first round: we XOR with the first round key and then apply a certain round function that includes substitutions, permutations, and other operations on the state. And again, the three functions that are applied here have to be invertible so that the cipher can in fact be decrypted.
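To make the substitution-permutation structure concrete, here is a minimal sketch in C of the generic round skeleton just described: XOR with the round key, a byte-wise substitution, and a permutation, repeated for some number of rounds, with a final key XOR. Everything here is illustrative rather than the actual AES definition; in particular, SBOX, PERM, and the round count are placeholder values, and the only requirement on them is that they be invertible.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 16          /* 128-bit block, as in AES        */
#define NUM_ROUNDS  10          /* illustrative round count        */

/* Placeholder tables -- NOT the real AES tables. An SPN needs an
 * invertible S-box (a permutation of 0..255) and an invertible
 * permutation of positions so every step can be undone on decryption. */
extern const uint8_t SBOX[256];          /* invertible byte substitution */
extern const uint8_t PERM[BLOCK_BYTES];  /* invertible byte permutation  */

static void xor_round_key(uint8_t state[BLOCK_BYTES],
                          const uint8_t rk[BLOCK_BYTES]) {
    for (int i = 0; i < BLOCK_BYTES; i++)
        state[i] ^= rk[i];               /* reversible: XOR again to undo */
}

static void substitute(uint8_t state[BLOCK_BYTES]) {
    for (int i = 0; i < BLOCK_BYTES; i++)
        state[i] = SBOX[state[i]];       /* reversed with the inverse S-box */
}

static void permute(uint8_t state[BLOCK_BYTES]) {
    uint8_t tmp[BLOCK_BYTES];
    for (int i = 0; i < BLOCK_BYTES; i++)
        tmp[PERM[i]] = state[i];         /* reversed with the inverse permutation */
    memcpy(state, tmp, BLOCK_BYTES);
}

/* Every bit of the state changes in every round, unlike a Feistel network. */
void spn_encrypt(uint8_t block[BLOCK_BYTES],
                 const uint8_t round_keys[NUM_ROUNDS + 1][BLOCK_BYTES]) {
    for (int r = 0; r < NUM_ROUNDS; r++) {
        xor_round_key(block, round_keys[r]);
        substitute(block);
        permute(block);
    }
    xor_round_key(block, round_keys[NUM_ROUNDS]);   /* final key XOR */
}
```

Decryption would run the same steps in the reverse order, using the inverse S-box and inverse permutation, which is exactly why every layer has to be invertible.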
And then we XOR with the next round key and we do that again: we apply the round function and XOR with the round key. And we do that again and again and again; we do it ten times. Although, interestingly, in the last round the MixColumns step is actually missing. And then finally we XOR with the last round key, and out comes the output. Again, in every phase here we always, always, always keep this four-by-four array, and so the output is also four by four, which is sixteen bytes, which is 128 bits. Now, the round keys themselves, of course, come from the sixteen-byte AES key using key expansion. So the key expansion maps the sixteen-byte AES key into eleven round keys, each one being sixteen bytes. So these keys themselves are also four-by-four arrays that get XORed into the current state. Okay, so that's the schematic of how AES works, and the only thing that's left to do is specify these three functions: ByteSub, ShiftRows, and MixColumns. And those are fairly easy to explain, so I'm just gonna give you the high-level description of what they are, and those interested in the details can look them up online. So the way byte substitution works is, literally, it's one S-box containing 256 bytes, and essentially what it does is apply the S-box to every byte in the current state. So let me explain what I mean by that. The current state is gonna be this four-by-four table; let's call it the A table. And to each element in this table we apply the S-box. So essentially, for each of the sixteen entries, the new value of A[i][j] becomes the lookup table evaluated at the current A[i][j]. In other words, we use the current cell as an index into the lookup table, and the value in the lookup table is what's output. Okay, so that's the first step. The next step that happens is the ShiftRows step, which is basically just a permutation. Essentially we do a cyclic shift on each one of the rows: you can see the second row is cyclically shifted by one position, the third row is cyclically shifted by two positions, and the fourth row is cyclically shifted by three positions. And the last thing we do is MixColumns, where literally we apply a linear transformation to each one of these columns. So there's a certain matrix that multiplies each one of these columns and the result becomes the next column; this linear transformation is applied independently to each one of the columns. Now, I should point out that ShiftRows and MixColumns are very easy to implement in code. And I should say that the byte substitution itself is also easily computable, so you can actually write code that takes less than 256 bytes to write. You can kind of shrink the description of AES by literally storing code that computes the table rather than hardwiring the table into your implementation. And in fact, this is kind of a generic fact about AES: if you allow no precomputation at all, including computing the S-box on the fly, then you get a fairly small implementation of AES, so it could fit in very constrained environments where there isn't enough room to hold complicated code. But of course, this will be the slowest implementation, because everything is now computed on the fly, and as a result the implementation is obviously gonna be slower than one that uses precomputed tables. And then there is this trade-off.
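Here is a rough C sketch of the three per-round functions just described, operating on the four-by-four byte state. It is meant to show the structure of each step, not to be a vetted AES implementation: the 256-byte S-box table is elided (only its declaration appears), and the state is indexed as state[row][column].

```c
#include <stdint.h>

/* The 256-byte AES S-box, elided here; ByteSub is one table lookup per byte. */
extern const uint8_t SBOX[256];

typedef uint8_t state_t[4][4];   /* state[row][col], one byte per cell */

/* ByteSub: replace every byte with its S-box entry. */
static void byte_sub(state_t a) {
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            a[r][c] = SBOX[a[r][c]];
}

/* ShiftRows: cyclically shift row r to the left by r positions. */
static void shift_rows(state_t a) {
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[c] = a[r][(c + r) % 4];
        for (int c = 0; c < 4; c++)
            a[r][c] = tmp[c];
    }
}

/* Multiplication by x (0x02) in GF(2^8) with the AES reduction polynomial. */
static uint8_t xtime(uint8_t b) {
    return (uint8_t)((b << 1) ^ ((b >> 7) ? 0x1b : 0x00));
}

/* MixColumns: multiply each column by the fixed matrix
 *   | 02 03 01 01 |
 *   | 01 02 03 01 |
 *   | 01 01 02 03 |
 *   | 03 01 01 02 |
 * over GF(2^8); the same linear map is applied to each column independently. */
static void mix_columns(state_t a) {
    for (int c = 0; c < 4; c++) {
        uint8_t a0 = a[0][c], a1 = a[1][c], a2 = a[2][c], a3 = a[3][c];
        a[0][c] = xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3;
        a[1][c] = a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3;
        a[2][c] = a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3);
        a[3][c] = (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3);
    }
}
```

A table-based implementation would fold these steps together into a handful of lookups into larger precomputed tables, which is exactly the code-size-versus-speed trade-off discussed next.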
For example, if you have a lot of space and you can support large code, you can actually precompute quite a bit of the three steps that I just mentioned. In fact, there are multiple options for precomputation: you can build tables that are only four kilobytes big, or you can build tables that are even larger, maybe 24 kilobytes. So basically you would have these big tables in your implementation, but then your actual performance is going to be really good, because all you're doing is table lookups and XORs; you're not doing any other complicated arithmetic. And as a result, if you can do a lot of precomputation, these three steps here, ByteSub, ShiftRows, and MixColumns, can be converted into just a small number of table lookups and some XORs. Another option is to precompute just the S-box, so now your implementation would just have 256 bytes hard-coded, and the rest would just be code that's actually computing these three functions. The performance would be slower than in the previous case, but the code footprint would also be smaller. So overall, there's this nice trade-off between code size and performance. On high-end machines, on high-end servers, where you can afford to have a lot of code, you can precompute and store these big tables and get the best performance. Whereas on low-end machines, like eight-bit smart cards, or think of an eight-bit wristwatch, you would actually have a relatively small implementation of AES, but as a result, of course, it won't be so fast. So here's an example that's a little unusual: suppose you wanted to implement AES in Javascript, so you can send an AES library to the browser and have the browser actually do AES by itself. In this case what you'd like to do is both shrink the code size, so that there's minimal network traffic to send the library over to the browser, but at the same time you'd like the browser's performance to be as fast as possible. And so this is something that we did a while ago. Essentially, the idea is that the code that actually gets sent to the browser doesn't have any precomputed tables, and as a result it's fairly small. But then, the minute it lands in the browser, what the browser will do is actually precompute all the tables. So in some sense the code goes from being small and compact to being bloated with all these precomputed tables, but those are stored on the laptop, which presumably has a lot of memory. And then once you have the precomputed tables, you actually encrypt using them, and that's how you get the best performance. Okay? So if you have to send an implementation of AES over the network, you can kind of get the best of all worlds: the code over the network is small, but when it reaches the target client, it can kind of inflate itself and then get the best performance as it's doing encryption on the client. Now, AES is such a popular block cipher, essentially when you build crypto into products you're supposed to be using AES, and as a result Intel actually put AES support into the processor itself. So since Westmere, there are special instructions in the Intel processor to help accelerate AES. And so I listed these instructions here: aesenc and aesenclast, and then there's aeskeygenassist. So let me explain what they do. aesenc essentially implements one round of AES, namely it applies the three functions and the XOR with the round key. And aesenclast basically implements the last round of AES.
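As a concrete illustration, here is a minimal C sketch of encrypting one block with AES-128 using the compiler intrinsics that map to these instructions (_mm_aesenc_si128 and _mm_aesenclast_si128). It assumes the eleven round keys have already been expanded into an array; the key expansion step, which would use aeskeygenassist, is not shown.

```c
#include <stdint.h>
#include <wmmintrin.h>   /* AES-NI intrinsics; compile with -maes */

/* Encrypt one 16-byte block with AES-128, given the 11 expanded
 * round keys (key expansion via aeskeygenassist is omitted here). */
void aes128_encrypt_block(const uint8_t in[16], uint8_t out[16],
                          const __m128i round_keys[11]) {
    __m128i state = _mm_loadu_si128((const __m128i *)in);

    /* Initial whitening: XOR the plaintext with round key 0. */
    state = _mm_xor_si128(state, round_keys[0]);

    /* Nine full rounds: each aesenc performs the substitution,
     * shift, and mix steps plus the XOR with the round key. */
    for (int r = 1; r <= 9; r++)
        state = _mm_aesenc_si128(state, round_keys[r]);

    /* Final round: aesenclast skips MixColumns, matching the spec. */
    state = _mm_aesenclast_si128(state, round_keys[10]);

    _mm_storeu_si128((__m128i *)out, state);
}
```

Note that this is only the per-block core; a real implementation would also need the key expansion and a mode of operation wrapped around it.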
Remember, the last round didn't have the MixColumns phase; it only had the byte substitution and shift rows. And that's what aesenclast does. The way you call these instructions is using 128-bit registers, which correspond to the state of AES. So you would have one register containing the state and one register containing the current round key, and then when you call aesenc on these two registers, basically it would run one round of AES and place the result inside the xmm1 state register. And as a result, if you wanted to implement the whole of AES, all you would do is call aesenc nine times and then call aesenclast one time. These ten instructions are basically the entire implementation of AES. That's it. It's that easy to implement AES on this hardware. And because these operations are now done inside the processor, rather than in software using regular instructions, they claim that they can get a fourteen-x speedup over, say, an implementation that's running on the same hardware but implementing AES without these special instructions. So this is quite a significant speedup, and in fact there are now lots of products that make use of these special instructions. And let me just say that this is not specific to Intel: AMD also implemented similar instructions in their Bulldozer architecture and in future architectures. Okay, so let's talk about the security of AES. I wanna mention just two attacks here. Obviously, AES has been studied quite a bit, but the only two attacks on the full AES are the following. First of all, if you wanted to do key recovery, the best attack, basically, is only four times faster than exhaustive search, which means that instead of a 128-bit key, you should really be thinking of AES as having a 126-bit key, because exhaustive search is effectively four times faster than it should be. Of course, two to the 126 is still more time than we could ever compute, so this really does not hurt the security of AES. The more significant attack is actually on AES-256. It turns out there's a weakness in the key expansion design of AES which allows for what's called a related-key attack. So what's a related-key attack? Essentially, if you give me about two to the 100 input-output pairs for AES, but from four related keys, so these are keys that are very closely related, namely key number one is just one bit flip of key number two, which is just one bit flip of key number three, which is just one bit flip of key number four, these are very closely related keys; if you like, the Hamming distance between them is very short. But if you do that, then in fact there is a two to the 100 attack. Now, you might say, well, two to the 100 is still impractical; this is still more time than we can actually run today. But nevertheless, the fact that it's so much better than an exhaustive search attack, so much better than two to the 256, is kind of a limitation of the cipher. But generally, it's not a significant limitation, because it requires related keys, and in practice, of course, you're supposed to be choosing your keys at random, so that you have no related keys in your system, and as a result this attack wouldn't apply. But if you do have related keys, then there's a problem. So this is the end of the segment, and in the next segment we're gonna talk about more provably secure constructions for block ciphers.