In this unit we're going to talk about the challenge of translating individual instructions from symbolic to binary code. The general problem that we are facing is developing an assembler that translates programs from the symbolic Hack language into binary Hack code, and in order to write such an assembler we have to know how to deal with white space, instructions, and symbols. To make our life easier, we decided to defer the treatment of symbols to a later stage, and we also described previously how to handle white space: basically, we're going to ignore it. So what remains is a program that contains instructions only, A-instructions and C-instructions, without symbols. In order to write an assembler that can translate such a program into binary code, we have to know how to translate A-instructions and C-instructions, and that's exactly what we're going to discuss next. So let us begin with A-instructions. Here's the syntax rule of how to put together an A-instruction, and an example. And here's the same instruction expressed in binary code. Notice that the opcode of an A-instruction is zero. This is the first, red zero bit that you see in the example at the bottom right. By the way, the colors are completely meaningless, and I use them only to improve our communication. Once it gets to the actual implementation, the colors play no role whatsoever. So how do we translate from symbolic to binary when it comes to an A-instruction? Well, if you think about it, the only challenge is to do something with this value. If the value is a decimal constant, all we have to do is compute the binary representation of this value, add as many leading zeros as we need in order to turn it into a 15-bit constant, and prepend the zero opcode, and that gives us the needed 16-bit representation of this instruction.
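The A-instruction rule just described can be sketched in a few lines of Python. This is only a minimal illustration, not the course's reference implementation: the function name `translate_a_instruction` and the sample value 21 are assumptions made for the example, and symbols are rejected because their treatment is deferred.

```python
def translate_a_instruction(instruction: str) -> str:
    """Translate a symbol-free A-instruction such as '@21' into 16 characters.

    The name and error handling are illustrative assumptions.
    """
    text = instruction.strip()
    if not text.startswith("@"):
        raise ValueError("not an A-instruction: " + instruction)
    value_text = text[1:]
    # Only decimal constants are handled here; symbols come in a later stage.
    if not (value_text.isascii() and value_text.isdigit()):
        raise ValueError("expected a decimal constant: " + instruction)
    value = int(value_text)
    if value >= 2 ** 15:
        raise ValueError("constant does not fit in 15 bits: " + instruction)
    # '015b' renders the value in binary, padded with leading zeros to 15 bits;
    # the leading "0" is the A-instruction opcode.
    return "0" + format(value, "015b")


print(translate_a_instruction("@21"))  # 0000000000010101
```

Note that the result is a Python string of the characters '0' and '1', not a number.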
So it's not a big deal, but obviously we have to do it. What if the value is a symbol? Well, I just want to remind you what we said earlier: we're going to deal with symbols later on. So translating A-instructions is not terribly complicated, but once again, it takes some programming. All right, what about C-instructions? Well, C-instructions also have a symbolic manifestation and a binary manifestation, and there's a set of tables, so to speak, that describe the mapping from symbolic mnemonics into their binary representations. So these are the rules of the game when it comes to translating C-instructions. So how do you actually do it? I think the easiest way to describe the translation process is to do it using an example. So suppose we're given this single C-instruction, MD=D+1, and we have to translate it into binary code. Before we go on, I want to remind you that every C-instruction consists of three fields. In the example that we have here, the MD mnemonic is the value of the destination field in this particular instruction. Then comes D+1, which is the value of the comp, or computation, field. And we also have a jump field, which happens to be null: we don't have a jump directive here, so the jump is null. Implicit in what I'm saying is the assumption that we can somehow take this source instruction and decompose it into these three fields. And indeed, this is something that is going to be done by an element of our assembler called the parser. The parser is going to take a source instruction written in symbolic code and chop it into three individual fields, and then we can inspect every one of these fields in isolation and act on it, and that's exactly what I'm going to do next. So let us begin to put together the string of characters that will end up representing the binary code of MD=D+1.
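The decomposition that the parser performs can be sketched as follows. This is a hedged illustration rather than the assembler's actual parser: the function name `parse_c_instruction` is hypothetical, the `dest=comp;jump` layout is taken from the Hack language syntax, and the choice to report a missing field as the string `"null"` is an assumption made so that the result can be looked up directly in the translation tables.

```python
def parse_c_instruction(instruction: str) -> tuple:
    """Split a C-instruction of the form dest=comp;jump into its three fields.

    The dest and jump parts are optional; a missing part is reported as
    "null" (an illustrative convention, not something the lecture mandates).
    """
    dest, comp, jump = "null", instruction.strip(), "null"
    if "=" in comp:
        # Everything before '=' is the destination.
        dest, comp = comp.split("=", 1)
    if ";" in comp:
        # Everything after ';' is the jump directive.
        comp, jump = comp.split(";", 1)
    return dest, comp, jump


print(parse_c_instruction("MD=D+1"))  # ('MD', 'D+1', 'null')
print(parse_c_instruction("D;JGT"))   # ('null', 'D', 'JGT')
```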
We can do it in many different ways, and here is one of them. Focusing on the target expression, we see that the binary flavor of every C-instruction begins with three ones, so we initialize the string that we are building with three ones. That's what we do when we get started. Next, we focus on the next seven bits that we have to create, and these seven bits correspond to the comp field. Now, the comp field happens to be D+1. Once again, I assume that I have access to this field and can easily retrieve it. I see that it's D+1, so I look up this mnemonic in the relevant table, and I see that it corresponds to an a-bit, which is zero, and to six control bits, which are a zero followed by five ones. Putting these bits together, I generate the seven-bit value 0011111, and I append it to the string that I am gradually building here. All right. The next thing to do is focus on the next part of the instruction, which corresponds to the destination. So I look up the value of my destination field. It happens to be MD. I consult the relevant table, and I see that the MD mnemonic corresponds to 011. I take this 011 value and append it to the string that I am gradually building, and I've completed the synthesis of these two fields together. Moving along, the remaining three bits correspond to the jump directive. I look up the value of the jump directive in my source instruction, and I see that it's null. I look up null in the relevant table, and I see that it corresponds to the three bits 000. I take these bits, append them to the end of the string, and voila, we have managed to translate the symbolic instruction into its binary equivalent.
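The table-lookup procedure walked through above can be sketched in Python like this. Treat it as an illustration under stated assumptions: the names `COMP`, `DEST`, `JUMP`, and `translate_c_instruction` are hypothetical, the function assumes the three fields have already been separated, and the table contents are filled in from the Hack language specification, so they should be checked against the tables shown in the course material.

```python
# Six control bits for each computation when the a-bit is 0.
_COMP_A0 = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}

# Full comp table: a-bit followed by the six control bits.
# Each computation that mentions A has an M twin with the a-bit set to 1.
COMP = {}
for mnemonic, bits in _COMP_A0.items():
    COMP[mnemonic] = "0" + bits
    if "A" in mnemonic:
        COMP[mnemonic.replace("A", "M")] = "1" + bits

DEST = {"null": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}

JUMP = {"null": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def translate_c_instruction(dest: str, comp: str, jump: str) -> str:
    """Build the 16-character code for a C-instruction from its three fields."""
    code = "111"          # every C-instruction starts with three ones
    code += COMP[comp]    # seven bits: a-bit plus six control bits
    code += DEST[dest]    # three destination bits
    code += JUMP[jump]    # three jump bits
    return code


print(translate_c_instruction("MD", "D+1", "null"))  # 1110011111011000
```

An unknown mnemonic raises a `KeyError` here; a fuller assembler would probably want to report such errors more gracefully.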
As you see, everything that we do here is text processing, which in some languages is called string processing. I have a source string, which I analyze in some way, and I have a target string, which I build in some gradual process. Every high-level language has the capability to do such string processing, and that's exactly what you'll do when you actually write the assembler. Now, when we say that we generate binary code, there may be some misunderstanding, so let me clarify it. We basically generate a text file that consists of two characters only, zero and one, but they are treated as characters, as ASCII or Unicode characters. So we have this text file that we write, which consists of zeros and ones only, and later on we're going to load this text file into the computer. Once we load it into the computer, it becomes real zeros and ones, so to speak. All right, so this basically concludes the translation of C-instructions. With that in mind, here is the overall assembly logic. We are given a text file, and this text file contains all sorts of characters that hopefully represent a Hack program written in symbolic Hack code. So how do we translate it into binary? Well, we process this file: we begin marching through it, and for each line in this file, we first of all parse the instruction, that is, we break it into its underlying fields. If we have an A-instruction, we have the at sign and a value; we take this value and re-express it in binary code. If it's a C-instruction, we split the instruction into its three fields, dest, comp, and jump, and for every one of these fields we generate the corresponding binary bits. Then we assemble all the codes that we created into a single 16-bit instruction, and we write this instruction to an output file. The output file is not seen here in the picture, but the assembler creates this empty file at the beginning.
It then begins to populate this file with lines, each line being a string of 16 zeros and ones. So this is the overall process of translating a Hack program that consists of only A-instructions and C-instructions, without symbols. If you do all this, then you will manage to create an assembler that can translate such programs into binary code, and everything looks very nice indeed. I'd hate to be a party spoiler, but there's a big gap that we haven't closed yet, and this is the fact that we assumed the source code contains no symbols. We have managed to design an assembler that translates symbol-less programs, but we still have to close this gap of dealing with symbols, and that's exactly what we're going to do in the next unit.
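The overall marching-through-the-file logic for a symbol-less program can be sketched as a small driver loop. This is an assumption-laden outline, not the course's assembler: the name `assemble` is hypothetical, the two translator functions are passed in as parameters so the loop stays independent of how each instruction type is translated, and treating `//` as the start of a comment is an assumption about what counts as white space.

```python
from typing import Callable, Iterable, List


def assemble(source_lines: Iterable[str],
             translate_a: Callable[[str], str],
             translate_c: Callable[[str], str]) -> List[str]:
    """Translate symbol-free Hack source lines into lines of '0'/'1' characters.

    translate_a and translate_c are caller-supplied (hypothetical) functions
    that each turn one symbolic instruction into a 16-character string.
    """
    output = []
    for raw_line in source_lines:
        # Ignore white space: drop comments (assumed to start with '//')
        # and surrounding blanks, then skip lines that end up empty.
        line = raw_line.split("//", 1)[0].strip()
        if not line:
            continue
        if line.startswith("@"):
            output.append(translate_a(line))   # A-instruction
        else:
            output.append(translate_c(line))   # C-instruction
    return output


# Usage with stand-in translators that only tag each instruction:
demo = assemble(["// demo", "", "@2", "  D=A  // copy"],
                lambda s: "A:" + s, lambda s: "C:" + s)
print(demo)  # ['A:@2', 'C:D=A']
```

With real translators, the returned strings could be written out one per line, for example with `"\n".join(...)`, which yields the text file of zero and one characters described earlier.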