In this series of posts I will try to talk about game code optimization. As you have probably heard, we have optimized the upcoming 1.05 version of the game. Some of you might be interested in what we did and how. This series will take a closer look at making code more efficient, focusing especially on the things we corrected in the game. So, enjoy the read!
This is the first post in the series. Here I would like to cover the basics of the architecture we are working on, x86, and point out what can be optimized and how. So let's start from the beginning.
x86 architecture – what is what?
First, I'd like to say that I don't want to repeat things you can find in detail on the Internet – so I will not describe every aspect of the CPU. I will just show the most important things, and mention what you can find and read about on your own.
As everybody knows, the processor, or CPU, is just an electronic device which computes. It processes digital signals (either 1, the high-voltage state, or 0, the low-voltage state) called bits. Each processor has a clock frequency, which tells you how many cycles it can perform in one second. So a 1 GHz CPU performs 1,000,000,000 cycles per second. What is a cycle? It is the smallest 'tick' of work the CPU can do. It is, however, not an instruction. An instruction is, from the programmer's point of view, the smallest operation he can ask the CPU to perform – but one instruction can take several cycles to complete.
Instructions can be separated into a few groups – some are mathematical, like add or subtract, some move data around in memory, and some are more specialized, designed for specific tasks, for example manipulating strings. In fact, x86 is a CISC CPU (strictly speaking a hybrid, a CISC front-end on a RISC-like core, but that's getting too complicated here 😉 ), which means that it has complex instructions – a few hundred of them, in fact. RISC – a reduced instruction set – may have, in some microcontrollers, only 30–40 instructions. You can achieve the same things on RISC and CISC – the difference is just the number of instructions the programmer has to use.
OK, so far we know that there are cycles and different instructions. At the lowest level, we can see that some instructions take more cycles than others. What does that mean? Basically, that some instructions execute faster than others, taking less time to complete.
For example, a common practice (which you can see in compiler-generated assembly) is to use XOR REG, REG instead of MOV REG, 0. Both instructions do the same thing – they zero a register (more about registers in a moment) – but XOR (a logical operation, exclusive or) costs fewer cycles.
Let's go further. I mentioned registers. What are they? For the processor to compute something, it has to be able to store data somewhere. A register is just a tiny semiconductor memory cell (in a 32-bit architecture it holds 4 bytes), but it has the shortest access time of anything in the PC – it sits directly on the chip, so the processor can reach it faster than any other storage. Most instructions need their data to be placed in a register before they can operate on it. And if the data must be placed in a register, it has to be copied from somewhere. As you probably know, the computer also has a device called memory. RAM (Random Access Memory) is, in contrast to the registers, quite big – gigabytes – but has a much slower access time. Most data lives there.
There is also something between RAM and the registers – the famous thing called cache. Cache is just normal RAM (simplifying a bit) which is placed on the chip. This makes its access time lower than main RAM – it's quick-access memory, but again, something for something: space on the chip is limited, and so is the cache. There are three levels of cache in modern processors, which differ in size and access time (smallest size / shortest access time = L1 (level 1) cache, largest size / longest access time = L3 cache).
Basically, we want as much of our program's data as possible to sit in the cache – registers are used for the current task, while the cache holds our program and its data waiting for execution.
Going one step further, we in fact don't want to touch memory at all. Well, not really – that's impossible – but limiting memory accesses buys us speed, because each time we wait for data from memory, the processor does nothing; it stalls. If you looked at charts for processors from a few years ago, you would see that the waiting time can be some 100 times longer than the execution time. When manufacturers kept raising CPU clock speeds, they were only shortening the execution time, while the wait-for-memory time stayed constant. This is one reason multicore CPUs were created – let's do something useful during that wait.
On the other hand, though, we want to store as much as possible in memory. Why? Some things take a long time to compute. If we need a result many times, in 99% of cases it's better to compute it once and then store it in memory – we will almost certainly get better performance.
Going a bit further, memory is also limited – we have to use the hard disk too, which is very, very slow. So we cannot store everything in memory – we would simply run out of it.
These were just the basics. This simplified model of our architecture will be covered in more detail in later posts. For now, you should see that there is basically one rule: the more you keep in memory, the quicker the program executes (as it doesn't have to compute things over and over) – yet careless use of memory is also a bad thing. And if we want to use less memory, we pay in speed – I think you are starting to see our dilemmas as programmers 🙂 I will leave you with that for today – think about it. Good night!