In concurrent programs, programmers pay special attention to data synchronization between processes or threads. In particular, when multiple threads modify the same variable concurrently, reliable synchronization or other measures must be taken to ensure the data is modified correctly. An important principle here is: never assume the order in which instructions execute. You cannot predict how instructions from different threads will be interleaved.
In a single-threaded program, however, we usually find it easy to assume that instructions execute sequentially; otherwise, imagine what terrible things would happen to our programs. The ideal model is that instructions execute in a unique, well-defined order: the order in which they are written in the code, regardless of the processor or other factors. This is called the sequential consistency model, and it is the model underlying the von Neumann architecture. The assumption is reasonable in itself and rarely appears to be violated in practice, yet no modern multiprocessor architecture actually adopts this model, because it is simply too inefficient: both compiler optimization and CPU pipelining rely heavily on instruction reordering.
Compile-time reordering
A typical compile-time reordering adjusts the order of instructions, without changing the program's semantics, so as to reduce the number of register loads and stores and to reuse values already held in registers as much as possible.
Suppose the first instruction computes a value, assigns it to variable A, and keeps it in a register. The second instruction is unrelated to A but needs a register (assume it takes the one holding A). The third instruction uses the value of A and is unrelated to the second instruction. Under the sequential consistency model, A is placed in a register after the first instruction executes, is evicted while the second instruction executes, and must be loaded into a register again for the third instruction, even though its value never changed in between. A compiler will usually swap the second and third instructions, so that A is still in the register when its value is needed and can be read directly, avoiding the cost of a redundant load.
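The scenario above can be sketched in Java. The method names `source` and `unrelated` are illustrative placeholders, not part of any real API; the point is that the compiler (or JIT) may move the `c = a + 1` statement before the unrelated one, because there is no data dependence between them and the observable result is identical either way.

```java
public class ReorderDemo {
    static int source()    { return 41; }  // produces A
    static int unrelated() { return 7; }   // unrelated work that wants a register

    public static void main(String[] args) {
        int a = source();      // a is computed and held in a register
        int b = unrelated();   // may evict a's register
        int c = a + 1;         // would force a to be reloaded

        // A compiler is free to execute `int c = a + 1;` before `int b = ...`:
        // the two statements have no data dependence, so the single-threaded
        // result is the same, but a stays in its register until it is used.
        System.out.println(b + " " + c);
    }
}
```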
The significance of reordering for the pipeline
Almost all modern CPUs use pipelining to speed up instruction processing. An instruction generally takes several CPU clock cycles to process; by executing stages in parallel in a pipeline, several instructions can be in flight during the same clock cycle. Briefly, the idea is to split instruction processing into stages such as fetch, decode, address calculation, and execute, each handled by a different component. Within the execution unit (EU), the functional units are further divided into components such as adders, multipliers, load units, and store units, allowing different calculations to proceed in parallel as well.
The pipeline architecture dictates that instructions be executed in parallel rather than strictly one after another as the sequential model assumes. Reordering helps keep the pipeline fully utilized, which is what makes superscalar execution pay off.
Ensuring ordering
Although instructions are not necessarily executed in the order we wrote them, there is no doubt that in a single-threaded environment the final effect of execution must be consistent with sequential execution; otherwise the optimization would be meaningless.
This principle is honored whether the instruction reordering is performed at compile time or at run time.
Reordering in the Java Memory Model
In the Java Memory Model (JMM), reordering is a very important topic, especially for concurrent programming. The JMM guarantees ordering semantics through the happens-before rules: if you want a thread performing operation B to observe the result of another thread's operation A, then A and B must be in a happens-before relationship. Otherwise, the JVM is free to reorder them arbitrarily to improve performance.
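A minimal sketch of a happens-before chain uses two rules the JMM does define: a call to `Thread.start()` happens-before any action in the started thread, and all actions in a thread happen-before another thread's `join()` on it returns. Because of that chain, the read below is guaranteed to see the write, with no volatile or lock required:

```java
public class HappensBeforeDemo {
    static int data = 0;  // plain field, no volatile

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(() -> data = 42);
        writer.start();  // start() happens-before every action in writer
        writer.join();   // every action in writer happens-before join() returns
        // The happens-before chain (write -> thread exit -> join) guarantees
        // this read observes 42; without such a chain the JVM could legally
        // let us see a stale 0.
        System.out.println(data);
    }
}
```

Remove the `join()` and the guarantee disappears: the main thread might then print 0 or 42, and the JVM would be within its rights either way.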
The volatile keyword ensures the visibility of a variable: conceptually, reads and writes of a volatile variable act on main memory, which is shared by all threads. The price is performance, because registers and per-core caches cannot be relied on for a volatile variable; they are not globally visible, so using them could not guarantee visibility and might produce stale (dirty) reads.
Another function of volatile is to locally prevent reordering: instructions that operate on a volatile variable are not reordered across it, because such reordering could break visibility.
In terms of ensuring visibility, locks (both explicit locks and object monitor locks) and reads and writes of atomic variables can also guarantee the visibility of variables, though the mechanisms differ slightly. For example, a synchronized lock guarantees that data is re-read from main memory when the lock is acquired, refreshing the cache, and is written back to main memory when the lock is released, making it visible to other threads; a volatile variable simply reads and writes main memory directly.
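The lock-based guarantee can be illustrated with a simple counter, a minimal sketch with an arbitrary iteration count of 1000 per thread. Each acquire of the monitor sees the values written before the previous release, so the increments from both threads are never lost:

```java
public class SyncVisibilityDemo {
    static int counter = 0;
    static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable inc = () -> {
            for (int i = 0; i < 1000; i++) {
                synchronized (lock) {  // acquire: see latest value from memory
                    counter++;         // read-modify-write done atomically
                }                      // release: write the value back
            }
        };
        Thread t1 = new Thread(inc);
        Thread t2 = new Thread(inc);
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println(counter);  // deterministically 2000 under the lock
    }
}
```

Without the `synchronized` block, the unsynchronized `counter++` would race and the final value could be anywhere up to 2000.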