Why data synchronization is needed in Java multithreaded programming

Author：Eve Cole Update Time：2025-01-12 09:00:03

Variables in Java fall into two categories: local variables and class variables. Local variables are defined inside a method, such as those declared in the run method. Each method call gets its own copy of them, so they are never shared between threads and need no data synchronization. Class variables are defined at class level (fields), and their scope is the entire class. Multiple threads can share such variables, so access to them must be synchronized.
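The claim about local variables can be checked with a small sketch. The class LocalVsShared and its methods are hypothetical names chosen for illustration; the only assumption is that each thread counts in a local variable and stores its result in its own array slot.

```java
class LocalVsShared {
    // 'local' lives on the stack of whichever thread calls this method,
    // so no other thread can see or modify it.
    static int countLocally(int steps) {
        int local = 0;
        for (int i = 0; i < steps; i++) {
            local++;
        }
        return local;
    }

    // Starts several threads; each writes only to its own slot of 'results'.
    static int[] countInThreads(int threadCount, int steps) {
        int[] results = new int[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(() -> results[slot] = countLocally(steps));
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // wait for the thread to finish before reading its slot
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        for (int r : countInThreads(4, 10_000)) {
            System.out.println(r); // each thread reports 10000
        }
    }
}
```

Because no thread ever touches another thread's local variable, every count comes out complete without any synchronization.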

Data synchronization means that only one thread at a time can access a synchronized class variable; other threads can access it only after the current thread has finished. "Access" here means access that includes a write. If every thread only reads a class variable, data synchronization is generally not required. So what happens when a shared class variable is not synchronized? Consider the following code:

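package test;

public class MyThread extends Thread
{
    // Shared by every thread; access is deliberately unsynchronized
    public static int n = 0;

    public void run()
    {
        int m = n;       // read the shared value into a local copy
        Thread.yield();  // hint the scheduler to let another thread run here
        m++;
        n = m;           // write back, possibly overwriting another thread's update
    }

    public static void main(String[] args) throws Exception
    {
        MyThread myThread = new MyThread();
        Thread threads[] = new Thread[100];
        for (int i = 0; i < threads.length; i++)
            threads[i] = new Thread(myThread);
        for (int i = 0; i < threads.length; i++)
            threads[i].start();
        for (int i = 0; i < threads.length; i++)
            threads[i].join();  // wait for every thread to finish
        System.out.println("n = " + MyThread.n);
    }
}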

One possible result of running the code above is:

n = 59

Many readers may be surprised by this result. The program starts 100 threads, and each thread increments the static variable n by 1. It then calls the join method on every thread to wait until all 100 have finished, and finally prints the value of n. One would expect n = 100, but the result is less than 100.

The culprit is what we often call "dirty data" (more precisely, a lost update caused by a race condition). The yield() call in the run method is what triggers it. Without yield(), dirty data can still be produced, but far less visibly: it shows up often only when 100 is changed to a much larger number. Calling yield() in this example simply amplifies the effect. The yield method hints that the calling thread is willing to give up the CPU for the moment, giving other threads a chance to run.

To see how the program generates dirty data, assume only two threads are created: thread1 and thread2. Because thread1's start method is called first, its run method will generally run first. When thread1 executes the first line (int m = n;), the value of n is assigned to m. When it executes yield() on the second line, thread1 temporarily stops, and thread2, which has been in the ready state, obtains the CPU and starts running. When thread2 executes its first line (int m = n;), n is still 0 because thread1 yielded before writing anything back, so m in thread2 is also 0. Both threads now hold m = 0. After yield(), each adds 1 starting from 0, so no matter which one finishes first, the final value of n is 1, even though thread1 and thread2 each assigned to it once.

Someone may ask: if the method contained only n++, would dirty data still be generated? The answer is yes. But n++ is a single statement, so how can the CPU be handed to another thread in the middle of it? The single statement is only a surface appearance. After the Java compiler translates n++ into an intermediate language (also called bytecode), it is no longer a single instruction.
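The two-thread interleaving described above can be replayed deterministically on a single thread. This is a minimal sketch, not part of the original program: the class LostUpdateReplay is a hypothetical name, and m1 and m2 stand in for the local variable m of thread1 and thread2.

```java
class LostUpdateReplay {
    static int n = 0;

    // Replays, step by step, the order in which thread1 and thread2
    // would execute the statements of run() in the scenario above.
    static int replay() {
        n = 0;
        int m1 = n; // thread1: int m = n;  reads 0, then yields
        int m2 = n; // thread2: int m = n;  n is still 0
        m1++;       // thread1: m++;        m1 becomes 1
        m2++;       // thread2: m++;        m2 becomes 1
        n = m1;     // thread1: n = m;      writes 1
        n = m2;     // thread2: n = m;      writes 1 again, one increment is lost
        return n;
    }

    public static void main(String[] args) {
        System.out.println("n = " + replay()); // prints "n = 1", not "n = 2"
    }
}
```

Two increments were performed, yet n ends at 1. With 100 threads the same overlap happens many times, which is why the program prints a value below 100.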
Let's see what bytecode the following Java code is compiled into.

public void run()
{
    n++;
}

The compiled bytecode:

public void run()
{
    aload_0
    dup
    getfield
    iconst_1
    iadd
    putfield
    return
}

The run method contains only the n++ statement, yet it compiles to 7 bytecode instructions. We don't need to understand all of them; just look at getfield, iadd, and putfield. As its name suggests, getfield fetches the value of a field, and since n is the only field here, it must be fetching n. (This listing is the form generated for an instance field; for a static n like the one in MyThread, the compiler emits getstatic and putstatic instead, with the same read, add, write sequence.) It is not hard to guess that iadd adds 1 to the value just read, and that putfield writes the incremented value back to the class variable n. You may still wonder: executing n++ only needs to add 1 to n, so why go to so much trouble? This comes down to the Java memory model.

Java's memory model distinguishes a main memory area from a working memory area. Main memory holds all object instances: after we create an object with new, the object and its fields are stored there, and so is n in the MyThread class. Main memory is shared by all threads. Working memory is the thread stack we talked about earlier; it holds the variables defined in the run method and in the methods that run calls, that is, method-local variables. When a thread wants to modify a variable in main memory, it does not modify it directly. Instead, it copies the variable into the current thread's working memory, modifies the copy, and then writes the new value back over the corresponding variable in main memory.

With Java's memory model in mind, it is easy to see why n++ is not an atomic operation: it has to copy the value, add 1, and write the result back. This is the same process that the MyThread class simulates. If thread1 is interrupted for any reason right after executing getfield, the outcome is similar to what the MyThread class produced. To solve this problem completely, we must synchronize access to n in some way so that only one thread at a time can operate on it, which makes the operation on n atomic.
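One common way to make the read, add, and write on n execute as a unit is Java's synchronized keyword. The sketch below is an assumption about how the MyThread example could be adapted, not the author's own solution; the class SynchronizedCounter and its methods are hypothetical names.

```java
class SynchronizedCounter {
    private static int n = 0;

    // A static synchronized method locks on SynchronizedCounter.class, so only
    // one thread at a time can run the read, add, write sequence below.
    static synchronized void increment() {
        int m = n;
        Thread.yield(); // still yields, but no other thread can enter increment()
        m++;
        n = m;
    }

    static synchronized int get() {
        return n;
    }

    static synchronized void reset() {
        n = 0;
    }

    // Starts threadCount threads that each increment once, then waits for all.
    static int runThreads(int threadCount) {
        reset();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(SynchronizedCounter::increment);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return get();
    }

    public static void main(String[] args) {
        System.out.println("n = " + runThreads(100)); // prints "n = 100"
    }
}
```

Even with the yield call left in place, no increment is lost, because a second thread cannot read n until the first has written its result back. For a plain counter, java.util.concurrent.atomic.AtomicInteger and its incrementAndGet method are another option.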