An analysis of Java's garbage collection (GC) mechanism from the perspective of JVM memory management

Author: Eve Cole Update Time: 2025-02-11 03:48:01

A good Java programmer must understand how GC works, how to optimize GC performance, and how to interact with GC in the limited ways that are available. Some applications, such as embedded systems and real-time systems, have high performance requirements, and the performance of the whole application can only be improved by improving the efficiency of memory management across the board. This article first briefly introduces the working principle of GC, then discusses several key GC issues in depth, and finally offers some Java programming suggestions for improving program performance from the GC perspective.
The basic principles of GC
Java's memory management is actually the management of objects, including the allocation and release of objects.
For programmers, the new keyword is used to allocate objects. To release an object, it is enough to assign null to all references to it, so that the program can no longer access the object. We call such an object "unreachable". GC is responsible for reclaiming the memory space of all "unreachable" objects.
For GC, when a programmer creates an object, GC starts to monitor the address, size and usage of the object. Generally, GC uses a directed graph to record and manage all objects in the heap (see Reference 1 for details). In this way it can determine which objects are "reachable" and which are "unreachable". When GC determines that some objects are "unreachable", it is responsible for reclaiming their memory. However, to ensure that GC can be implemented on different platforms, the Java specifications do not strictly define many aspects of GC behavior. For example, they do not specify important matters such as which collection algorithm to use or when collection takes place. Different JVM implementers therefore often use different algorithms, which brings uncertainty to Java development. This article examines several issues related to how GC works and tries to reduce the negative impact of this uncertainty on Java programs.
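The reachability idea can be sketched as a walk over a directed graph. The following is a minimal illustration only, not how any particular JVM implements GC; the Node class, the ReachabilityDemo class and the notion of an explicit root list are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a heap object: a name plus its outgoing references.
class Node {
    final String name;
    final List<Node> refs = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

class ReachabilityDemo {
    // Walks the reference graph from the roots and returns every node that can be reached.
    static Set<Node> reachableFrom(List<Node> roots) {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {          // first visit: mark it and follow its edges
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.refs.add(b);                          // a -> b
        b.refs.add(c);                          // b -> c
        List<Node> roots = Collections.singletonList(a);

        System.out.println(reachableFrom(roots).size());      // 3
        b.refs.clear();                         // like assigning null: drops the only edge to c
        System.out.println(reachableFrom(roots).contains(c)); // false: c is now unreachable
    }
}
```

Anything not in the returned set corresponds to an "unreachable" object whose memory a collector would be free to reclaim.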
Incremental GC
GC is usually implemented in the JVM by one or more threads. Like user programs, it occupies heap space and uses CPU time when it runs. While the GC is running, the application stops. Therefore, when the GC runs for a long time, the user can feel the Java program pause. On the other hand, if the GC runs for too short a time, the collection rate may be too low, which means that many objects that should have been reclaimed are still there and still take up a lot of memory. When designing a GC, it is therefore necessary to weigh pause time against collection rate. A good GC implementation lets users choose the settings they need. For example, some memory-constrained devices are very sensitive to memory usage and want the GC to reclaim memory precisely, without caring much if the program slows down. Real-time online games, by contrast, cannot tolerate long interruptions. Incremental GC uses a suitable collection algorithm to divide one long interruption into many short ones, thereby reducing the impact of GC on user programs. Although incremental GC may be less efficient overall than an ordinary GC, it reduces the maximum pause time of the program.
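The trade-off between one long pause and many short ones can be shown with simple arithmetic. The sketch below is a toy model under stated assumptions: collection work is measured in abstract "units", and the class and method names are hypothetical. A real incremental collector must also cope with the program changing the object graph between slices, which this model ignores.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalPauseSketch {
    // Splits totalWork units of collection work into pauses of at most budget units each.
    static List<Integer> pauses(int totalWork, int budget) {
        if (budget <= 0) {
            throw new IllegalArgumentException("budget must be positive");
        }
        List<Integer> result = new ArrayList<>();
        int remaining = totalWork;
        while (remaining > 0) {
            int slice = Math.min(budget, remaining);
            result.add(slice);      // the application would resume after each slice
            remaining -= slice;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pauses(10, 10)); // [10]: one long pause
        System.out.println(pauses(10, 3));  // [3, 3, 3, 1]: more pauses, each shorter
    }
}
```

The total work is the same in both cases; only the longest single interruption changes.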
The HotSpot JVM shipped with the Sun JDK supports incremental GC, but it is not HotSpot's default GC mode. To enable incremental GC, we must add the -Xincgc option when running a Java program. (This describes older HotSpot releases; the option was later deprecated and removed, so it is not available in current JVMs.) HotSpot's incremental GC was implemented with the Train GC algorithm. Its basic idea is to group all objects in the heap (hierarchically) according to when they were created and how they are used, to put objects that are used frequently and are closely related into one queue, and to keep adjusting the groups as the program runs. When GC runs, it always collects the oldest (least recently visited) objects first, and if an entire group consists of collectable objects, GC reclaims the whole group. In this way, each GC run reclaims only a certain proportion of the unreachable objects, which keeps the program running smoothly.
A detailed look at the finalize function
finalize is a method of the Object class, and its access modifier is protected. Since all classes are subclasses of Object, user classes can easily access this method. Because finalize calls are not chained automatically, we must chain them manually, so the last statement of a finalize function is usually super.finalize(). In this way, finalize is invoked from the bottom of the class hierarchy upward: the class first releases its own resources and then lets the parent class release its resources.
According to the Java language specification, the JVM ensures that an object is unreachable before calling its finalize function, but it does not guarantee that the function will ever be called. In addition, the specification guarantees that finalize runs at most once per object.
Many Java beginners think that this method is similar to a destructor in C++ and put the release of many objects and resources in it. This is not a good approach, for three reasons. First, to support finalize, GC has to do a lot of extra work for objects that override this function. Second, after finalize has run, the object may have become reachable again, so GC must check its reachability once more. Using finalize therefore reduces GC performance. Third, because the time at which GC calls finalize is uncertain, releasing resources this way is also uncertain.
Generally, finalize is used to release resources that are hard to control and very important, such as certain I/O operations and data connections. Releasing these resources is critical to the whole application. In such cases, programmers should manage these resources (including releasing them) mainly through the program itself and use finalize only as a supplement, forming a double-insurance mechanism rather than relying solely on finalize to release resources.
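The double-insurance idea might look like the sketch below. ManagedConnection and its fields are hypothetical names used for illustration; the explicit close() is the primary release path and finalize is only a safety net. Note that Object.finalize() has been deprecated since Java 9, which is why the sketch suppresses the corresponding warnings.

```java
// Hypothetical resource wrapper; the name and the boolean flag are illustrative only.
class ManagedConnection implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() {
        return open;
    }

    // Primary release path: called explicitly or by try-with-resources. Safe to call twice.
    @Override
    public void close() {
        if (open) {
            open = false;   // a real class would release the underlying resource here
        }
    }

    // Safety net only: may run late or never, so it must not be the sole release path.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            close();
        } finally {
            super.finalize();   // chain to the parent class last
        }
    }
}

class ManagedConnectionDemo {
    public static void main(String[] args) {
        ManagedConnection connection = new ManagedConnection();
        try (ManagedConnection scoped = connection) {
            System.out.println(scoped.isOpen());  // true
        }
        System.out.println(connection.isOpen());  // false: released deterministically
    }
}
```

With try-with-resources the resource is released at a known point, and the finalize override only matters if a caller forgets to close the object.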
The following example shows that an object may still be reachable after its finalize function has been called. It also shows that an object's finalize runs at most once.

    class MyObject {
        Test main; // the Test object, used to restore reachability in finalize

        public MyObject(Test t) {
            main = t; // save the Test object
        }

        @Override
        protected void finalize() {
            main.ref = this; // resurrect this object by making it reachable again
            System.out.println("This is finalize"); // shows that finalize runs only once
        }
    }

    class Test {
        MyObject ref;

        public static void main(String[] args) {
            Test test = new Test();
            test.ref = new MyObject(test);
            test.ref = null; // the MyObject is now unreachable, so finalize may be called
            System.gc();
            System.runFinalization(); // ask the JVM to run pending finalizers before the check
            if (test.ref != null)
                System.out.println("MyObject is still alive");
        }
    }

Running results (typical; the JVM does not guarantee when, or whether, finalize runs):

    This is finalize
    MyObject is still alive

In this example, note that although the MyObject object becomes reachable again in finalize, finalize will not be called the next time the object is collected, because the finalize function is called at most once.

How the program interacts with GC
Java 2 enhanced memory management and added the java.lang.ref package, which defines three reference classes: SoftReference, WeakReference and PhantomReference. By using these classes, programmers can interact with GC to a certain extent in order to improve its efficiency. The strength of these references lies between that of reachable and unreachable objects.
Creating a reference object is also very easy. For example, to create a soft reference object, first create an object and refer to it in the normal way (a reachable object); then create a SoftReference that refers to the object; finally set the normal reference to null. The object is then referred to only through the soft reference, and we call it a soft reference object.
The main feature of a soft reference is that it behaves much like a strong reference. Such objects are reclaimed only when memory is insufficient, so they are usually not reclaimed while there is enough memory. In addition, the JVM guarantees that soft references are cleared (set to null) before it throws OutOfMemoryError. Soft references can be used to cache frequently used images, making the most of available memory without causing OutOfMemoryError. The following pseudocode shows how this reference type is used:

    // Create an image object (Image stands for an application-defined class here)
    Image image = new Image();
    ...
    // Use image
    ...
    // After using image, wrap it in a soft reference and release the strong reference
    SoftReference<Image> sr = new SoftReference<>(image);
    image = null;
    …
    // The next time the image is needed
    image = sr.get();
    if (image == null) {
        // The image was reclaimed because memory ran low, so it must be reloaded
        image = new Image();
        sr = new SoftReference<>(image);
    }

The biggest difference between weak reference objects and soft reference objects is that, during collection, GC uses an algorithm to decide whether to reclaim soft reference objects, whereas it always reclaims weak reference objects. Weak reference objects are therefore reclaimed more easily and more quickly. Although GC must reclaim weak objects whenever it runs, a group of weak objects with complex relationships often takes several GC runs to reclaim completely. Weak references are often used in Map structures to refer to objects that hold large amounts of data. Once the strong reference to such an object is set to null, GC can quickly reclaim its space.
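A Map with weak keys is available in the standard library as java.util.WeakHashMap. The sketch below uses it as a side table; the WeakSideTable class and its method names are hypothetical. The last printed value depends on whether the JVM actually collects the key, so no fixed result is claimed for it.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical side table that attaches bulky data to owner objects without keeping them alive.
class WeakSideTable {
    private final Map<Object, byte[]> payloads = new WeakHashMap<>();

    void attach(Object owner, byte[] data) {
        payloads.put(owner, data);   // the map refers to the owner (the key) only weakly
    }

    byte[] lookup(Object owner) {
        return payloads.get(owner);
    }

    int size() {
        return payloads.size();
    }
}

class WeakSideTableDemo {
    public static void main(String[] args) {
        WeakSideTable table = new WeakSideTable();
        Object owner = new Object();
        table.attach(owner, new byte[1024 * 1024]);
        System.out.println(table.lookup(owner).length); // 1048576 while owner is strongly held

        owner = null;   // drop the last strong reference to the key
        System.gc();    // only a request; the JVM may ignore it
        System.out.println("entries left: " + table.size()); // often 0, but not guaranteed
    }
}
```

The value must not hold a strong reference back to its key, otherwise the entry can never be reclaimed.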
Phantom references are less widely useful and mainly assist the finalize function. Phantom objects are objects whose finalize function has completed and which are unreachable, but which have not yet been reclaimed by GC. Such objects can help finalize with some later cleanup work. We can make the resource reclamation mechanism more flexible by overriding the clear() method of Reference.
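A phantom reference is normally paired with a ReferenceQueue, on which the JVM places the reference once the object is ready to be reclaimed. The following is a minimal sketch; PhantomSketch and track are hypothetical names, and whether the reference is enqueued within the one-second wait depends on the JVM, so the second line of output is not fixed.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class PhantomSketch {
    // Registers a phantom reference to the referent on the given queue.
    static PhantomReference<Object> track(Object referent, ReferenceQueue<Object> queue) {
        return new PhantomReference<>(referent, queue);
    }

    public static void main(String[] args) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object resource = new Object();
        PhantomReference<Object> phantom = track(resource, queue);

        System.out.println(phantom.get()); // null: a phantom reference never exposes its referent

        resource = null;                   // drop the last strong reference
        System.gc();                       // only a request; the JVM may ignore it
        Reference<?> enqueued = queue.remove(1000); // waits up to one second, may return null
        System.out.println(enqueued == phantom ? "ready for cleanup" : "not collected yet");
    }
}
```

Because get() always returns null, the object cannot be resurrected through a phantom reference, unlike in the finalize example earlier.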
Some Java coding suggestions
Based on how GC works, we can use some techniques to make GC run more efficiently and better match the requirements of the application. Here are some programming suggestions.
1. The most basic suggestion is to release references to useless objects as soon as possible. For temporary variables, the reference goes away automatically once execution leaves the variable's scope, so most programmers need not do anything for them. Special attention is needed for complex object graphs such as arrays, queues, trees and graphs, whose objects refer to one another in complicated ways. GC is generally less efficient at reclaiming such objects. If the program allows it, assign null to references that are no longer used as soon as possible. This can speed up the work of GC.
2. Use the finalize function as little as possible. finalize is an opportunity that Java gives programmers to release objects or resources. However, it increases the workload of GC, so use finalize to reclaim resources as rarely as you can.
3. If you need frequently used images, you can use the soft reference type. It keeps the images in memory as long as possible for the program to use, without causing OutOfMemoryError.
4. Pay attention to collection data types, including data structures such as arrays, trees, graphs and linked lists. These data structures are more complex for GC to reclaim. Also pay attention to global variables and static variables. Such variables easily leave lingering (dangling) references behind, which wastes memory.
5. When the program has some idle waiting time, the programmer can call System.gc() manually to ask GC to run, but the Java language specification does not guarantee that GC will actually execute. Using incremental GC can shorten the pause time of Java programs.
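Suggestions 1 and 4 can be illustrated with an array-backed container. The sketch below is a hypothetical SimpleStack, not code from any library: when an element is popped, its array slot is set to null so that the container does not keep the object alive after the caller is done with it.

```java
import java.util.Arrays;

// Hypothetical array-backed stack used only to illustrate releasing stale references.
class SimpleStack {
    private Object[] elements = new Object[4];
    private int size = 0;

    void push(Object item) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // grow when full
        }
        elements[size++] = item;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object item = elements[--size];
        elements[size] = null; // drop the stale reference so the object can be reclaimed
        return item;
    }

    // Helper for the demo: counts array slots that still hold a reference.
    int liveSlots() {
        int count = 0;
        for (Object element : elements) {
            if (element != null) {
                count++;
            }
        }
        return count;
    }
}

class SimpleStackDemo {
    public static void main(String[] args) {
        SimpleStack stack = new SimpleStack();
        stack.push("a");
        stack.push("b");
        stack.push("c");
        stack.pop();
        stack.pop();
        System.out.println(stack.liveSlots()); // 1: only "a" is still referenced
    }
}
```

Without the assignment to null in pop(), the backing array would still refer to the popped objects, and GC could not reclaim them for as long as the stack itself stays reachable.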