Java applications run on the JVM, but how much do you know about JVM technology? This article, the first in the series, explains how the classic Java virtual machine works, covering the pros and cons of Java's write-once, run-anywhere engine, garbage collection basics, classic GC algorithms, and compiler optimization. Later articles will turn to JVM performance optimization, including the newest JVM designs that support the performance and scalability of today's highly concurrent Java applications.
If you are a developer, you have surely had that special moment when inspiration strikes, all your ideas connect, and you can see earlier ideas from a new perspective. I personally love the feeling of learning something new. I've had that experience many times while working with JVM technology, especially with garbage collection and JVM performance optimization. In this new Java world, I hope to share some of those insights with you, and I hope you are as excited to learn about JVM performance as I am to write about it.
This series is written for all Java developers who want to learn more about what goes on underneath the JVM and what the JVM actually does. At a high level, I'll discuss garbage collection and the never-ending pursuit of memory safety, free memory, and speed without disturbing running applications. You will learn the key parts of the JVM: garbage collection and GC algorithms, compiler optimization, and some commonly used optimizations. I'll also discuss why benchmarking Java is so difficult and offer advice on when you should consider performance testing. Finally, I'll cover some of the newest innovations in JVMs and GC, including highlights from Azul's Zing JVM, the IBM JVM, and Oracle's Garbage First (G1) garbage collector.
I hope you finish this series with a deeper understanding of the nature of Java's scalability constraints, and of how those constraints force us to shape Java deployments in particular ways. Hopefully you'll come away with a sense of enlightenment and some good Java inspiration: stop accepting those limitations and change them! If you are not an open source contributor yet, this series may encourage you to move in that direction.
JVM performance and the "write once, run anywhere" challenge
I have news for those who stubbornly believe that the Java platform is inherently slow. The Java performance problems the JVM was criticized for date back more than a decade, to when Java first entered enterprise applications, but that conclusion is now outdated. It's true that if you run a simple, static, deterministic task on different platforms today, machine-optimized code will most likely outperform any virtual environment, the JVM included. But Java's performance has improved enormously over the past ten years. Market demand and growth in the Java industry have produced a handful of garbage collection algorithms, new compilation innovations, and a host of heuristics and optimizations that have pushed JVM technology forward. I'll cover some of these in later installments.
The technical beauty of the JVM is also its biggest challenge: with a "write once, run anywhere" application, nothing can be assumed in advance. Rather than optimizing for one use case, one application, or one specific user load, the JVM continuously tracks what the Java application is currently doing and optimizes accordingly. This dynamic operation leads to a series of dynamic problems. Developers working on the JVM can't rely on static compilation and predictable allocation rates when designing innovations (at least not when we demand performance in production environments).
My history with JVM performance
Early in my career I realized that garbage collection is very hard to "solve", and I have been fascinated by JVMs and middleware technology ever since. My passion for JVMs began on the JRockit team, where I coded a novel way for the garbage collector to teach itself and tune its own algorithms (see Resources). That project (which became an experimental feature of JRockit and the basis for the Deterministic Garbage Collection algorithm) started my journey into JVM technology. I have worked at BEA Systems, Intel, Sun, and Oracle (briefly, because Oracle acquired BEA Systems). I then joined the team at Azul Systems to manage the Zing JVM, and today I work for Cloudera.
Machine-optimized code may achieve better performance, but it comes at the cost of flexibility, and that is not a worthwhile trade-off for enterprise applications with dynamic loads and rapidly changing features. For the advantages of Java, most companies are willing to sacrifice the marginal performance gain of machine-optimized code:
1. Easier coding and feature development (meaning shorter time to market)
2. Availability of knowledgeable programmers
3. Use Java APIs and standard libraries for faster development
4. Portability - no need to rewrite Java applications for new platforms
From Java code to bytecode
As a Java programmer, you are probably familiar with coding, compiling, and executing Java applications. For example, assume you have a program (MyApp.java) and you want to run it. To execute this program you first compile it with javac (the JDK's built-in static Java-to-bytecode compiler). From the Java code, javac generates the corresponding executable bytecode and saves it in a class file of the same name: MyApp.class. After compiling the Java code into bytecode, you launch the executable class file through the java command (from the command line or a startup script, with or without startup options) to run your application. Your class is then loaded into the runtime (meaning the running Java virtual machine), and the program starts executing.
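To make this flow concrete, here is a minimal version of the MyApp example above (the class body and its output are invented for illustration; the compile and run commands are in the comments):

```java
// MyApp.java -- compile with:  javac MyApp.java
// run with:                    java MyApp
public class MyApp {
    public static String greeting() {
        return "Hello from the JVM";
    }
    public static void main(String[] args) {
        // javac turned this source into bytecode (MyApp.class);
        // the java launcher loads that class into the JVM and invokes main.
        System.out.println(greeting());
    }
}
```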
That is what happens on the surface when any application executes, but now let's explore what actually happens when you issue that java command. What is a Java virtual machine? Most developers interact with the JVM through continuous tuning - that is, selecting and assigning values to startup options to make Java programs run faster while avoiding the infamous "out of memory" error. But have you ever wondered why we need a JVM to run Java applications in the first place?
What is a Java virtual machine?
Simply put, a JVM is a software module that executes Java application bytecode, converting the bytecode into hardware- and operating-system-specific instructions. By doing this, the JVM allows a Java program, once written, to be executed in different environments without changes to the original code. Java's portability is key to its role as an enterprise application language: developers don't need to rewrite application code for each platform because the JVM handles translation and platform-specific optimization.
A JVM is basically a virtual execution environment that acts as a bytecode instruction machine and is used to allocate execution tasks and perform memory operations by interacting with the underlying layer.
A JVM also handles dynamic resource management for running Java applications. This means it takes care of allocating and freeing memory, maintains a consistent threading model on each platform, and organizes executable instructions in a way suited to the CPU architecture where the application runs. The JVM frees developers from tracking references to objects and deciding how long they need to live in the system. Likewise, it doesn't require us to manage when to release memory - a pain point in non-dynamic languages like C.
You can think of the JVM as an operating system specifically designed to run Java; its job is to manage the runtime environment for Java applications.
JVM component overview
Many articles have been written about JVM internals and performance optimization. As the foundation for this series, I'll summarize and give an overview of the JVM's components. This brief overview is particularly useful for developers who are new to the JVM, and should whet your appetite for the more in-depth discussions that follow.
From One Language to Another - About Java Compilers
A compiler takes one language as input and outputs executable statements in another. The Java compiler has two main tasks:
1. Make the Java language more portable, so that it is no longer tied to a specific platform when first written;
2. Ensure that valid executable code is produced for a specific platform.
Compilers can be static or dynamic. An example of static compilation is javac: it takes Java code as input and converts it into bytecode (the language executed by the Java virtual machine). A static compiler processes the input code once and outputs an executable form that is used every time the program runs. Because the input is static, you always see the same result; only if you modify the source code and recompile will the output change.
Dynamic compilers, such as Just-In-Time (JIT) compilers, convert one language into another dynamically - that is, while the code is being executed. A JIT compiler can collect or be fed runtime profiling data (by inserting performance counters) and make compilation decisions on the fly, using the environment data at hand. A dynamic compiler can produce better instruction sequences while compiling, replace a series of instructions with more efficient ones, and even eliminate redundant operations. Over time, as more profiling data is collected, more and better compilation decisions can be made; the whole process is what we usually call code optimization and recompilation.
Dynamic compilation gives you the advantage of adapting to dynamic changes in behavior, or of applying new optimizations as application load increases. This is why dynamic compilers are a perfect fit for Java. It's worth noting that a dynamic compiler claims extra data structures, thread resources, and CPU cycles for profiling and optimization, and the deeper the optimization, the more resources it needs. In most environments, though, that overhead is very small relative to the gain: five to ten times the performance of pure interpretation.
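You can watch this in action on HotSpot with the diagnostic flag -XX:+PrintCompilation, which logs methods as the JIT compiles them. In the sketch below (class and method names are invented for this example), the repeatedly called hotSum method should eventually appear in the compilation log once it has run enough times to be considered hot:

```java
// JitWarmup.java -- run with:  java -XX:+PrintCompilation JitWarmup
// On HotSpot, repeated calls generate the profiling data that drives
// the JIT's decision to compile hotSum from bytecode to machine code.
public class JitWarmup {
    static long hotSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;            // simple, easily optimized loop body
        }
        return sum;
    }
    public static void main(String[] args) {
        long total = 0;
        for (int round = 0; round < 20_000; round++) {
            total += hotSum(1_000);   // many invocations make the method "hot"
        }
        System.out.println(total);
    }
}
```

Exactly when (and at what tier) compilation happens depends on the JVM version and its thresholds, which is the dynamic behavior described above.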
Allocation causes garbage collection
Allocation is done per thread within each "Java-process-dedicated memory address space", known as the Java heap, or simply the heap. In the Java world, single-threaded allocation is common in client-side applications. However, single-threaded allocation brings no benefit in enterprise applications and server-side workloads, because it cannot exploit the parallelism of today's multi-core environments.
Parallel application design also forces the JVM to ensure that multiple threads do not allocate the same address space at the same time. You could control this by placing a lock on the entire allocation space, but this technique (known as heap locking) is very costly: holding or queuing threads hurts resource utilization and application performance. The good thing about multi-core systems is that they created demand for a variety of new approaches that prevent the single-threaded bottleneck of serialized allocation.
A common approach is to divide the heap into partitions, each a reasonable size for the application - obviously they need tuning, since allocation rates and object sizes vary significantly between applications, as does the number of threads. A Thread Local Allocation Buffer (TLAB), sometimes called a Thread Local Area (TLA), is a dedicated partition in which a thread can allocate freely without claiming the full heap lock. When the area is full, the thread requests a new one. When the heap is full - meaning there is not enough free space left on the heap to place new objects - garbage collection begins.
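As a rough illustration, the sketch below has several threads allocating many small objects concurrently. On HotSpot, TLABs are enabled by default (the -XX:+UseTLAB flag), so each thread serves these allocations out of its own buffer rather than contending for a heap-wide lock. The class and method names here are invented for this example:

```java
// TlabDemo.java -- concurrent small-object allocation across threads.
// With TLABs enabled (HotSpot default, -XX:+UseTLAB), each thread
// carves objects out of its own buffer, avoiding a global heap lock.
public class TlabDemo {
    static final int THREADS = 4;
    static final int OBJECTS_PER_THREAD = 100_000;

    static long allocateAll() throws InterruptedException {
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < OBJECTS_PER_THREAD; i++) {
                    byte[] obj = new byte[64];   // small, TLAB-friendly allocation
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();     // wait for all allocators
        return (long) THREADS * OBJECTS_PER_THREAD;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("allocated " + allocateAll() + " small objects");
    }
}
```

The exact flags for sizing and observing TLABs vary across JVM versions, so check your JVM's documentation before tuning them.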
Fragmentation
The catch with TLABs is the risk of fragmenting the heap and reducing memory efficiency. If an application's object sizes happen not to add up to, or fully fill, a TLAB when allocating objects, the leftover free space may be too small to host a new object. Such leftover free space is called a "fragment". If the application keeps references to objects allocated beside it, the space can remain unused for a long time.
Fragmentation is what you get when such fragments are scattered across the heap - small sections of unused memory wasting heap space. Configuring the "wrong" TLAB size for your application (with respect to object sizes, the mix of object sizes, and reference-holding ratios) drives heap fragmentation. As the application runs, the number of fragments grows and occupies heap space. Fragmentation degrades performance and leaves the system unable to allocate enough space for new threads and objects, and the garbage collector then struggles to prevent out-of-memory exceptions.
TLAB waste is an occupational hazard of this design. One way to avoid fragmentation, completely or temporarily, is to tune TLAB size for every underlying operation; the typical drawback of that approach is that it must be retuned whenever the application's allocation behavior changes. This can be done with sophisticated JVM algorithms. Another approach is to organize heap partitions for more efficient memory allocation. For example, the JVM can implement free-lists: linked lists of free memory blocks of specific sizes. A free memory block is linked to other free blocks of similar size, forming a small number of linked lists, each with its own size bounds. In some cases free-lists lead to better memory allocation: threads can allocate objects from blocks of a similar size, potentially creating less fragmentation than relying on fixed-size TLABs alone.
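The free-list idea can be modeled in a few lines. The sketch below is a toy illustration, not a real allocator (the class, field, and method names are all invented): free blocks are grouped into per-size-class linked lists, so an allocation request is served from the matching list instead of searching the whole heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// FreeListSketch.java -- a toy model of size-segregated free-lists.
public class FreeListSketch {
    // one linked list of free block addresses per size class
    private final Map<Integer, Deque<Integer>> freeLists = new HashMap<>();

    void free(int address, int sizeClass) {
        // link the freed block onto the list for its size class
        freeLists.computeIfAbsent(sizeClass, k -> new ArrayDeque<>()).push(address);
    }

    // returns a block address, or -1 if no block of that size class is free
    int allocate(int sizeClass) {
        Deque<Integer> list = freeLists.get(sizeClass);
        return (list == null || list.isEmpty()) ? -1 : list.pop();
    }

    public static void main(String[] args) {
        FreeListSketch heap = new FreeListSketch();
        heap.free(0x1000, 64);                   // a freed 64-byte block
        heap.free(0x2000, 64);                   // another, linked onto the same list
        System.out.println(heap.allocate(64));   // reuses a freed block in O(1)
        System.out.println(heap.allocate(128));  // -1: no 128-byte blocks free
    }
}
```

Because each list only holds blocks of one size class, allocations of similar-sized objects cluster together, which is the fragmentation benefit described above.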
GC trivia
Some early garbage collectors had multiple old generations, but with more than two, the overhead outweighed the benefit. Another way to optimize allocation and reduce fragmentation is to create a so-called young generation, a dedicated heap space for allocating new objects. The rest of the heap becomes the so-called old generation, used for long-lived objects - objects assumed to live long because they have survived collections, as well as large objects. To better understand this allocation approach, we need to cover some garbage collection background.
Garbage collection and application performance
Garbage collection is the JVM's mechanism for reclaiming occupied heap memory that is no longer referenced. When a garbage collection is triggered, objects that are still referenced are retained, while the space occupied by previously referenced objects is released or reallocated. Once all reclaimable memory has been collected, the space waits to be grabbed and allocated again to new objects.
The garbage collector must never reclaim a referenced object; doing so would violate the JVM specification. The exception to this rule is soft and weak references, which may be collected when the garbage collector is about to run out of memory. I strongly recommend avoiding weak references where you can, however, because ambiguity in the Java specification leads to misinterpretation and usage errors. What's more, Java is designed for dynamic memory management: you shouldn't have to think about when and where to release memory.
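To make the weak-reference exception concrete, here is a small sketch using the real java.lang.ref.WeakReference API. Whether the final get() returns null depends on GC timing, which is exactly the kind of nondeterminism that makes weak references easy to misuse:

```java
import java.lang.ref.WeakReference;

// WeakRefDemo.java -- a weak reference does not keep its referent alive:
// once no strong reference remains, the collector is free to clear it.
public class WeakRefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.out.println(weak.get() != null);  // true: a strong reference still exists
        strong = null;                           // drop the only strong reference
        System.gc();                             // a hint to collect, not a guarantee
        // weak.get() may now be null -- it depends entirely on GC timing
        System.out.println(weak.get());
    }
}
```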
One of the challenges for a garbage collector is to reclaim memory in a way that does not affect running applications. Collect too infrequently and your application will run out of memory; collect too often and you lose throughput and response time, both of which hurt the running application.
GC algorithm
There are many different garbage collection algorithms, several of which will be discussed in depth later in this series. At the highest level, the two main approaches to garbage collection are reference counting and tracing collectors.
A reference-counting collector keeps track of how many references point to each object. When an object's count reaches zero, its memory is reclaimed immediately - one of the advantages of this approach. The difficulties with reference counting lie in circular data structures and in keeping all counts updated in real time.
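The cycle problem is easy to demonstrate with a toy counter. The sketch below is purely illustrative (the Obj, addRef, and release names are invented for this example, and HotSpot does not use reference counting): two objects referencing each other never reach a count of zero, even after the program drops its own references to them.

```java
// RefCountSketch.java -- a toy reference counter showing both the strength
// (immediate reclamation at count zero) and the weakness (cycles) of the
// approach. Not how the JVM actually works.
public class RefCountSketch {
    static class Obj {
        int refCount = 0;
        Obj pointsTo;            // at most one outgoing reference, for simplicity
    }

    static void addRef(Obj o) { o.refCount++; }

    // returns true when the object would be reclaimed immediately
    static boolean release(Obj o) {
        return --o.refCount == 0;
    }

    public static void main(String[] args) {
        Obj a = new Obj(), b = new Obj();
        a.pointsTo = b; addRef(b);        // the cycle: a -> b
        b.pointsTo = a; addRef(a);        //            b -> a
        addRef(a); addRef(b);             // the "program" also holds a and b
        boolean aDead = release(a);       // count falls 2 -> 1: not reclaimed
        boolean bDead = release(b);       // same: the cycle keeps both alive forever
        System.out.println(aDead + " " + bDead);  // prints: false false
    }
}
```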
A tracing collector marks objects that are still referenced, then repeatedly follows references from the marked objects and marks everything it reaches. When all still-referenced objects have been marked "live", all unmarked space is reclaimed. This approach handles circular data structures, but in most cases the collector must wait until marking is complete before it can reclaim unreferenced memory.
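The mark phase of a tracing collector can be sketched as a graph traversal from the roots. This toy model (using integer object IDs instead of real references; all names are invented for the example) also shows why cycles pose no problem for tracing:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// MarkSketch.java -- a minimal model of the mark phase: starting from the
// roots, follow references and mark everything reachable; whatever is left
// unmarked afterwards is reclaimable garbage.
public class MarkSketch {
    // the object graph as adjacency lists: object id -> ids it references
    static Set<Integer> mark(Map<Integer, List<Integer>> graph, List<Integer> roots) {
        Set<Integer> live = new HashSet<>();
        Deque<Integer> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            int obj = pending.pop();
            if (live.add(obj)) {                          // first visit: mark live
                pending.addAll(graph.getOrDefault(obj, List.of()));
            }
        }
        return live;                                      // unmarked ids are garbage
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> heap = Map.of(
            1, List.of(2),        // 1 -> 2
            2, List.of(1),        // 2 -> 1: a cycle, handled fine by tracing
            3, List.of(4),        // 3 and 4 are unreachable from the roots
            4, List.of()
        );
        System.out.println(mark(heap, List.of(1)));       // marks only 1 and 2
    }
}
```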
There are various ways to implement these approaches. The best-known algorithms are mark-and-sweep and copying algorithms, in parallel and concurrent variants. I will discuss these in a later article.
Generally speaking, the point of generational garbage collection is to allocate new and old objects in separate heap address spaces. "Old objects" are objects that have survived many garbage collections. Using the young generation for new objects and the old generation for old objects reduces fragmentation: short-lived objects that occupy memory are recycled quickly, while long-lived objects are aggregated together and placed in old-generation address space. All of this reduces fragmentation between long-lived objects and protects the heap from fragmenting. A positive side effect of the young generation is that it delays the more expensive collection of old-generation objects, since the same space can be reused for ephemeral objects. (Collecting the old space costs more because long-lived objects contain more references and require more traversal.)
The last algorithm worth mentioning is compaction, a method of managing memory fragmentation. Compaction basically moves objects together to release larger contiguous blocks of memory. If you are familiar with disk fragmentation and the tools that deal with it, you will find compaction very similar, except that it operates on Java heap memory. I will discuss compaction in detail later in the series.
Summary: Review and Highlights
The JVM enables portability (write once, run anywhere) and dynamic memory management, both key features of the Java platform that contribute to its popularity and productivity.
In this first article on JVM performance optimization, I explained how a compiler converts bytecode into the instruction language of the target platform and helps dynamically optimize the execution of Java programs. Different applications need different compilers.
I also briefly covered memory allocation and garbage collection, and how both relate to Java application performance. Basically, the faster you fill up the heap, the more frequently garbage collection is triggered, and the more your Java application's performance suffers. One challenge for the garbage collector is to reclaim memory in a way that does not affect the running application, yet before the application runs out of memory. In future articles we will examine traditional and new garbage collection and JVM performance optimizations in more detail.