Anyone who has studied Java knows that Java has promoted platform independence from the very beginning under the slogan "write once, run anywhere". The Java platform also has a second kind of independence: language independence. Language independence rests on the Class file format, that is, on bytecode. Java has in fact had two specifications from the start: the Java Language Specification and the Java Virtual Machine Specification. The Java Language Specification only defines the constraints and rules of the Java language, while the Java Virtual Machine Specification is the one designed with cross-platform operation in mind. This article uses a practical example to show what the bytes of a Java Class file look like. It first describes in general terms what a Class file consists of, and then analyzes the Class file of an actual Java class.
Before continuing, we first need to clarify the following points:
1) A Class file is a stream of 8-bit bytes. The data items are laid out strictly in the specified order, with no gaps or padding between them. Data items larger than 8 bits are stored in Big-Endian order: the high-order byte is stored at the low address and the low-order byte at the high address. This is also key to making Class files cross-platform. The PowerPC architecture, for example, uses Big-Endian storage order, while x86 processors use Little-Endian order, so the virtual machine specification has to fix a single byte order for Class files to read the same on every processor architecture.
2) The Class file format stores data in a pseudo-structure similar to a C struct, with only two kinds of data items: unsigned numbers and tables. Unsigned numbers express numeric values, index references, and string data; u1, u2, u4 and u8 denote unsigned numbers of 1, 2, 4 and 8 bytes respectively. A table is a composite structure made up of unsigned numbers and other tables. If unsigned numbers and tables are not yet clear, don't worry: the examples later in this article illustrate both.
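The two points above can be sketched in a few lines of Java. This is only an illustration: BigEndianReader and its method names are hypothetical helpers, and the sample bytes are arbitrary values chosen to make the byte order visible.

```java
// Minimal sketch: decoding u1, u2 and u4 values that are stored in Big-Endian order.
// BigEndianReader and its method names are hypothetical helpers, not part of any JDK API.
final class BigEndianReader {
    // u1: a single unsigned byte (0..255).
    static int u1(byte[] buf, int offset) {
        return buf[offset] & 0xFF;
    }

    // u2: two bytes, high-order byte first.
    static int u2(byte[] buf, int offset) {
        return (u1(buf, offset) << 8) | u1(buf, offset + 1);
    }

    // u4: four bytes, high-order byte first; returned as a long so the value stays unsigned.
    static long u4(byte[] buf, int offset) {
        return ((long) u2(buf, offset) << 16) | u2(buf, offset + 2);
    }

    public static void main(String[] args) {
        byte[] sample = {0x12, 0x34, 0x56, 0x78}; // illustrative bytes only
        System.out.println(Integer.toHexString(u1(sample, 0))); // 12
        System.out.println(Integer.toHexString(u2(sample, 0))); // 1234
        System.out.println(Long.toHexString(u4(sample, 0)));    // 12345678
    }
}
```

A Little-Endian reader would interpret the same four bytes as 0x78563412, which is why the specification has to pin the order down.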
With these two points in mind, let's look at the data items that a Class file contains, in their strict order:
One thing to note in the figure above is cp_info, which represents the constant pool. The figure writes constant_pool[constant_pool_count-1] to show that the constant pool holds constant_pool_count-1 constants. The array notation is only a descriptive convenience: do not assume that every constant in the pool has the same length, the way every element of an int array in a programming language does. With that clarified, let's go through what each item in the figure represents.
1) u4 magic is the magic number and occupies 4 bytes. Its purpose is to identify the file as a Class file rather than, say, a JPG image or an AVI movie. The magic number of a Class file is 0xCAFEBABE.
2) u2 minor_version is the minor version number of the Class file, an unsigned number of type u2.
3) u2 major_version is the major version number of the Class file, also an unsigned number of type u2. Together, major_version and minor_version tell a virtual machine whether it can accept the Class file. Different versions of the Java compiler produce Class files with different version numbers. A newer virtual machine accepts Class files produced by older compilers: the virtual machine of Java SE 6.0, for example, accepts Class files compiled by the Java SE 5.0 compiler, but not vice versa.
4) u2 constant_pool_count is the constant pool count. First, what is the constant pool? Do not confuse it with the runtime constant pool in the JVM memory model. The constant pool in a Class file mainly stores literals and symbolic references. Literals include strings, the values of final constants, the initial values of certain fields, and so on. Symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. The names are easy to understand; descriptors are covered later, in the discussion of the field table and the method table. As for the runtime constant pool: the JVM memory model includes the heap, the stack, the method area, and the program counter, and the runtime constant pool is a region of the method area. It holds the literals and symbolic references that the compiler generated into the Class file, but it is dynamic: other constants can be added to it at runtime, the best-known example being the intern method of String.
5) cp_info represents the constant pool, which holds the literals and symbolic references mentioned above. The Java Virtual Machine Specification, Java SE 7 Edition defines 14 kinds of constant pool entries. Each constant is a table, and every table begins with a common u1 tag that indicates which type of constant it is.
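Eleven of these types are listed briefly below (the other three, CONSTANT_MethodHandle_info, CONSTANT_MethodType_info and CONSTANT_InvokeDynamic_info, were added in Java SE 7); the later examples go into more detail.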
CONSTANT_Utf8_info: tag 1, UTF-8 encoded string
CONSTANT_Integer_info: tag 3, integer literal
CONSTANT_Float_info: tag 4, floating point literal
CONSTANT_Long_info: tag 5, long integer literal
CONSTANT_Double_info: tag 6, double precision floating point literal
CONSTANT_Class_info: tag 7, symbolic reference to a class or interface
CONSTANT_String_info: tag 8, string literal
CONSTANT_Fieldref_info: tag 9, symbolic reference to a field
CONSTANT_Methodref_info: tag 10, symbolic reference to a method in a class
CONSTANT_InterfaceMethodref_info: tag 11, symbolic reference to a method in an interface
CONSTANT_NameAndType_info: tag 12, symbolic reference to the name and type of a field or method
6) u2 access_flags represents the access information of the class or interface, as shown in the following figure:
7) u2 this_class is the constant pool index of this class; it points to a CONSTANT_Class_info constant in the constant pool.
8) u2 super_class is the constant pool index of the super class; it also points to a CONSTANT_Class_info constant in the constant pool.
9) u2 interfaces_count is the number of interfaces.
10) u2 interfaces[interfaces_count] is the interface table; each item in it points to a CONSTANT_Class_info constant in the constant pool.
11) u2 fields_count is the number of instance variables and class variables of the class.
12) field_info fields[fields_count] represents the information of the field table, where the structure of the field table is as shown below:
In the figure above, access_flags holds the access flags of the field, for example whether the field is public, private, or protected. name_index gives the field name and points to a CONSTANT_Utf8_info constant in the constant pool. descriptor_index gives the descriptor of the field and also points to a CONSTANT_Utf8_info constant. attributes_count is the number of attributes in the field's attribute table. The attribute table is an extensible structure used to describe fields, methods and classes, and different versions of the Java virtual machine support different sets of attributes.
13) u2 methods_count is the number of entries in the method table.
14) method_info represents the method table. The specific structure of the method table is as shown in the figure below:
Here access_flags holds the access flags of the method, name_index is the index of the method name, and descriptor_index is the index of the method descriptor. attributes_count and attribute_info play the same role as in the field table, except that the attributes found in the field table and the method table differ: the Code attribute in the method table holds the code of the method, for example, whereas the field table has no Code attribute. The attributes that can appear in a Class file are discussed later, together with the attribute table.
15) attributes_count is the number of attributes in the attribute table. Regarding attribute tables, note the following:
Attribute tables appear at the end of the Class file structure, in the field table, in the method table, and inside the Code attribute, so an attribute table can itself be nested inside an attribute. Attribute tables also have no fixed length: different attributes have different lengths.
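Because every attribute starts with a u2 attribute_name_index and a u4 attribute_length, a reader can step over attributes it does not understand. The following is a minimal sketch of that idea, not a complete parser: AttributeWalker is a hypothetical name, and the sample bytes (two attributes with name indexes 8 and 20) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: walking an attribute table without understanding the attribute bodies.
// AttributeWalker is a hypothetical helper; the sample bytes in main are invented.
final class AttributeWalker {
    // Expects a u2 attributes_count followed by that many attributes.
    // Returns the attribute_name_index of each attribute.
    static List<Integer> nameIndexes(byte[] table) {
        ByteBuffer buf = ByteBuffer.wrap(table);       // ByteBuffer is big-endian by default
        int count = buf.getShort() & 0xFFFF;           // u2 attributes_count
        List<Integer> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(buf.getShort() & 0xFFFF);        // u2 attribute_name_index
            int length = buf.getInt();                 // u4 attribute_length (assumed to fit in an int)
            buf.position(buf.position() + length);     // skip the attribute body
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] table = {
            0x00, 0x02,                                 // attributes_count = 2
            0x00, 0x08, 0x00, 0x00, 0x00, 0x02, 0x00, 0x09, // name #8, length 2, 2 body bytes
            0x00, 0x14, 0x00, 0x00, 0x00, 0x00          // name #20, length 0, empty body
        };
        System.out.println(nameIndexes(table));        // [8, 20]
    }
}
```

The two attributes in the sample have different lengths, yet the walker never needs to know what they mean.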
Having described each item of the Class file structure, let's work through a practical example.
package com.ejushang.TestClass;
public class TestClass implements Super {
    private static final int staticVar = 0;
    private int instanceVar = 0;
    public int instanceMethod(int param) {
        return param + 1;
    }
}
interface Super { }
Compiling TestClass.java with the javac of jdk1.6.0_37 produces TestClass.class, whose binary content is shown in the figure below:
Next, we parse the byte stream in the figure above according to the Class file structure described earlier.
1) Magic number
From the Class file structure we know that the first 4 bytes are the magic number. In the figure above, addresses 00000000h-00000003h hold the magic number, and its value is 0xCAFEBABE, as expected for a Class file.
2) Major and minor version numbers
The next 4 bytes are the minor and major version numbers. In the figure above, the value at 00000004h-00000005h is 0x0000, so the minor_version of the Class file is 0x0000, and the value at 00000006h-00000007h is 0x0032, so the major_version is 0x0032 (50 in decimal). These are exactly the version numbers that jdk1.6.0 produces when no -target option is given.
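As a rough illustration of how major_version relates to Java releases, the sketch below maps a major version to a release name and applies the "newer virtual machine accepts older Class files" rule. ClassVersion and its methods are hypothetical names, and the check ignores minor_version for simplicity.

```java
// Minimal sketch: relating major_version to Java releases.
// ClassVersion is a hypothetical helper; minor_version is ignored for simplicity.
final class ClassVersion {
    // 45 covers JDK 1.0.2 and 1.1; from 46 on, each release raises the major version by one.
    static String release(int major) {
        if (major < 45) throw new IllegalArgumentException("unknown major_version: " + major);
        if (major == 45) return "JDK 1.0.2/1.1";
        if (major <= 48) return "JDK 1." + (major - 44); // 46 -> 1.2, 47 -> 1.3, 48 -> 1.4
        return "Java SE " + (major - 44);                 // 49 -> 5, 50 -> 6, 51 -> 7, ...
    }

    // A virtual machine accepts Class files up to the highest major version it knows.
    static boolean accepts(int vmMaxMajor, int classMajor) {
        return classMajor >= 45 && classMajor <= vmMaxMajor;
    }

    public static void main(String[] args) {
        System.out.println(release(0x0032));  // Java SE 6
        System.out.println(accepts(50, 49));  // true: a Java SE 6 VM runs Java SE 5 classes
        System.out.println(accepts(49, 50));  // false: not vice versa
    }
}
```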
3) Constant pool count
The next 2 bytes, at 00000008h-00000009h, hold the constant pool count. In the figure above its value is 0x0018, which is 24 in decimal. Note, however, that the number of constants in the pool is constant_pool_count-1, here 23, with valid indexes 1 through 23. Index 0 is reserved so that a data item in the Class file can express that it does not reference any constant in the constant pool.
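The first ten bytes analyzed so far can be decoded with a short sketch like the one below. ClassHeader is a hypothetical name; the bytes in main are the values quoted in the text above, not a complete Class file.

```java
import java.nio.ByteBuffer;

// Minimal sketch: decoding magic, minor_version, major_version and constant_pool_count.
// ClassHeader is a hypothetical helper, not a full Class file parser.
final class ClassHeader {
    final long magic;
    final int minor;
    final int major;
    final int constantPoolCount;

    ClassHeader(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);        // big-endian by default
        magic = buf.getInt() & 0xFFFFFFFFL;             // u4 magic
        minor = buf.getShort() & 0xFFFF;                // u2 minor_version
        major = buf.getShort() & 0xFFFF;                // u2 major_version
        constantPoolCount = buf.getShort() & 0xFFFF;    // u2 constant_pool_count
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a Class file");
        }
    }

    // Valid constant pool indexes run from 1 to constant_pool_count - 1.
    int lastValidIndex() {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) {
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor_version
            0x00, 0x32,                                         // major_version
            0x00, 0x18                                          // constant_pool_count
        };
        ClassHeader h = new ClassHeader(header);
        System.out.println(h.minor + " " + h.major + " " + h.constantPoolCount + " " + h.lastValidIndex());
        // prints: 0 50 24 23
    }
}
```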
4) Constant pool
As described above, the constant pool contains constants of different types. Let's look at the first constant of TestClass.class. Each constant begins with a u1 tag that identifies its type. In the figure above, the byte at 0000000ah is 0x0A, which is 10 in decimal. From the list of constant types given earlier, the constant with tag 10 is CONSTANT_Methodref_info, whose structure is shown in the figure below:
Here class_index points to a CONSTANT_Class_info constant in the constant pool. From the binary content of TestClass, the value of class_index is 0x0004 (at addresses 0000000bh-0000000ch), so it points to the 4th constant.
name_and_type_index points to a CONSTANT_NameAndType_info constant in the constant pool. From the figure above, the value of name_and_type_index is 0x0013, so it points to the 19th constant in the constant pool.
The remaining constants can be found in the same way, but the JDK provides a convenient tool for viewing the constant pool: javap -verbose TestClass lists all of its constants. The screenshot is as follows:
From the javap output above we can see that constant_pool_count is 24 for TestClass: 23 constants plus the reserved index 0, which indicates that a data item in the Class file does not reference any constant in the constant pool. From the analysis above, the first constant of TestClass represents a method: the 4th constant, which class_index points to, is java/lang/Object, and the 19th constant, which name_and_type_index points to, is <init>:()V. So the first constant refers to the instance constructor of java/lang/Object, which the constructor generated by the Java compiler for TestClass invokes. The other constants in the constant pool can be analyzed in the same way.
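The following sketch shows why the constant pool cannot be treated as an array of equal-sized elements: the size of each entry depends on its tag. It covers only the 11 tags listed earlier. ConstantPoolEntry is a hypothetical name, the first five sample bytes are the CONSTANT_Methodref_info analyzed above, and the CONSTANT_Utf8_info entry after it is invented for illustration.

```java
// Minimal sketch: entry sizes by tag, and decoding a CONSTANT_Methodref_info.
// ConstantPoolEntry is a hypothetical helper covering only the 11 tags listed earlier.
final class ConstantPoolEntry {
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Size in bytes (tag included) of the entry that starts at off.
    static int length(byte[] b, int off) {
        int tag = b[off] & 0xFF;
        switch (tag) {
            case 1:  return 3 + u2(b, off + 1); // Utf8: tag, u2 length, then that many bytes
            case 3: case 4: return 5;           // Integer, Float: tag + 4 bytes
            case 5: case 6: return 9;           // Long, Double: tag + 8 bytes (they also use two pool indexes)
            case 7: case 8: return 3;           // Class, String: tag + one u2 index
            case 9: case 10: case 11: case 12:
                return 5;                       // Fieldref, Methodref, InterfaceMethodref, NameAndType: tag + two u2
            default:
                throw new IllegalArgumentException("tag not covered by this sketch: " + tag);
        }
    }

    // Renders a CONSTANT_Methodref_info as "#class_index.#name_and_type_index".
    static String methodref(byte[] b, int off) {
        if ((b[off] & 0xFF) != 10) throw new IllegalArgumentException("not a Methodref");
        return "#" + u2(b, off + 1) + ".#" + u2(b, off + 3);
    }

    public static void main(String[] args) {
        byte[] pool = {
            0x0A, 0x00, 0x04, 0x00, 0x13, // Methodref: class_index 4, name_and_type_index 19
            0x01, 0x00, 0x01, 0x49        // invented Utf8 entry holding the one-byte string "I"
        };
        System.out.println(methodref(pool, 0)); // #4.#19
        System.out.println(length(pool, 0));    // 5
        System.out.println(length(pool, 5));    // 4
    }
}
```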
5) u2 access_flags holds the access information of the class or interface: whether it is a class or an interface, whether it is public, final, abstract, and so on. The meaning of each flag was described earlier. The access flags of TestClass are at 0000010dh-0000010eh and have the value 0x0021. From the flag values given earlier, 0x0021 = 0x0001 | 0x0020, so ACC_PUBLIC and ACC_SUPER are set. ACC_PUBLIC is self-explanatory; ACC_SUPER is a flag carried by every class compiled with JDK 1.0.2 or later.
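The bit test described above can be sketched as follows. ClassAccessFlags is a hypothetical name; the masks are the class-level flag values defined by the Java Virtual Machine Specification.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: decoding the access_flags of a class into flag names.
// ClassAccessFlags is a hypothetical helper; the masks are the class-level flags of the JVM specification.
final class ClassAccessFlags {
    private static final int[] MASKS = {
        0x0001, 0x0010, 0x0020, 0x0200, 0x0400, 0x1000, 0x2000, 0x4000
    };
    private static final String[] NAMES = {
        "ACC_PUBLIC", "ACC_FINAL", "ACC_SUPER", "ACC_INTERFACE",
        "ACC_ABSTRACT", "ACC_SYNTHETIC", "ACC_ANNOTATION", "ACC_ENUM"
    };

    // Returns the names of all flags whose bit is set in flags.
    static List<String> decode(int flags) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < MASKS.length; i++) {
            if ((flags & MASKS[i]) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // [ACC_PUBLIC, ACC_SUPER]
    }
}
```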
6) u2 this_class represents the index value of the class, which is used to represent the fully qualified name of the class. The index value of the class is as shown in the figure below:
As the figure above shows, the class index is 0x0003, which refers to the third constant in the constant pool. From the javap output, the third constant is of type CONSTANT_Class_info, and through it we learn that the fully qualified name of the class is com/ejushang/TestClass/TestClass.
7) u2 super_class is the index of the parent class of the current class. It points to a CONSTANT_Class_info constant in the constant pool. The parent class index is shown in the figure below and has the value 0x0004. The fourth constant in the constant pool shows that the fully qualified name of the parent class of TestClass is java/lang/Object.
8) interfaces_count and interfaces[interfaces_count] give the number of interfaces and the interfaces themselves. The values for TestClass are shown in the figure below: 0x0001 means that the number of interfaces is 1, and 0x0005 is the constant pool index of that interface. The fifth constant in the constant pool is of type CONSTANT_Class_info and its value is com/ejushang/TestClass/Super.
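Items 5) through 8) sit back to back in the byte stream, so they can be read in one pass. The sketch below does that for the values quoted above; ClassIdentity is a hypothetical name and the input is only this ten-byte slice, not the whole file.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Minimal sketch: reading access_flags, this_class, super_class and the interface table.
// ClassIdentity is a hypothetical helper that expects only this slice of the Class file.
final class ClassIdentity {
    final int accessFlags;
    final int thisClass;
    final int superClass;
    final int[] interfaces;

    ClassIdentity(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);        // big-endian by default
        accessFlags = buf.getShort() & 0xFFFF;          // u2 access_flags
        thisClass = buf.getShort() & 0xFFFF;            // u2 this_class
        superClass = buf.getShort() & 0xFFFF;           // u2 super_class
        int count = buf.getShort() & 0xFFFF;            // u2 interfaces_count
        interfaces = new int[count];
        for (int i = 0; i < count; i++) {
            interfaces[i] = buf.getShort() & 0xFFFF;    // u2 index of a CONSTANT_Class_info
        }
    }

    public static void main(String[] args) {
        byte[] slice = {0x00, 0x21, 0x00, 0x03, 0x00, 0x04, 0x00, 0x01, 0x00, 0x05};
        ClassIdentity id = new ClassIdentity(slice);
        System.out.println("this_class=#" + id.thisClass
                + " super_class=#" + id.superClass
                + " interfaces=" + Arrays.toString(id.interfaces));
        // prints: this_class=#3 super_class=#4 interfaces=[5]
    }
}
```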
9) fields_count and field_info
fields_count is the number of field_info tables in the class, and each field_info describes an instance variable or class variable of the class. Note that field_info does not include fields inherited from the parent class. The structure of field_info is shown in the figure below:
Among them, access_flags represents the access flag of the field, such as public, private, protected, static, final, etc. The value of access_flags is as shown in the figure below:
Among them, name_index and descriptor_index are both index values of the constant pool, which respectively represent the name of the field and the descriptor of the field. The name of the field is easy to understand, but how to understand the descriptor of the field? In fact, in the JVM specification, the field descriptors are specified as shown in the following figure:
Pay particular attention to the last line of the figure above, which describes array types: each array dimension is written as one leading [ character. The descriptor of String[][] is therefore [[Ljava/lang/String; and the descriptor of int[][] is [[I. The attributes_count and attribute_info items that follow give the number of attributes and the attribute table respectively. The field table of the TestClass shown earlier serves as the example.
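The descriptor rules can be checked with a small converter from field descriptors to Java type names. This is a sketch with hypothetical names (FieldDescriptor, toJavaType) that assumes well-formed input and performs only minimal validation.

```java
// Minimal sketch: turning a field descriptor into a Java type name.
// FieldDescriptor is a hypothetical helper with only minimal validation.
final class FieldDescriptor {
    static String toJavaType(String descriptor) {
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++;                                   // one '[' per array dimension
        }
        String base = descriptor.substring(dims);
        String name;
        if (base.length() == 1) {
            switch (base.charAt(0)) {                 // primitive type codes
                case 'B': name = "byte"; break;
                case 'C': name = "char"; break;
                case 'D': name = "double"; break;
                case 'F': name = "float"; break;
                case 'I': name = "int"; break;
                case 'J': name = "long"; break;
                case 'S': name = "short"; break;
                case 'Z': name = "boolean"; break;
                default: throw new IllegalArgumentException("bad descriptor: " + descriptor);
            }
        } else if (base.startsWith("L") && base.endsWith(";")) {
            // Object type: L<fully qualified name with slashes>;
            name = base.substring(1, base.length() - 1).replace('/', '.');
        } else {
            throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder sb = new StringBuilder(name);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaType("I"));                    // int
        System.out.println(toJavaType("[[I"));                  // int[][]
        System.out.println(toJavaType("[[Ljava/lang/String;")); // java.lang.String[][]
    }
}
```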
First, let’s take a look at the number of fields. The number of fields in TestClass is as shown in the figure below:
As the figure above shows, TestClass has two fields, and the source code of TestClass confirms that there are indeed only two. Now let's look at the first field, private static final int staticVar. Its binary representation in the Class file is shown below:
Here 0x001A is the access_flags value. From the access_flags table it is ACC_PRIVATE | ACC_STATIC | ACC_FINAL. Next, 0x0006 and 0x0007 are the indexes of the 6th and 7th constants in the constant pool, whose values are staticVar and I: staticVar is the field name and I is the field descriptor, which, as explained above, denotes the type int. Then 0x0001 is the number of attributes of the staticVar field, so staticVar has one attribute. 0x0008 is the name index of that attribute; the 8th constant in the constant pool shows that it is the ConstantValue attribute, whose format is shown in the figure below:
Here attribute_name_index is the constant pool index of the attribute name, in this example ConstantValue. The attribute_length of a ConstantValue attribute is always 2, and constantvalue_index is a reference into the constant pool, in this example 0x0009. The 9th constant is a CONSTANT_Integer_info constant whose value is 0.
Having covered private static final int staticVar = 0, let's turn to private int instanceVar = 0 of TestClass. The binary representation of instanceVar is shown in the figure below:
Here 0x0002 means the access flag is ACC_PRIVATE. 0x000A is the name index; it points to the 10th constant in the constant pool, which shows that the field name is instanceVar. 0x0007 is the descriptor index; it points to the 7th constant, I, which is the type of instanceVar. Finally, 0x0000 means the field has no attributes.
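Putting the two fields together, the field table of TestClass can be walked with a sketch like the following. FieldTable is a hypothetical name; the bytes in main are the values quoted in the walkthrough above (fields_count, then staticVar with its ConstantValue attribute, then instanceVar), and attribute bodies are skipped rather than interpreted.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: walking fields_count and the field_info entries that follow it.
// FieldTable is a hypothetical helper; attribute bodies are skipped, not interpreted.
final class FieldTable {
    static List<String> describe(byte[] table) {
        ByteBuffer buf = ByteBuffer.wrap(table);            // big-endian by default
        int fieldsCount = buf.getShort() & 0xFFFF;          // u2 fields_count
        List<String> out = new ArrayList<>();
        for (int i = 0; i < fieldsCount; i++) {
            int flags = buf.getShort() & 0xFFFF;            // u2 access_flags
            int name = buf.getShort() & 0xFFFF;             // u2 name_index
            int descriptor = buf.getShort() & 0xFFFF;       // u2 descriptor_index
            int attributes = buf.getShort() & 0xFFFF;       // u2 attributes_count
            for (int a = 0; a < attributes; a++) {
                buf.getShort();                             // u2 attribute_name_index
                int length = buf.getInt();                  // u4 attribute_length
                buf.position(buf.position() + length);      // skip the attribute body
            }
            out.add(String.format("flags=0x%04X name=#%d descriptor=#%d attributes=%d",
                    flags, name, descriptor, attributes));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] table = {
            0x00, 0x02,                                     // fields_count = 2
            0x00, 0x1A, 0x00, 0x06, 0x00, 0x07, 0x00, 0x01, // staticVar
            0x00, 0x08, 0x00, 0x00, 0x00, 0x02, 0x00, 0x09, // its ConstantValue attribute
            0x00, 0x02, 0x00, 0x0A, 0x00, 0x07, 0x00, 0x00  // instanceVar
        };
        for (String line : describe(table)) {
            System.out.println(line);
        }
        // flags=0x001A name=#6 descriptor=#7 attributes=1
        // flags=0x0002 name=#10 descriptor=#7 attributes=0
    }
}
```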
10) methods_count and method_info
methods_count is the number of methods and method_info is the method table, whose structure is shown in the figure below:
As the figure above shows, method_info and field_info have very similar structures. All the flag bits and values of access_flags in the method table are shown in the figure below:
name_index and descriptor_index give the name and descriptor of the method; both are indexes into the constant pool. The method descriptor needs some explanation. Its structure is (parameter list)return type. For example, the descriptor of public int instanceMethod(int param) is (I)I: a method that takes one parameter of type int and returns an int. Next come the attribute count and the attribute table. Although both the method table and the field table have them, the attributes they contain differ. Now let's look at the binary representation of the method table of TestClass, starting with the number of methods. The screenshot is as follows:
As the figure above shows, methods_count is 0x0002, so there are two methods. Let's analyze the first one, starting with its access_flags, name_index and descriptor_index. The screenshot is as follows:
From the figure above, access_flags is 0x0001, which according to the access_flags table means ACC_PUBLIC. name_index is 0x000B; the 11th constant in the constant pool shows that the method name is <init>. descriptor_index is 0x000C, the 12th constant, whose value is ()V: the <init> method takes no parameters and returns void. This is in fact the instance constructor that the compiler generates automatically. The following 0x0001 means that the <init> method has 1 attribute. The attribute screenshot is as follows:
From the figure above, the constant that 0x000D refers to is Code, so this is the Code attribute of the method. In other words, the code of a method is stored in the Code attribute of the attribute table inside the method table of the Class file. Next we analyze the Code attribute, whose structure is shown in the figure below:
Here attribute_name_index points to the constant in the constant pool whose value is Code, and attribute_length gives the length of the Code attribute (note that this length does not include the 6 bytes taken by attribute_name_index and attribute_length themselves).
max_stack is the maximum depth of the operand stack; at runtime the virtual machine sizes the operand stack in the stack frame from this value. max_locals is the storage space required by the local variable table.
The unit of max_locals is the slot, the smallest unit in which the virtual machine allocates memory for local variables. At runtime, data types of 32 bits or fewer, such as byte, char and int, occupy 1 slot, while the 64-bit types double and long occupy 2 slots. Note also that max_locals is not the sum of the space needed by all local variables, because slots can be reused: once a local variable goes out of scope, its slot can be reused by another variable.
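As a rough illustration of slot counting, the sketch below computes how many slots the arguments of a method occupy, given its descriptor. LocalSlots is a hypothetical name. The result is only a lower bound for max_locals, since local variables declared inside the method body also need slots, and the sketch assumes a well-formed descriptor.

```java
// Minimal sketch: counting the local variable slots taken by a method's arguments.
// LocalSlots is a hypothetical helper; locals declared in the method body are not counted.
final class LocalSlots {
    static int argumentSlots(String methodDescriptor, boolean isStatic) {
        int slots = isStatic ? 0 : 1;                 // instance methods keep `this` in slot 0
        int i = 1;                                    // skip the opening '('
        while (methodDescriptor.charAt(i) != ')') {
            char c = methodDescriptor.charAt(i);
            if (c == 'J' || c == 'D') {               // long and double take 2 slots
                slots += 2;
                i++;
            } else if (c == '[') {                    // an array reference takes 1 slot
                while (methodDescriptor.charAt(i) == '[') {
                    i++;
                }
                if (methodDescriptor.charAt(i) == 'L') {
                    i = methodDescriptor.indexOf(';', i);
                }
                i++;
                slots += 1;
            } else if (c == 'L') {                    // an object reference takes 1 slot
                i = methodDescriptor.indexOf(';', i) + 1;
                slots += 1;
            } else {                                  // the remaining primitives take 1 slot
                slots += 1;
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots("()V", false));                     // 1
        System.out.println(argumentSlots("(I)I", false));                    // 2
        System.out.println(argumentSlots("(JD[JLjava/lang/String;)V", true)); // 6
    }
}
```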
code_length is the length in bytes of the bytecode, and code holds the bytecode instructions themselves. As the figure above shows, code is of type u1. A u1 value ranges from 0x00 to 0xFF, that is, 0 to 255 in decimal, so there can be at most 256 opcodes; the virtual machine specification currently defines more than 200 instructions.
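Each opcode is a single u1, but some instructions are followed by operand bytes, so a code array usually contains fewer instructions than bytes. The sketch below decodes a handful of opcodes to show this. MiniDisassembler is a hypothetical name, it knows only nine opcodes, and the sample bytes are an assumption: they are what javac typically emits for a constructor like the one in TestClass, not bytes copied from the article's figure.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: decoding a few opcodes from a code array.
// MiniDisassembler is a hypothetical helper that knows only nine opcodes.
final class MiniDisassembler {
    // Mnemonic of an opcode, or null if this sketch does not cover it.
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x03: return "iconst_0";
            case 0x04: return "iconst_1";
            case 0x1B: return "iload_1";
            case 0x2A: return "aload_0";
            case 0x60: return "iadd";
            case 0xAC: return "ireturn";
            case 0xB1: return "return";
            case 0xB5: return "putfield";
            case 0xB7: return "invokespecial";
            default:   return null;
        }
    }

    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = mnemonic(opcode);
            if (name == null) {
                throw new IllegalArgumentException("opcode not covered: " + opcode);
            }
            if (opcode == 0xB5 || opcode == 0xB7) {
                // putfield and invokespecial carry a u2 constant pool index as operand
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                out.add(pc + ": " + name + " #" + index);
                pc += 3;
            } else {
                out.add(pc + ": " + name);
                pc += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed bytes for a constructor that calls super() and sets an int field to 0.
        byte[] code = {0x2A, (byte) 0xB7, 0x00, 0x01, 0x2A, 0x03, (byte) 0xB5, 0x00, 0x02, (byte) 0xB1};
        List<String> lines = disassemble(code);
        for (String line : lines) {
            System.out.println(line);
        }
        System.out.println(code.length + " bytes, " + lines.size() + " instructions"); // 10 bytes, 6 instructions
    }
}
```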
exception_table_length and exception_table describe the exception handlers of the method.
attributes_count and attribute_info give the number of attributes and the attribute table inside the Code attribute. This shows how flexible the attribute table is in the Class file structure: it can appear in the Class file itself, in the method table, in the field table, and inside the Code attribute.
Let's continue with the example. From the screenshot of the Code attribute of the <init> method above, attribute_length is 0x00000026, max_stack is 0x0002, max_locals is 0x0001, and code_length is 0x0000000A, so the 10 bytes at 00000149h-00000152h are the bytecode. Next, exception_table_length is 0x0000 and attributes_count is 0x0001. The value at 00000157h-00000158h is 0x000E, the constant pool index of the attribute name; the 14th constant is LineNumberTable. LineNumberTable describes the correspondence between Java source line numbers and bytecode offsets. It is not required at runtime. If you suppress it with the -g:none compiler option, the main consequences are that stack traces cannot show the line number where an exception occurred and that breakpoints cannot be set by source line when debugging. The structure of LineNumberTable is shown below:
Here attribute_name_index, as mentioned above, is an index into the constant pool, and attribute_length is the length of the attribute. Each entry of the line number table pairs start_pc, a bytecode offset, with line_number, the corresponding source line number. The byte stream of the LineNumberTable attribute in this example is shown below:
After analyzing the first method of TestClass above, we can analyze the second method of TestClass in the same way. The screenshot is as follows:
Here access_flags is 0x0001, name_index is 0x000F, and descriptor_index is 0x0010. The constant pool shows that this is the method public int instanceMethod(int param). Proceeding as before, the Code attribute of instanceMethod is shown in the figure below:
Finally, let's analyze the attributes of the Class file itself. The attribute table of the Class file is at 00000191h-00000199h, where 0x0011 is the attribute name index. The constant pool shows that the attribute name is SourceFile. The structure of SourceFile is shown in the figure below:
Among them, attribute_length is the length of the attribute, and sourcefile_index points to the constant in the constant pool whose value is the name of the source code file. In this example, the screenshot of the SourceFile attribute is as follows:
Here attribute_length is 0x00000002, meaning the length is 2 bytes, and sourcefile_index is 0x0012. The 18th constant in the constant pool shows that the source file name is TestClass.java.
Finally, I hope readers who are interested in technology will get in touch. Personal Weibo: http://weibo.com/xmuzyq