Deep understanding of string types in Java

Author：Eve Cole Update Time：2025-03-03 15:32:02

1. Java's built-in support for strings

Built-in support means that, unlike C, Java does not implement strings as char pointers. Java strings follow the Unicode standard, so Java also does not need two separate classes, as C++ does with string and wstring, to stay compatible with both C and Unicode. Instead, Java supports strings through the String class.

As a result, a string literal is itself a String object, and we can call String methods on it directly:

//You can directly call all methods of the String object on "abc"

int length="abc".length();

as well as

String abc=new String("abc");

int length=abc.length();

2. String values in Java are constant (immutable)

This means that once a string is created, its value cannot be changed. The member methods of String confirm this: none of them modifies the value in place. In addition, string literals such as "abc", and the "def" in new String("def"), are stored in the constant pool of the Java virtual machine.

The "abc" in the following code is stored in the constant pool, so variables a and ab point to the same "abc" object in the pool. By contrast, new String("abc") creates a separate object, so a and abc refer to different addresses.


public class StringTest {

public static void main(String[] args) {

String a="abc";

String ab="abc";

String abc=new String("abc");

System.out.println(ab==a);

System.out.println(a==abc);

}

}

/* Program output:

* true

* false

*/
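To make the immutability described above more concrete, here is a minimal sketch. The class name ImmutabilityDemo and the helper afterCalls are made up for illustration; only standard String methods are used. It shows that methods which appear to modify a string actually return a new object, and how equals() and intern() relate to the == comparison in the example above.

```java
class ImmutabilityDemo {

    // Calls several "modifying" methods and returns the original string.
    // (Hypothetical helper, for illustration only.)
    static String afterCalls(String original) {
        original.toUpperCase();   // returns a new String; result ignored here
        original.concat("def");   // likewise, original is left untouched
        return original;
    }

    public static void main(String[] args) {
        String s = "abc";
        String upper = s.toUpperCase();

        System.out.println(afterCalls(s));   // abc
        System.out.println(upper);           // ABC
        System.out.println(s == upper);      // false: a different object

        String copy = new String("abc");
        System.out.println(s == copy);           // false: different objects
        System.out.println(s.equals(copy));      // true: same content
        System.out.println(s == copy.intern());  // true: intern() returns the pooled "abc"
    }
}
```

In everyday code, string content should be compared with equals(); == only tells you whether two variables refer to the same object.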

So how are dynamically generated, mutable strings implemented? Java provides the StringBuffer and StringBuilder classes for this purpose. Java strings can also be concatenated with the "+" operator, as in "abc"+"def". Internally, the compiler may implement concatenation with the StringBuilder or StringBuffer class (a concatenation of two literals, like this one, is folded into a single constant at compile time).

How are StringBuilder and StringBuffer implemented? They store the string in a character array. The source code that ships with the JDK shows that StringBuffer keeps its characters in a char array declared in its parent class, AbstractStringBuilder (since JDK 9, this field is a byte array).
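As a rough illustration of the mutable alternative, the sketch below uses StringBuilder to change a string in place. The class name BuilderDemo and the helper join are hypothetical; append, setCharAt, and toString are standard StringBuilder methods.

```java
class BuilderDemo {

    // Concatenates all parts using a single mutable buffer.
    // (Hypothetical helper, for illustration only.)
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);      // modifies sb itself; no new String per step
        }
        return sb.toString();     // one immutable String is created at the end
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        sb.append("def");         // sb now holds "abcdef"
        sb.setCharAt(0, 'X');     // in-place change, not possible on a String

        System.out.println(sb);                  // Xbcdef
        System.out.println(join("abc", "def"));  // abcdef
    }
}
```

StringBuffer offers the same operations with synchronized methods, so StringBuilder is usually the choice when only one thread touches the buffer.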

3. Encoding issues in strings

There are two questions to understand here. How is string encoding handled in source files? And what encoding do strings use once they are compiled into class files or when the code runs in the Java virtual machine?

For the first question, the encoding of strings in the source code depends on your IDE or text editor. For example, the following code was saved in GBK encoding and then opened with both GBK and UTF-8 decoding:

//Saved in GBK, opened as GBK

//Saved in GBK, opened as UTF-8: garbled. If the system's default encoding is not GBK, you need to pass the "-encoding GBK" option to javac when compiling.
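The garbling described above can be reproduced in memory. The following is only a sketch: the class name MojibakeDemo and the helper reinterpret are invented for illustration, and it assumes the runtime provides the GBK charset (typical for a full JDK, but not guaranteed on trimmed-down runtimes). The test text is written with Unicode escapes so that the encoding of this source file itself does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Assumption: the GBK charset is available in this runtime.
    static final Charset GBK = Charset.forName("GBK");

    // Encodes text with one charset and decodes the bytes with another,
    // imitating "saved as X, opened as Y". (Hypothetical helper.)
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        // Saved as GBK, opened as GBK: the text survives.
        System.out.println(text.equals(reinterpret(text, GBK, GBK)));                    // true

        // Saved as GBK, opened as UTF-8: the bytes are misread and the text is garbled.
        System.out.println(text.equals(reinterpret(text, GBK, StandardCharsets.UTF_8))); // false
    }
}
```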

So how should this kind of source encoding problem be handled? The answer is the compiler's -encoding option. By default, javac assumes the system's default encoding. On Chinese-language Windows this is generally GBK (the value can be obtained through System.getProperty("file.encoding")). If the system's default encoding is GBK but the source code is saved as UTF-8, you should compile with javac -encoding UTF-8. Note that since JDK 18 the default is UTF-8 regardless of the operating system.
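To check which default encoding a given machine actually uses, something like the following can be run. This is a small sketch with a made-up class name (DefaultEncodingDemo); the printed values depend on the operating system, locale, and JDK version, so no particular output is implied.

```java
import java.nio.charset.Charset;

class DefaultEncodingDemo {

    // Name of the charset the JVM uses when none is specified explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property mentioned in the text above.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset object the JVM derived for its default.
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```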

What encoding do strings use after compilation into a class file, or at run time in the Java virtual machine? First, the String type in Java is based on UTF-16: regardless of how the source code is encoded, strings in the Java virtual machine are UTF-16. This means that as long as javac correctly interprets the encoding of the source file, the strings in the class file and at run time are independent of the source file's encoding.

This also helps in understanding the primitive char type and the Character class. Both use the same UTF-16 encoding as String, so whether it holds 'a', '1', or a Chinese character, a char in Java is always 16 bits long. (Characters outside the Basic Multilingual Plane do not fit in 16 bits and are represented by two chars, a surrogate pair.)
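The 16-bit width of char can be checked directly. In this sketch the class name CharWidthDemo and its two helpers are hypothetical; Character.SIZE, length(), and codePointCount() are standard API. The last example uses U+1F600, a character outside the Basic Multilingual Plane, to show the surrogate-pair case.

```java
class CharWidthDemo {

    // Number of UTF-16 code units (chars) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(Character.SIZE);        // 16 (bits per char)

        System.out.println(utf16Units("a"));       // 1
        System.out.println(utf16Units("\u4e2d"));  // 1 (a Chinese character in the BMP)

        String emoji = "\uD83D\uDE00";             // U+1F600 as a surrogate pair
        System.out.println(utf16Units(emoji));     // 2
        System.out.println(codePoints(emoji));     // 1
    }
}
```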

In addition, the String class can convert between a string and its underlying binary representation using a specified character encoding. This means we can read text files or other input streams encoded in GBK, UTF-8, or other encodings and convert them into the correct strings in memory.

For example, the String class has the following methods:

public String(byte[] bytes, Charset charset); constructs a string from a byte array (each byte is 8 bits), decoding it with the specified character set.

public byte[] getBytes(Charset charset); encodes the string into a byte array, that is, its binary representation, using the specified character set.

One more member method deserves attention:

public byte[] getBytes(); the byte array returned by this method is encoded with the platform's default character set, which is not necessarily UTF-16.
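The methods listed above can be tried out as follows. This is a sketch under a few assumptions: the class name BytesDemo and its helpers are invented, and only charsets from StandardCharsets are used so that it runs on any JDK. It shows that the same string yields different byte arrays under different encodings, and that decoding with the matching charset restores the original.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesDemo {

    // Length of the string's binary representation in the given charset.
    static int byteLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    // Encodes and then decodes with the same charset.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // two Chinese characters, written as escapes

        System.out.println(byteLength(text, StandardCharsets.UTF_8));    // 6
        System.out.println(byteLength(text, StandardCharsets.UTF_16BE)); // 4

        // Matching charsets on both sides restore the original text.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));      // true

        // ISO-8859-1 cannot represent these characters, so information is lost.
        System.out.println(text.equals(roundTrip(text, StandardCharsets.ISO_8859_1))); // false

        // No charset argument: the platform default is used, so the length may vary by machine.
        System.out.println(text.getBytes().length);
    }
}
```

Passing the charset explicitly, rather than relying on getBytes() with no argument, keeps the result the same across machines.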