If you have the choice, you should still use UTF-8.
In fact, Windows itself has moved to Unicode internally, and GBK survives only as a legacy encoding kept for compatibility with Chinese standards.
GBK is a variable-width encoding: ASCII characters, including English letters, take one byte, while Chinese characters take two bytes. To tell the two apart, the first byte of every two-byte character has its highest bit set to 1.
UTF-8 is a variable-width encoding designed to cover international text. It uses one to four bytes per character: 8 bits (one byte) for English letters and 24 bits (three bytes) for most Chinese characters. English text therefore takes the same space in UTF-8 as in GBK, while Chinese text takes more.
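The per-character sizes can be checked with Python's built-in codecs. The sketch below is only illustrative; `byte_lengths` is a hypothetical helper, and the emoji is included as an example of a character that GBK cannot represent at all.

```python
def byte_lengths(ch):
    """Return (UTF-8 length, GBK length or None if GBK cannot encode ch)."""
    utf8_len = len(ch.encode("utf-8"))
    try:
        gbk_len = len(ch.encode("gbk"))
    except UnicodeEncodeError:
        gbk_len = None  # character is outside the GBK repertoire
    return utf8_len, gbk_len

for ch in ["A", "中", "😀"]:
    print(repr(ch), byte_lengths(ch))
```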
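GBK covers about 21,000 Chinese characters, both simplified and traditional, which is enough for most everyday Chinese text.
UTF-8 can encode every character in Unicode, which covers the writing systems of all countries in the world.
GBK is an extension of the national standard GB2312 and remains compatible with it. GBK itself is not a national standard; it was issued as a technical specification guidance document.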
UTF-8 encoded text can be displayed by any browser, in any country, that supports the UTF-8 character set.
For example, with UTF-8 encoding, Chinese can be displayed in a foreign user's English version of IE without the user having to download IE's Chinese language support package.
As for space: English characters take one byte each in both GBK and UTF-8, so a forum with a lot of English is the same size either way. The difference lies in Chinese, which takes two bytes per character in GBK and usually three in UTF-8.
Please note: although the UTF-8 version has good international compatibility, Chinese text stored in UTF-8 needs about 50% more database space than in the GBK/BIG5 version (three bytes per character instead of two). If storage is the deciding factor, this counts against the UTF-8 version, which then mainly suits users with special requirements for international compatibility.
Simply put:
For forums with many Chinese characters, it is appropriate to use GBK encoding to save database space.
For forums with a lot of English, UTF-8 takes no more database space than GBK, so it is appropriate to use UTF-8.
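The storage trade-off can be measured directly. The sketch below uses two made-up sample posts and a hypothetical helper, `encoded_sizes`, to compare how many bytes each encoding needs; real database overhead (indexes, row formats) is not modelled.

```python
def encoded_sizes(text):
    """Return the encoded size of text in bytes for GBK and UTF-8."""
    return {name: len(text.encode(name)) for name in ("gbk", "utf-8")}

chinese_post = "今天天气很好"               # six Chinese characters
english_post = "The weather is nice today"  # 25 ASCII characters

chinese_sizes = encoded_sizes(chinese_post)
english_sizes = encoded_sizes(english_post)
print(chinese_sizes)  # UTF-8 is 50% larger for pure Chinese text
print(english_sizes)  # both encodings give the same size for ASCII text
```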
What are the differences between GBK and GB2312?
First of all, what is GBK, and what is GB2312? Both are character encodings, and they are only two of many.
A character encoding can be understood like this:
A computer stores only binary values, 0s and 1s.
Eight bits make up one byte, which is commonly written in hexadecimal.
So how do we get the computer to show the characters we want instead of strings of 0s and 1s?
The computer has to convert the values it stores into the corresponding characters, whether English, Chinese, or any other language, and then output them to the screen.
A character encoding is the set of rules that makes this possible: it specifies which stored value corresponds to which character displayed on the screen.
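As a small illustration of "values correspond to characters", the Python sketch below shows the same character stored as different byte values under two encodings, and the same bytes turned back into the character when the matching rule is applied. The variable names are only for illustration.

```python
text = "汉"

code_point = hex(ord(text))            # the character's Unicode number
gbk_hex = text.encode("gbk").hex()     # bytes stored under GBK
utf8_hex = text.encode("utf-8").hex()  # bytes stored under UTF-8
print(code_point, gbk_hex, utf8_hex)   # 0x6c49 baba e6b189

# Decoding applies the same rule in reverse: stored value -> character.
round_trip = bytes.fromhex(gbk_hex).decode("gbk")
print(round_trip)
```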
In short, GBK and GB2312 are both character encodings.
Their similarities and differences are described in detail below:
Similarities:
1. GBK and GB2312 both encode each Chinese character in two bytes (16 bits), while ASCII characters remain one byte.
2. Both are commonly declared as the charset in the meta tags of web pages.
Differences:
1. GBK character encoding supports Simplified Chinese and Traditional Chinese!
The full name of GBK is "Chinese Internal Code Extension Specification" (English name: Chinese Internal Code Specification). The letters GBK come from the pinyin initials of "National Standard" (Guo Biao) and "Extension" (Kuozhan). It was formulated by the National Information Technology Standardization Technical Committee of the People's Republic of China on December 1, 1995. On December 15, 1995, the Standardization Department of the State Bureau of Technical Supervision and the Science and Technology and Quality Supervision Department of the Ministry of Electronics Industry jointly issued letter No. 229, defining it as a technical specification guidance document.
2. GB2312 only supports Simplified Chinese!
GB 2312, the "Chinese Coded Character Set for Information Exchange", is a national standard published by the State Administration of Standards of China in 1980 and implemented on May 1, 1981. The standard number is GB 2312-1980.
The GB 2312 standard contains 6763 Chinese characters in total: 3755 first-level characters and 3008 second-level characters. It also includes 682 full-width non-Chinese characters, among them Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.
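The difference in coverage can be observed with Python's built-in `gb2312` and `gbk` codecs. The sketch below uses a hypothetical helper, `can_encode`, and one simplified/traditional pair (体 and 體) as the example; it is a spot check, not a full comparison of the two character sets.

```python
def can_encode(ch, codec):
    """Return True if the codec has a byte sequence for this character."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for ch in ["体", "體"]:  # simplified form, then traditional form
    print(ch, "gb2312:", can_encode(ch, "gb2312"), "gbk:", can_encode(ch, "gbk"))

# Characters present in both keep the same bytes, which is what
# "GBK is compatible with GB2312" means in practice.
same_bytes = "体".encode("gb2312") == "体".encode("gbk")
print(same_bytes)
```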
If your web page is mainly for Chinese-speaking readers, GB2312 or GBK works well, and the smaller storage size for Chinese text is an advantage. If your web page is meant for the whole world and you use GB2312 or GBK as its encoding, browsers on some computers will not support that encoding, and the Chinese content of your page will turn into unreadable garbled characters.
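The garbling described above is what happens when bytes written under one encoding are read under another. A minimal Python sketch, assuming a page saved as GBK and a reader that wrongly treats it as UTF-8 (the variable names are illustrative):

```python
page_bytes = "中文网页".encode("gbk")  # what the server actually sends

right = page_bytes.decode("gbk")  # reader uses the matching encoding
# Reader guesses UTF-8 instead; invalid sequences become U+FFFD.
wrong = page_bytes.decode("utf-8", errors="replace")

print(right)
print(wrong)
```

Declaring the encoding explicitly, or using UTF-8 throughout, removes the guesswork on the reader's side.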