First, GB2312, GBK and UTF-8 are all character encodings. Many other character encodings exist; these three are simply the ones used most often on Chinese websites.
Why do we need encodings at all? A computer stores text as numbers, and the earliest widely used scheme was ASCII, in which each character corresponds to a unique ASCII code. Computers were first developed in the United States, where the letters, digits and punctuation on a keyboard fit easily into ASCII. Chinese is different: there are thousands of characters, far more than ASCII can hold, yet each Chinese character still needs a unique code of its own. That is how national character encoding standards such as GB2312 and GBK came about. Other countries and languages have their own encoding standards as well.
GB stands for "Guobiao" (national standard). GB2312 and GBK are mainly used for encoding Chinese characters, while UTF-8 is used worldwide. If your web page is aimed mainly at Chinese-speaking readers, GB2312 or GBK works well and has the advantage of smaller text storage, since each Chinese character takes two bytes instead of the three that UTF-8 typically needs. If your page is open to the world and uses GB2312 or GBK, a browser or system without support for that encoding will show the Chinese content as unrecognizable garbled characters.
The encoding is usually declared in a meta tag of the web page, such as <meta http-equiv="Content-Type" content="text/html; charset=gb2312">, indicating that this page uses GB2312 encoding. This information is for the browser, which normally decodes the page using the encoding declared in the page header. We can also force the browser to interpret a page with a different encoding, and then we get to see the legendary garbled text.
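How garbled text arises can be reproduced outside the browser. The following is a minimal Python sketch, assuming the standard "gbk" and "utf-8" codecs; the sample string and variable names are made up for illustration.

```python
# Garbled text appears when bytes written in one encoding
# are decoded with a different one.
text = "汉字编码"

utf8_bytes = text.encode("utf-8")  # what a UTF-8 page contains
gbk_bytes = text.encode("gbk")     # what a GBK page contains

# Decoding with the matching encoding recovers the original text.
assert utf8_bytes.decode("utf-8") == text
assert gbk_bytes.decode("gbk") == text

# Decoding with the wrong encoding produces wrong characters.
# errors="replace" substitutes U+FFFD for undecodable bytes
# instead of raising UnicodeDecodeError.
wrong_1 = utf8_bytes.decode("gbk", errors="replace")
wrong_2 = gbk_bytes.decode("utf-8", errors="replace")
print(wrong_1)
print(wrong_2)
```

Forcing a browser to use the wrong encoding does the same thing as the last two decode calls.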
Converting between GBK or GB2312 and UTF-8 goes through Unicode as the intermediate step:
GBK, GB2312 -> Unicode -> UTF-8
UTF-8 -> Unicode -> GBK, GB2312
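The two conversion paths can be sketched in Python, where a str object plays the role of the Unicode middle step. The function names here are hypothetical helpers, not part of any library.

```python
def gbk_to_utf8(data: bytes) -> bytes:
    """Re-encode GBK (or GB2312) bytes as UTF-8 via a Unicode string."""
    text = data.decode("gbk")     # GBK bytes -> Unicode
    return text.encode("utf-8")   # Unicode -> UTF-8 bytes


def utf8_to_gbk(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as GBK via a Unicode string."""
    text = data.decode("utf-8")   # UTF-8 bytes -> Unicode
    return text.encode("gbk")     # Unicode -> GBK bytes


sample = "中文".encode("gbk")
converted = gbk_to_utf8(sample)
print(sample.hex(), "->", converted.hex())
```

The second function raises UnicodeEncodeError if the text contains a character that GBK does not define, which is one reason the reverse direction is less safe.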
For a website or forum with a lot of English text, UTF-8 is recommended: it stores each English (ASCII) character in a single byte, so it costs no extra space. However, many forum plug-ins currently support only GBK.
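The storage cost is easy to measure. This is a small sketch in Python with made-up sample strings; encoded_sizes is a hypothetical helper.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text under GBK and UTF-8."""
    return {name: len(text.encode(name)) for name in ("gbk", "utf-8")}


english = "Hello, world"
chinese = "你好世界"

print(encoded_sizes(english))
print(encoded_sizes(chinese))
```

English text takes one byte per character in both encodings, while each of these Chinese characters takes two bytes in GBK and three in UTF-8.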
For a Chinese website, GB2312 and GBK can still cause problems at times. To avoid garbled characters altogether, use UTF-8, which also makes future internationalization much easier. UTF-8 can be regarded as the encoding of one large character set that contains the text of most languages.
One benefit of UTF-8 is that users in other regions (such as Hong Kong and Taiwan) can view your text normally, without garbled characters, even if they have not installed Simplified Chinese support.
GB2312 is the encoding for Simplified Chinese
GBK supports Simplified Chinese and Traditional Chinese
Big5 supports Traditional Chinese
UTF-8 supports almost all characters
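The list above can be checked with one simplified and one traditional character. This is a rough sketch that relies on Python's built-in codecs of the same names; can_encode is a hypothetical helper.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


simplified = "汉"   # U+6C49, simplified form
traditional = "漢"  # U+6F22, traditional form

for enc in ("gb2312", "gbk", "big5", "utf-8"):
    print(enc, can_encode(simplified, enc), can_encode(traditional, enc))
```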
The encoding most commonly used in mainland China is GB18030; GBK and GB2312 are also in use. They are related as follows. The earliest Chinese character encoding to be formulated was GB2312, which contains 6,763 Chinese characters and 682 other symbols. It was extended in 1995 as GBK 1.0, which contains 21,886 characters in total. Later came GB18030, which contains 27,484 Chinese characters as well as the scripts of major ethnic minority languages such as Tibetan, Mongolian and Uyghur. The Windows platform is now required to support GB18030.
GB2312 contains a little over 6,000 Chinese characters. Each is encoded in two bytes: the first byte ranges from B0 to F7 and the second from A1 to FE (when the first byte is D7, the second byte only runs from A1 to F9), which gives 6,763 Chinese characters. There are other characters too: counting the symbols and the single-byte control and ASCII codes, there are about 7,573 codes in all.
GBK is an extension of GB2312 that accommodates more Chinese characters, but it is only an extension, not a qualitative change. All GB2312 codes are retained and the code range is enlarged around them, for a total of 22,014 codes (including special characters).
GB18030 is in turn an extension of GBK. Two bytes can no longer hold all the Chinese characters required, so it mixes two-byte and four-byte codes to support more of them. It keeps the original two-byte GBK codes, so it remains compatible with GB2312 and GBK encoded files. It accommodates approximately 55,657 codes (including special characters).
Unicode, commonly known as the Universal Code, aims to represent the writing of every country with one unified standard; UTF-8 is one of its encodings. UTF-8 is variable-length, using one to four bytes per character, and most Chinese characters take three. Unicode today covers more Chinese characters than GBK, but the three-byte form is not compatible with the GB encodings: files saved as GB2312, GBK or GB18030 cannot be read as UTF-8 directly and must be converted first.
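The byte lengths involved can be inspected directly. The sketch below assumes Python's built-in "gb2312", "gbk", "gb18030" and "utf-8" codecs; the three sample characters are chosen only for illustration.

```python
samples = {
    "A": "ASCII letter",
    "汉": "Chinese character present in GB2312",
    "𠀀": "U+20000, a character outside GBK",
}

for ch, note in samples.items():
    gb = ch.encode("gb18030")
    u8 = ch.encode("utf-8")
    print(f"{note}: GB18030 {len(gb)} byte(s), UTF-8 {len(u8)} byte(s)")

# The two-byte code is shared: all three GB encodings agree on this character.
same = "汉".encode("gb2312") == "汉".encode("gbk") == "汉".encode("gb18030")

# The lead byte of a GB2312 Chinese character falls in B0..F7,
# the trail byte in A1..FE.
lead, trail = "汉".encode("gb2312")
in_range = 0xB0 <= lead <= 0xF7 and 0xA1 <= trail <= 0xFE
```

The shared two-byte code is what makes a GB2312 file readable as GBK or GB18030, while the same file is not valid UTF-8.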
What are the differences between GBK and GB2312?
First, we need to understand what GBK and GB2312 are. Both are character encodings, and of course many other character encodings exist.
Character encoding can be understood like this:
Everything stored in a computer is a binary value made of 0s and 1s.
8 bits make up one byte, which is commonly written in hexadecimal.
So what if we want the computer to display the characters we want, rather than strings of 0s and 1s?
The computer has to convert the values it stores into the corresponding characters, whether English, Chinese or any other language, and then output them to the screen.
So a character encoding is a set of rules that specifies which stored values correspond to which characters shown on the screen.
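The idea that an encoding is just a rule table from values to characters can be seen by printing the stored bytes in hexadecimal. This is a minimal Python sketch; hex_bytes is a hypothetical helper.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Return the bytes stored for ch under the given encoding, in hex."""
    return ch.encode(encoding).hex()


for ch in ("A", "中"):
    for enc in ("gbk", "utf-8"):
        print(ch, enc, hex_bytes(ch, enc))

# The rule works in both directions: the same stored value
# decodes back to the same character under the same encoding.
restored = bytes.fromhex("d6d0").decode("gbk")
```

The same character is stored as different values under different encodings, so the bytes alone mean nothing until the reader knows which rule table to apply.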
To sum up, GBK and GB2312 are both character encodings.
Let's look at their similarities and differences in detail:
Similarities:
1. GBK and GB2312 are both double-byte encodings: each Chinese character takes 16 bits.
2. Both are commonly declared in the meta tag of a web page.
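How a program might use the meta declaration can be sketched as follows. This is a deliberately simplified illustration, not how a real browser detects encodings; META_CHARSET, decode_html and the sample page are all hypothetical.

```python
import re

# Looks for charset=... inside a <meta ...> tag in the raw bytes.
META_CHARSET = re.compile(rb'<meta[^>]*charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the charset named in its meta tag."""
    match = META_CHARSET.search(raw[:1024])  # the declaration sits near the top
    encoding = match.group(1).decode("ascii") if match else fallback
    # An unknown encoding name would raise LookupError here.
    return raw.decode(encoding, errors="replace")


page = (
    '<html><head><meta http-equiv="Content-Type" '
    'content="text/html; charset=gb2312"></head>'
    "<body>中文网页</body></html>"
).encode("gb2312")

html = decode_html(page)
print(html)
```

The tag itself is plain ASCII, which is why it can be read before the page's encoding is known.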
Differences:
1. GBK supports both Simplified Chinese and Traditional Chinese.
The full name of GBK is "Chinese Internal Code Extension Specification" (GBK combines the pinyin initials of "Guobiao", national standard, and "Kuozhan", extension; English name: Chinese Internal Code Specification). It was formulated by the National Information Technology Standardization Technical Committee of the People's Republic of China on December 1, 1995. On December 15, 1995, the Standardization Department of the State Bureau of Technical Supervision and the Science, Technology and Quality Supervision Department of the Ministry of Electronic Industry jointly issued technical supervision letter No. 229 (1995), defining it as a technical specification guidance document.
2. GB2312 supports only Simplified Chinese.
"Chinese Coded Character Set for Information Exchange" is a national standard published by China's State Administration of Standards in 1980 and put into effect on May 1, 1981. Its standard number is GB 2312-1980.
GB 2312 contains 6,763 Chinese characters: 3,755 level-1 characters and 3,008 level-2 characters. It also includes 682 full-width characters such as Latin letters, Greek letters, Japanese hiragana and katakana, and Russian Cyrillic letters.
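The character counts can be cross-checked by trying every two-byte code in the Chinese-character area. This sketch assumes that Python's "gb2312" codec rejects unassigned codes, and that level-1 characters use lead bytes B0 to D7 and level-2 characters D8 to F7; count_hanzi is a hypothetical helper.

```python
def count_hanzi(first_lead: int, last_lead: int) -> int:
    """Count the two-byte codes in the given lead-byte rows that decode."""
    count = 0
    for lead in range(first_lead, last_lead + 1):
        for trail in range(0xA1, 0xFF):  # trail bytes A1..FE
            try:
                bytes([lead, trail]).decode("gb2312")
                count += 1
            except UnicodeDecodeError:
                pass  # unassigned code
    return count


level_1 = count_hanzi(0xB0, 0xD7)
level_2 = count_hanzi(0xD8, 0xF7)
print(level_1, level_2, level_1 + level_2)
```

If the codec follows the standard's table, the two counts should agree with the level-1 and level-2 figures quoted above.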
In short, GB2312 or GBK is a reasonable choice for a page aimed mainly at Chinese-speaking readers, and keeps the stored text small. For a page open to the world, a browser or system without support for these encodings will show the Chinese content as unrecognizable garbled characters.