Basic grammar of JavaScript - character set

Author: Eve Cole   Update Time: 2009-06-11 16:25:36

1: Character set

Language, no matter how simple or complex, is always made up of symbols. The set of symbols that make up a language is that language's "character set". The English character set consists of 26 uppercase letters and 26 lowercase letters plus a number of punctuation marks. The Chinese character set is far more complex: each Chinese character counts as one "character".

In computing, a character set usually means the complete set of characters from which encoded text is built. Accordingly, the character set of JavaScript is the set of characters that may legally appear in a JavaScript program [1].
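As a small illustration of what "legal characters" can mean in practice, the sketch below assumes an ECMAScript v3 (or later) engine and a source file saved as UTF-8. Under those assumptions, Unicode letters are legal in identifiers, and Chinese characters are legal inside string literals and comments. The variable name is made up for the example.

```javascript
// Assumes an ECMAScript v3+ engine and a file saved as UTF-8.
// A Unicode letter sequence used as an identifier (hypothetical name):
var 问候 = "你好, world"; // Chinese characters inside a string literal
/* 注释: comments may contain Chinese characters as well */

// Each of these Chinese characters is a single UTF-16 code unit,
// so the string has 2 + 1 + 1 + 5 = 9 code units.
console.log(问候.length); // 9
```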

Common standard character sets include ASCII, ISO-8859-1 (Latin-1), GBK and Unicode. ASCII is a 7-bit character set and is essentially suitable only for English. The 8-bit ISO-8859-1 (Latin-1) supports most Western European languages, while GBK and Unicode, which use 16-bit codes for Chinese characters, fully support Chinese (Unicode also covers the other East Asian languages).
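The 7-bit / 8-bit / 16-bit distinction can be made concrete by looking at character codes. The sketch below uses `charCodeAt` to read each character's UTF-16 code unit and reports the smallest of the sets above whose code width can hold it. The function `smallestCharset` is hypothetical, and it is a simplification: it only checks the size of the code value, so the "16-bit" bucket means "needs a 16-bit code" rather than "is present in GBK".

```javascript
// Hypothetical helper: classify a character by the bit width its code needs.
// Simplification: only the size of the first UTF-16 code unit is checked;
// characters outside U+0000..U+FFFF (surrogate pairs) also land in "16-bit".
function smallestCharset(ch) {
  const code = ch.charCodeAt(0);
  if (code <= 0x7f) return "ASCII (7-bit)";
  if (code <= 0xff) return "ISO-8859-1 / Latin-1 (8-bit)";
  return "16-bit (GBK or Unicode)";
}

// "A", "é" (U+00E9) and "中" (U+4E2D), written as escapes for portability.
for (const ch of ["A", "\u00e9", "\u4e2d"]) {
  console.log(ch, ch.charCodeAt(0), smallestCharset(ch));
}
// A 65 ASCII (7-bit)
// é 233 ISO-8859-1 / Latin-1 (8-bit)
// 中 20013 16-bit (GBK or Unicode)
```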

In standards before ECMAScript v3, JavaScript code itself was restricted to the ASCII character set, but Unicode characters were still allowed in comments and in quoted string literals, where parsers that support Unicode could process them correctly.
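String literals are also where a script can spell non-ASCII text using only ASCII source characters, through `\uXXXX` escape sequences. The sketch below, assuming any current JavaScript engine, compares a literal containing raw Chinese characters with the same text written as escapes (U+4E2D and U+6587).

```javascript
// Raw Chinese characters inside a quoted string literal.
var literal = "中文";
// The same two characters spelled with ASCII-only Unicode escapes.
var escaped = "\u4e2d\u6587";

console.log(literal === escaped); // true
console.log(escaped.length);      // 2
```

Because the escaped form contains only ASCII bytes, it reads the same no matter which encoding is used to decode the file.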

One thing to note about character sets is that browsers typically support several character encodings. A script embedded in a page document must therefore consider not only its own encoding but also compatibility with the encoding the browser uses to decode it. If a browser is forced to use an encoding that does not support Unicode, the JavaScript can fail to execute, because Chinese text in the script's comments cannot be decoded correctly.
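The mismatch can be simulated outside a browser. The sketch below is an illustration under stated assumptions, not how any particular browser works: it encodes a line of script as UTF-8 with `TextEncoder`, then "decodes" the bytes as ISO-8859-1 by mapping each byte straight to the code point of the same value. The helpers `utf8Bytes`, `decodeAsLatin1` and `escapeNonAscii` are hypothetical names. The last helper shows one common mitigation: rewriting every non-ASCII UTF-16 code unit as a `\uXXXX` escape so the script becomes pure ASCII.

```javascript
// Hypothetical helper: encode text as UTF-8 bytes (TextEncoder always emits UTF-8).
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

// Hypothetical helper: simulate a non-Unicode decoder. In ISO-8859-1 each
// byte maps to the code point with the same numeric value.
function decodeAsLatin1(bytes) {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

// Hypothetical helper: replace each non-ASCII UTF-16 code unit with \uXXXX.
// Assumes non-ASCII text appears only in strings, comments or identifiers;
// e.g. a raw U+00A0 used as whitespace would become invalid after escaping.
function escapeNonAscii(source) {
  let out = "";
  for (let i = 0; i < source.length; i++) {
    const code = source.charCodeAt(i);
    out += code < 0x80 ? source[i] : "\\u" + code.toString(16).padStart(4, "0");
  }
  return out;
}

const line = 'var s = "中文"; // 注释';
const garbled = decodeAsLatin1(utf8Bytes(line));
console.log(garbled === line); // false: the Chinese text is mangled

const safe = escapeNonAscii(line);
console.log(safe); // var s = "\u4e2d\u6587"; // \u6ce8\u91ca
console.log(decodeAsLatin1(utf8Bytes(safe)) === safe); // true: ASCII survives
```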