JavaScript regular expression basics

Author: Eve Cole Update Time: 2009-06-11 16:22:10

Semantics and usage of RegExp objects:

A RegExp object is used to check whether a string matches a pattern, to extract part of a string, and to build a new string from the original one (by adding, deleting, or modifying content).
There are two main ways to construct a RegExp object:

Use a literal, such as /\w/g
Use the constructor, such as new RegExp(/\w/)
There are several points to note when constructing RegExp objects:

A literal is typically used for static RegExp objects whose pattern is fixed in the source code rather than generated at runtime.
Flags can be appended directly after the literal to control how matching is performed. Commonly used flags include g and i, which stand for global matching and case-insensitive matching respectively.
The first parameter of the RegExp constructor provides the pattern. If pattern is a string, the flags can be passed as the second parameter. If pattern is itself a RegExp object, ECMAScript 3 and 5 do not allow a second parameter (a TypeError is thrown); ECMAScript 2015 and later accept it and use it in place of the original flags.
The RegExp constructor is typically used to build dynamic RegExp objects whose pattern is generated at runtime.
When the pattern is given as a string, every "\" in the literal form must be written as "\\", because "\" must itself be escaped inside a string.
Both approaches create a new JavaScript object each time, so /\w/ == /\w/ returns false.
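As a minimal sketch of the points above, the following compares the two construction styles. The variable names and the keyword value are illustrative only and do not come from the original article.

```javascript
// Literal form: the pattern is fixed when the script is written.
const literal = /\w+/g;

// Constructor form with a string pattern: each "\" of the literal
// is doubled, because "\" is also the escape character in strings.
const fromString = new RegExp("\\w+", "g");

// Both describe the same pattern and flags...
const sameSource = literal.source === fromString.source; // true
const bothGlobal = literal.global && fromString.global;  // true

// ...but they are distinct objects, so comparing them yields false.
const sameObject = literal == fromString;                // false
const twoLiterals = /\w/ == /\w/;                        // false

// A pattern assembled at runtime (keyword is a hypothetical input).
const keyword = "cat";
const dynamic = new RegExp("\\b" + keyword + "\\b", "i");
const found = dynamic.test("The Cat sat.");              // true
```

If the runtime value may contain characters that are special in a pattern (such as "." or "*"), it would need to be escaped before being concatenated into the pattern string.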

RegExp object-related functions:

To check whether a string matches, use the regExp.test(string) method, which returns true if the string contains a match for the pattern provided by regExp. You can also use the string.search(regExp) method, which returns the index of the first match, or -1 if there is none.
To extract part of a string, use the regExp.exec(string) method or the string.match(regExp) method.
To build a new string from the original one, use the string.replace(searchValue, replaceValue) method.
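The methods listed above can be sketched side by side as follows. The sample string and variable names are assumptions made for illustration.

```javascript
const text = "order 42, order 7";

// test: does the string contain a match?
const hasDigits = /\d+/.test(text);        // true

// search: index of the first match, or -1 if there is none
const firstIndex = text.search(/\d+/);     // 6
const missing = text.search(/xyz/);        // -1

// exec: details of one match, including captured groups
const m = /order (\d+)/.exec(text);
// m[0] is "order 42", m[1] is "42", m.index is 0

// match with the g flag: every matched substring, without groups
const all = text.match(/\d+/g);            // ["42", "7"]

// replace: builds a new string; the original string is unchanged
const masked = text.replace(/\d+/g, "#");  // "order #, order #"
```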
The semantics of the pattern in a RegExp object:

A pattern can define multiple Alternatives separated by "|". This operator has the lowest priority, so if "|" is present it first divides the pattern into several parts. Each Alternative consists of a sequence of terms. A term is an assertion (a positional constraint), an atom (a matching unit), or an atom followed by a quantifier.
The assertions are "^", "$" and "\b". "^" matches the beginning of the string; in multiline mode (that is, when the flags contain m) it also matches the beginning of a line. "$" matches the end of the string; in multiline mode it also matches the end of a line. "\b" matches the boundary between a \w character and a \W character. Note that an assertion does not consume any content of the string; it only tests a position.
Atoms are more complicated, so let's first look at the quantifiers that can follow an atom: * + ? {n} {m,} {m,n}. Here * means 0 or more repetitions, + means 1 or more repetitions, ? means 0 or 1 repetition, {n} means exactly n repetitions, {m,} means m or more repetitions, and {m,n} means between m and n repetitions (inclusive). Any of these quantifiers can also be followed by a ? to switch to non-greedy mode, which is explained later.
An atom can be a patternCharacter (an ordinary character with no special meaning in a pattern, which is matched literally)
"." matches any character that is not a line terminator
An AtomEscape can be a backreference such as \1 (which refers to the text matched by the corresponding preceding parenthesized group), a character escape such as \n \f \r \t \v \xNN \uXXXX \cX, or an escape with special meaning such as \d \D \s \S \w \W
A CharacterClass has two forms: [...] and [^...]. It can contain many kinds of characters, such as -, character escapes such as \n, and escapes with special meaning such as \b (which means backspace inside a class) and \d. Note that when "-" has a character on both sides, it denotes a range from one character to the other. When "-" has no character on its left or right, it simply stands for the dash character.
(group) groups part of the pattern and records what it matched; a quantifier placed after the group applies to the whole group.
(?:group) is used only for grouping; the content matched by the group is not recorded and is not available as \1..\n
(?=group) requires the group to match at this position, but does not include the content matched by the group in the returned match
(?!group) requires the group not to match at this position, and likewise includes nothing in the returned match
Now I want to talk about the two more important operating modes of RegExp:
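
Before turning to those two modes, here is a minimal sketch of the pattern elements listed above. The sample strings and variable names are assumptions chosen for illustration.

```javascript
// "|" has the lowest priority: this reads as (^ab) or (cd$).
const alternation = /^ab|cd$/;
const altStart = alternation.test("abxx");   // true: starts with "ab"
const altEnd = alternation.test("xxcd");     // true: ends with "cd"

// Assertions test a position and consume no characters.
const twoLines = "one\ntwo";
const withoutM = /^two$/.test(twoLines);     // false: "^" is only the string start
const withM = /^two$/m.test(twoLines);       // true: "^" also matches after "\n"
const wholeWords = "a cat, concat".match(/\bcat\b/g); // ["cat"]

// Quantifiers.
const twoOrThree = /^a{2,3}$/;
const qOk = twoOrThree.test("aaa");          // true
const qTooMany = twoOrThree.test("aaaa");    // false

// Character classes: "-" between characters is a range,
// "-" at the edge of the class is a plain dash.
const rangeOnly = /^[a-c]+$/.test("a-c");    // false: "-" is not in a..c
const withDash = /^[ac-]+$/.test("a-c");     // true: class is "a", "c", "-"

// Backreference: \1 must repeat what group 1 matched.
const doubled = /(\w)\1/.exec("hello")[0];   // "ll"

// (?:...) groups without recording, so only "c" is captured.
const captured = /(?:ab)+(c)/.exec("ababc"); // ["ababc", "c"]

// Lookahead: "px" must follow but is not part of the match.
const number = /\d+(?=px)/.exec("12px")[0];  // "12"
// Negative lookahead: "q" not followed by "u".
const qNoU = /q(?!u)/.test("qatar");         // true
const qWithU = /q(?!u)/.test("quit");        // false
```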

Alternatives are always tried from left to right. Once an alternative leads to a successful match, the later alternatives are not tried, for example

/ab|abc/.exec("abc"). Here the string "abc" is matched only by ab in the pattern; the abc alternative is never used.

By default, matching is greedy: if ? is not added after a quantifier, greedy mode is used, and after adding ?, non-greedy mode is used. For example, when

matching /\w+bc/.exec("abcbcbc"), \w+ first takes as many characters as possible (greedy mode), so the match is "abcbcbc". If the pattern is changed to

/\w+?bc/.exec("abcbcbc"), then \w+? takes as few characters as possible (non-greedy mode), so the match is only "abc".
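
The two behaviors described above can be checked directly. This is a small sketch; the last two lines use a sample string that is not from the original article.

```javascript
// Alternatives are tried left to right; the first that succeeds wins.
const first = /ab|abc/.exec("abc")[0];       // "ab"
// Putting the longer alternative first changes the result.
const reordered = /abc|ab/.exec("abc")[0];   // "abc"

// Greedy: \w+ takes as much as it can, then gives back just enough
// for the trailing "bc" to match.
const greedy = /\w+bc/.exec("abcbcbc")[0];   // "abcbcbc"
// Non-greedy: \w+? takes as little as it can.
const lazy = /\w+?bc/.exec("abcbcbc")[0];    // "abc"

// The same difference on a tag-like string (illustrative input).
const greedyTag = /<.+>/.exec("<b>x</b>")[0];  // "<b>x</b>"
const lazyTag = /<.+?>/.exec("<b>x</b>")[0];   // "<b>"
```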

For general string parsing, the exec or match method is usually sufficient. If the string is large or contains many matches, you often need a loop to parse it. RegExp is very powerful when combined with while and similar statements.
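
A minimal sketch of such a loop, assuming a hypothetical input of key=value pairs separated by semicolons. With the g flag, each call to exec continues from the position stored in lastIndex and returns null when no further match exists.

```javascript
const input = "width=10;height=20;depth=5";

// The g flag is required here: without it, exec would return the
// first match on every call and the loop would never end.
const pair = /(\w+)=(\d+)/g;

const result = {};
let match;
while ((match = pair.exec(input)) !== null) {
  // match[1] is the key, match[2] is the numeric text
  result[match[1]] = Number(match[2]);
}
// result is { width: 10, height: 20, depth: 5 }
```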

When you want to turn an existing string into another string by modifying it, the replace method is almost always the tool to use. In my opinion it is the most important method related to RegExp. It accepts several kinds of arguments (the search value can be a string or a RegExp, and the replacement can be a string or a function), so it takes many forms and can meet almost all string-modification needs.
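
A short sketch of the main forms replace can take. The sample strings and variable names are illustrative assumptions.

```javascript
// String replacement with group references: $1 and $2 stand for
// the first and second captured groups.
const swapped = "John Smith".replace(/(\w+) (\w+)/, "$2, $1"); // "Smith, John"

// A string search value replaces only the first occurrence.
const firstOnly = "a-b-c".replace("-", "+");    // "a+b-c"
// A RegExp with the g flag replaces every occurrence.
const allDashes = "a-b-c".replace(/-/g, "+");   // "a+b+c"

// Function replacement: the return value is inserted for each match.
const doubledNumbers = "3 apples and 5 pears".replace(/\d+/g, function (n) {
  return String(Number(n) * 2);
});                                             // "6 apples and 10 pears"

// Deletion is replacement with an empty string.
const trimmed = "  padded  ".replace(/^\s+|\s+$/g, ""); // "padded"
```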