A regular expression is a pattern that describes a set of strings and is used to match text of a certain form. A regular expression is made up of ordinary characters and metacharacters: ordinary characters include uppercase and lowercase letters and digits, while metacharacters have special meanings. Whether on the .NET platform or the Java platform, the core regular expression syntax means largely the same thing. Below we mainly analyze what Java regular expressions can do and how they are applied in practice. I hope this article is helpful; it is for reference only.
Since JDK 1.4 introduced the java.util.regex package, Java has offered a solid platform for working with regular expressions. Because regular expression syntax is a fairly complex system, the most commonly used constructs are summarized below.
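As a minimal sketch of the basic java.util.regex workflow (compile a Pattern, obtain a Matcher for the input, then call find() or matches()), the class below uses only the standard Pattern, Matcher, and String.matches calls. The class name RegexBasics and its helper methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBasics {
    // Hypothetical helper: collects every substring of input that matches regex, left to right.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {          // each call resumes after the previous match
            hits.add(m.group());    // group() is the text of the current match
        }
        return hits;
    }

    // Hypothetical helper: matches() succeeds only if the ENTIRE input matches.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll("Java", "Java and JavaScript"));      // [Java, Java]
        System.out.println(matchesWhole("Java", "Java and JavaScript")); // false
        System.out.println("Java".matches("Java"));                      // true
    }
}
```

The distinction matters throughout the rest of the article: find() looks for the pattern anywhere in the input, while matches() (and String.matches, which delegates to it) requires the whole input to match.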
\\ backslash
\t tab ('\u0009')
\n newline, i.e. line feed ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character, equivalent to [ \t\n\x0B\f\r]
\S a non-whitespace character, equivalent to [^\s]
\w a word character, equivalent to [a-zA-Z_0-9]
\W a non-word character, equivalent to [^\w]
\f form feed ('\u000C')
\e escape ('\u001B')
\b a word boundary
\B a non-word boundary
\G the end of the previous match
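One practical consequence of the table above: in Java source code every backslash in a regex must itself be escaped, so \d is written "\\d" and a literal backslash needs "\\\\". The sketch below, whose class and method names are hypothetical, exercises \d, \w, \b and \s through the standard String.matches, String.replaceAll, and Matcher.find methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // The regex \d+ is written "\\d+" in Java source: "\\" in a string literal is one backslash.
    static boolean allDigits(String s) {
        return s.matches("\\d+");
    }

    // Counts runs of word characters; \b marks the boundary between \w and \W.
    static int countWords(String s) {
        Matcher m = Pattern.compile("\\b\\w+\\b").matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Collapses each run of whitespace (space, \t, \n, \x0B, \f, \r) into a single space.
    static String normalizeSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // A literal backslash needs the regex \\, which is "\\\\" in Java source.
    static boolean containsBackslash(String s) {
        return Pattern.compile("\\\\").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(allDigits("2024"));                // true
        System.out.println(countWords("hello, regex world")); // 3
        System.out.println(normalizeSpaces("a \t b\nc"));     // a b c
        System.out.println(containsBackslash("C:\\temp"));    // true
    }
}
```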
^ anchors the match to the beginning of the input
^java matches only input that begins with java
$ anchors the match to the end of the input
java$ matches only input that ends with java
. matches any single character except a line terminator such as \n
java.. matches java followed by any two characters other than line terminators
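A short sketch of the anchors and the dot, assuming Matcher.find() (which searches anywhere in the input) so that only ^ and $ constrain where a match may begin or end. The helper name found is hypothetical.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // find() searches anywhere in the input, so the anchors decide where a match may occur.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^java", "java is fun")); // true: input begins with java
        System.out.println(found("^java", "I like java")); // false
        System.out.println(found("java$", "I like java")); // true: input ends with java
        System.out.println(found("java..", "javaxy"));     // true
        System.out.println(found("java..", "java\nx"));    // false: . does not match the line feed
    }
}
```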
Restricting a position to a set of characters: "[]"
[a-z] matches one character in the range lowercase a to z
[A-Z] matches one character in the range uppercase A to Z
[a-zA-Z] matches one character in the range lowercase a to z or uppercase A to Z
[0-9] matches one digit from 0 to 9
[0-9a-z] matches one character that is a digit 0 to 9 or a lowercase letter a to z
[0-9[a-z]] matches one character that is a digit 0 to 9 or a lowercase letter a to z (the union of the two classes)
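The sketch below checks these classes with String.matches, which requires the whole (here, one-character) input to match. Because nesting one class inside another, as in [0-9[a-z]], produces a union, it also shows the syntax Java actually uses for intersection, &&. The class, constant, and method names are illustrative only.

```java
class CharClassDemo {
    static final String LOWER = "[a-z]";
    static final String UNION = "[0-9[a-z]]";         // digit OR lowercase letter
    static final String CONSONANT = "[a-z&&[^aeiou]]"; // intersection: lowercase AND not a vowel

    static boolean one(String regex, String s) {
        return s.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(one(LOWER, "q"));     // true
        System.out.println(one(LOWER, "Q"));     // false
        System.out.println(one(UNION, "7"));     // true: digit half of the union
        System.out.println(one(UNION, "k"));     // true: letter half of the union
        System.out.println(one(UNION, "K"));     // false
        System.out.println(one(CONSONANT, "b")); // true
        System.out.println(one(CONSONANT, "e")); // false: vowels are excluded
    }
}
```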
Adding ^ right after the opening [ negates the set: "[^]"
[^a-z] matches one character that is not a lowercase letter a to z
[^A-Z] matches one character that is not an uppercase letter A to Z
[^a-zA-Z] matches one character that is neither a lowercase letter a to z nor an uppercase letter A to Z
[^0-9] matches one character that is not a digit 0 to 9
[^0-9a-z] matches one character that is neither a digit 0 to 9 nor a lowercase letter a to z
[^0-9[a-z]] matches one character that is neither a digit 0 to 9 nor a lowercase letter a to z (in Java 9 and later the ^ negates the whole union, including the nested class)
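Negated classes are handy for stripping unwanted characters. This minimal sketch, with hypothetical helper names and made-up sample input, removes everything that is not a digit and tests a single character against [^a-zA-Z].

```java
class NegatedClassDemo {
    // Deletes every character that is NOT a digit 0-9.
    static String digitsOnly(String s) {
        return s.replaceAll("[^0-9]", "");
    }

    // True only if s is exactly one character that is not an ASCII letter.
    static boolean isNonLetter(String s) {
        return s.matches("[^a-zA-Z]");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 010-1234")); // 0101234
        System.out.println(isNonLetter("5"));            // true
        System.out.println(isNonLetter("x"));            // false
    }
}
```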
To require that a character appear zero or more times, use "*"
J* zero or more J
.* zero or more characters
J.*D zero or more characters between J and D
To require that a character appear one or more times, use "+"
J+ one or more J
.+ one or more characters
J.+D one or more characters between J and D
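A sketch contrasting * and +, assuming String.matches for whole-input checks and Matcher.find for the J...D patterns; firstMatch is a hypothetical helper. The last line illustrates that .* is greedy: it extends to the last D it can reach.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarPlusDemo {
    // Returns the first match of regex in s, or null if there is none.
    static String firstMatch(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println("".matches("J*"));                // true: zero J is allowed
        System.out.println("".matches("J+"));                // false: at least one J is required
        System.out.println("JJJ".matches("J+"));             // true
        System.out.println(firstMatch("J.*D", "JD"));        // JD: zero characters in between
        System.out.println(firstMatch("J.+D", "JD"));        // null: at least one is required
        System.out.println(firstMatch("J.*D", "xJabDcdDy")); // JabDcdD: greedy, runs to the last D
    }
}
```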
To make a character optional (it appears zero times or once), use "?"
JA? matches J or JA
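A tiny check of ?, using String.matches so that the whole input must match; the constant and method names are illustrative, and "colou?r" is an extra example not taken from the text.

```java
class OptionalDemo {
    // J followed by an optional A.
    static final String JA_OPTIONAL = "JA?";

    static boolean whole(String regex, String s) {
        return s.matches(regex); // the entire input must match
    }

    public static void main(String[] args) {
        System.out.println(whole(JA_OPTIONAL, "J"));   // true
        System.out.println(whole(JA_OPTIONAL, "JA"));  // true
        System.out.println(whole(JA_OPTIONAL, "JAA")); // false: at most one A
        System.out.println(whole("colou?r", "color")); // true: the u is optional
    }
}
```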
To require exactly a consecutive occurrences, use "{a}"
J{2} JJ
J{3} JJJ
To require at least a occurrences, use "{a,}"
J{3,} JJJ, JJJJ, JJJJJ, ... (J appears 3 or more times)
To require at least a and at most b occurrences, use "{a,b}"
J{3,5} JJJ or JJJJ or JJJJJ
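The counted quantifiers can be verified the same way; the helper whole is a hypothetical wrapper around String.matches.

```java
class RepeatDemo {
    static boolean whole(String regex, String s) {
        return s.matches(regex); // the entire input must match
    }

    public static void main(String[] args) {
        System.out.println(whole("J{2}", "JJ"));       // true: exactly two
        System.out.println(whole("J{2}", "JJJ"));      // false
        System.out.println(whole("J{3,}", "JJJJJ"));   // true: three or more
        System.out.println(whole("J{3,}", "JJ"));      // false
        System.out.println(whole("J{3,5}", "JJJJ"));   // true: between three and five
        System.out.println(whole("J{3,5}", "JJJJJJ")); // false: six is too many
    }
}
```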
To match either of two alternatives, use "|"
J|A J or A
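Java|Hello Java or Hello
"()" groups part of an expression and captures the text it matches. For example, to extract the link text between <a href> and </a> in <a href="index.html">index</a>, you can use the regex <a .*href=".*">(.+?)</a>, written in Java source as "<a .*href=\".*\">(.+?)</a>"; group 1, captured by (.+?), holds the link text.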
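The sketch below applies the link-text pattern above through Matcher.group(1) and, as an assumption beyond the text, compares it with a reluctant variant that uses .*? in place of .* so that several links on one line are captured separately. It also shows parentheses limiting the scope of |. GroupDemo, linkTexts, and the constant names are hypothetical; for real-world HTML a dedicated parser is usually more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Pattern from the text, written as a Java string literal (\" is an escaped quote).
    static final Pattern GREEDY = Pattern.compile("<a .*href=\".*\">(.+?)</a>");
    // Variant with reluctant .*? so that several links on one line are matched one by one.
    static final Pattern RELUCTANT = Pattern.compile("<a .*?href=\".*?\">(.+?)</a>");
    // | has the lowest precedence; parentheses restrict it to "Java" or "Kotlin".
    static final String SCOPED_ALT = "I like (Java|Kotlin)";
    static final String UNSCOPED_ALT = "I like Java|Kotlin"; // means "I like Java" OR "Kotlin"

    // Hypothetical helper: returns group 1 (the text captured by (.+?)) of every match.
    static List<String> linkTexts(Pattern p, String html) {
        Matcher m = p.matcher(html);
        List<String> texts = new ArrayList<>();
        while (m.find()) {
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String one = "<a href=\"index.html\">index</a>";
        String two = "<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>";
        System.out.println(linkTexts(GREEDY, one));              // [index]
        System.out.println(linkTexts(GREEDY, two));              // [B]: greedy .* runs to the last link
        System.out.println(linkTexts(RELUCTANT, two));           // [A, B]
        System.out.println("I like Kotlin".matches(SCOPED_ALT));   // true
        System.out.println("I like Kotlin".matches(UNSCOPED_ALT)); // false
    }
}
```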
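When calling Pattern.compile, you can pass flags that control how Java regular expressions match:
Pattern Pattern.compile(String regex, int flags)
The possible values of flags are listed below. Several flags can be combined with the bitwise OR operator |, and the embedded flag expression shown after a flag name, such as (?i), turns on the same mode from inside the pattern: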
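As a sketch of passing flags, the example below combines CASE_INSENSITIVE and MULTILINE with | and compares the result with the equivalent embedded flags (?im). The class name, constants, and count helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagsDemo {
    static final String TEXT = "Java\nJAVA\npython";
    static final Pattern PLAIN = Pattern.compile("^java");
    static final Pattern WITH_FLAGS =
            Pattern.compile("^java", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    static final Pattern EMBEDDED = Pattern.compile("(?im)^java"); // same modes, set inside the pattern

    // Counts how many times p matches somewhere in s.
    static int count(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(PLAIN, TEXT));      // 0: case-sensitive, start of input only
        System.out.println(count(WITH_FLAGS, TEXT)); // 2: "Java" and "JAVA" at line starts
        System.out.println(count(EMBEDDED, TEXT));   // 2
    }
}
```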
Pattern.CANON_EQ Two characters are considered to match if and only if their full canonical decompositions are identical. For example, with this flag the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account.
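A minimal check of the CANON_EQ example, assuming the regex is the two code points U+0061 U+030A (a plus a combining ring above) and the input is the single precomposed code point U+00E5; the names are illustrative.

```java
import java.util.regex.Pattern;

class CanonEqDemo {
    // Regex: 'a' followed by U+030A COMBINING RING ABOVE (two code points).
    static final String DECOMPOSED = "a\u030A";
    // Input: U+00E5, the precomposed letter a with ring above (one code point).
    static final String PRECOMPOSED = "\u00E5";

    static boolean matches(int flags) {
        return Pattern.compile(DECOMPOSED, flags).matcher(PRECOMPOSED).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(0));                // false: compared code point by code point
        System.out.println(matches(Pattern.CANON_EQ)); // true: same canonical decomposition
    }
}
```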
Pattern.CASE_INSENSITIVE (?i)
Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII character set are being matched; for Unicode-aware case-insensitive matching, combine this flag with UNICODE_CASE.
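The sketch below enables case-insensitive matching in three ways: the flag passed to compile, the embedded (?i) at the start of the pattern, and the scoped form (?i:X), which applies the flag only inside that group. The class, constant, and helper names are hypothetical.

```java
import java.util.regex.Pattern;

class CaseInsensitiveDemo {
    // (?i:X) turns the flag on only for X; "Script" stays case-sensitive.
    static final String SCOPED = "(?i:java)Script";

    // Compiles regex with the CASE_INSENSITIVE flag and tests the whole input.
    static boolean ignoreCase(String regex, String s) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println("JaVa".matches("java"));     // false: case-sensitive by default
        System.out.println(ignoreCase("java", "JaVa")); // true: flag passed to compile
        System.out.println("JaVa".matches("(?i)java")); // true: embedded flag
        System.out.println("JAVAScript".matches(SCOPED)); // true
        System.out.println("JAVASCRIPT".matches(SCOPED)); // false
    }
}
```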
Pattern.COMMENTS (?x)
In this mode, whitespace in the pattern is ignored when matching (Translator's note: this does not refer to "\s" in the expression, but to literal spaces, tabs, carriage returns and so on written in the pattern). Comments start with # and run to the end of the line. Comments mode can also be enabled with the embedded flag expression (?x).
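A sketch of COMMENTS mode, assuming a hypothetical year-month format: the pattern is spread over several lines with # comments, and the spaces used for alignment are ignored. Whitespace in the input, by contrast, is still significant. Class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentsDemo {
    // Hypothetical year-month format such as 2024-05, laid out with comments.
    static final Pattern YEAR_MONTH = Pattern.compile(
            "(\\d{4})   # four-digit year\n"
          + "-          # literal hyphen\n"
          + "(\\d{2})   # two-digit month",
            Pattern.COMMENTS);

    // Returns the captured year, or null if the whole input does not match.
    static String year(String s) {
        Matcher m = YEAR_MONTH.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(year("2024-05"));   // 2024
        System.out.println(year("2024 - 05")); // null: spaces in the INPUT still count
    }
}
```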
Pattern.DOTALL (?s)
In this mode, the expression '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
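A minimal sketch of DOTALL, comparing the default mode, the flag, and the embedded (?s) form on an input that contains a line feed; the names are illustrative.

```java
import java.util.regex.Pattern;

class DotallDemo {
    static final String INPUT = "a\nb"; // a, line feed, b

    static boolean whole(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).matches();
    }

    public static void main(String[] args) {
        System.out.println(whole("a.b", 0));              // false: . stops at the line feed
        System.out.println(whole("a.b", Pattern.DOTALL)); // true
        System.out.println(whole("(?s)a.b", 0));          // true: embedded form of DOTALL
    }
}
```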
Pattern.MULTILINE (?m)
In this mode, '^' and '$' also match just after and just before a line terminator, that is, at the beginning and end of each line. '^' still matches at the beginning of the input and '$' at the end of the input. By default, these two expressions match only at the beginning and end of the entire input.
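The sketch below collects lines that start with # using ^#.* with and without MULTILINE, and repeats the comparison for $. The sample text and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    static final String TEXT = "# one\ntext\n# two";
    static final Pattern DEFAULT_MODE = Pattern.compile("^#.*");
    static final Pattern MULTI = Pattern.compile("^#.*", Pattern.MULTILINE);

    // Collects every match of p in s (.* stops before each line feed).
    static List<String> findAll(Pattern p, String s) {
        Matcher m = p.matcher(s);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll(DEFAULT_MODE, TEXT)); // [# one]: ^ only at the start of the input
        System.out.println(findAll(MULTI, TEXT));        // [# one, # two]: ^ at the start of every line
        System.out.println(Pattern.compile("one$").matcher(TEXT).find());                    // false
        System.out.println(Pattern.compile("one$", Pattern.MULTILINE).matcher(TEXT).find()); // true
    }
}
```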
Pattern.UNICODE_CASE (?u)
In this mode, when CASE_INSENSITIVE is also enabled, case-insensitive matching follows the Unicode standard. By default, case-insensitive matching considers only the US-ASCII character set.
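A minimal sketch of why UNICODE_CASE matters, assuming the non-ASCII pair U+00E9 (é) and U+00C9 (É): with CASE_INSENSITIVE alone only ASCII letters are case-folded, so the pair matches only when UNICODE_CASE is added. The names are illustrative.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // U+00E9 is lowercase e with acute accent; U+00C9 is its uppercase form.
    static final String LOWER_E_ACUTE = "\u00E9";
    static final String UPPER_E_ACUTE = "\u00C9";

    static boolean matches(int flags) {
        return Pattern.compile(LOWER_E_ACUTE, flags).matcher(UPPER_E_ACUTE).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(Pattern.CASE_INSENSITIVE));                        // false: ASCII-only folding
        System.out.println(matches(Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)); // true
    }
}
```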
Pattern.UNIX_LINES (?d)
In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
Setting the abstract concepts aside, here are a few simple Java regular expression use cases:
◆ For example, checking whether a string contains a given pattern