Java regular expression functions and applications

Author: Eve Cole  Update Time: 2024-11-17 10:12:01

A regular expression is a pattern that describes a set of strings. It is built from ordinary characters, such as upper- and lowercase letters and digits, which match themselves, and metacharacters, which have special meanings. The core syntax is largely the same on the .NET and Java platforms, although each adds its own extensions. This article focuses on what Java regular expressions can do and on some practical applications.
Java has shipped the java.util.regex package since JDK 1.4, and it provides solid regular expression support. Because the syntax is extensive, the list below summarizes the most commonly used constructs.
\\ a literal backslash
\t tab ('\u0009')
\n newline, i.e. line feed ('\u000A')
\r carriage return ('\u000D')
\d a digit, equivalent to [0-9]
\D a non-digit, equivalent to [^0-9]
\s a whitespace character, equivalent to [ \t\n\x0B\f\r]
\S a non-whitespace character, equivalent to [^\s]
\w a word character, equivalent to [a-zA-Z_0-9]
\W a non-word character, equivalent to [^\w]
\f form feed ('\u000C')
\e escape ('\u001B')
\b a word boundary
\B a non-word boundary
\G the end of the previous match
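These escapes are written for the regex engine. In a Java string literal the backslash itself must be escaped, so the regex \d+ is written "\\d+" in source code. A minimal sketch; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Collects every run of digits; "\\d+" in Java source is the regex \d+
    static List<String> digitRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = Pattern.compile("\\d+").matcher(input);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    // Counts words; \b marks the boundary between a \w and a \W character
    static int wordCount(String input) {
        Matcher m = Pattern.compile("\\b\\w+\\b").matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(digitRuns("Java 8, Java 17 and Java 21")); // [8, 17, 21]
        System.out.println(wordCount("hello, regex_world 42"));        // 3 (underscore is a word character)
    }
}
```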
^ matches the beginning of the input
^java matches input that starts with java
$ matches the end of the input
java$ matches input that ends with java
. matches any single character except a line terminator such as \n
java.. matches java followed by any two characters other than line terminators
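A quick sketch of the anchors and the dot. It uses find(), which searches anywhere in the input, so only ^ and $ pin the match to the ends (the helper name is illustrative). Matching is case-sensitive by default, so ^java would not match "Java".

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    // find() searches the whole input; ^ and $ do the anchoring
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^java", "java rocks"));  // true: starts with java
        System.out.println(found("^java", "I like java")); // false
        System.out.println(found("java$", "I like java")); // true: ends with java
        System.out.println(found("java..", "javaEE"));     // true: two characters follow java
        System.out.println(found("java..", "java\nX"));    // false: . does not match \n
    }
}
```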
Square brackets "[]" define a character class, which matches one character from a set
[a-z] matches one lowercase letter from a to z
[A-Z] matches one uppercase letter from A to Z
[a-zA-Z] matches one letter, lowercase a to z or uppercase A to Z
[0-9] matches one digit from 0 to 9
[0-9a-z] matches one digit from 0 to 9 or one lowercase letter from a to z
[0-9[a-z]] matches one digit or one lowercase letter: a nested class forms a union, the same set as [0-9a-z] (intersection is written with &&, as in [a-z&&[def]])
A ^ placed right after the opening bracket, "[^...]", negates the class
[^a-z] matches one character that is not a lowercase letter a to z
[^A-Z] matches one character that is not an uppercase letter A to Z
[^a-zA-Z] matches one character that is not a letter
[^0-9] matches one character that is not a digit
[^0-9a-z] matches one character that is neither a digit nor a lowercase letter
[^0-9[a-z]] is intended to match the same set as [^0-9a-z]; the flat form is preferable, because negated nested classes have behaved differently across JDK versions
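Because a nested class is a union, intersection needs the explicit && operator. A minimal sketch that tests single characters against a few classes; the names are illustrative:

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // True if the single character c is matched by the character class
    static boolean inClass(String charClass, char c) {
        return Pattern.matches(charClass, String.valueOf(c));
    }

    public static void main(String[] args) {
        // Union: [0-9[a-z]] is the same set as [0-9a-z]
        System.out.println(inClass("[0-9[a-z]]", '7'));      // true
        System.out.println(inClass("[0-9[a-z]]", 'q'));      // true
        // Intersection with &&: lowercase letters that are not vowels
        System.out.println(inClass("[a-z&&[^aeiou]]", 'b')); // true
        System.out.println(inClass("[a-z&&[^aeiou]]", 'e')); // false
        // Negation: anything except a digit or a lowercase letter
        System.out.println(inClass("[^0-9a-z]", 'Q'));       // true
        System.out.println(inClass("[^0-9a-z]", '5'));       // false
    }
}
```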
The "*" quantifier matches the preceding element zero or more times
J* zero or more J
.* zero or more characters
J.*D J and D with zero or more characters between them
The "+" quantifier matches the preceding element one or more times
J+ one or more J
.+ one or more characters
J.+D J and D with one or more characters between them
The "?" quantifier matches the preceding element zero times or once
JA? matches J or JA
"{a}" matches the preceding element exactly a times
J{2} JJ
J{3} JJJ
"{a,}" matches it at least a times
J{3,} JJJ, JJJJ, JJJJJ, ... (J three or more times)
"{a,b}" matches it at least a and at most b times
J{3,5} JJJ, JJJJ, or JJJJJ
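The quantifiers can be checked with Pattern.matches, which requires the whole input to match. A minimal sketch; the class and helper names are illustrative:

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // The entire input must match the expression
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("J*", ""));           // true: zero J is allowed
        System.out.println(whole("J+", ""));           // false: at least one J
        System.out.println(whole("JA?", "JA"));        // true: A is optional
        System.out.println(whole("J{2}", "JJJ"));      // false: exactly two
        System.out.println(whole("J{3,}", "JJJJ"));    // true: three or more
        System.out.println(whole("J{3,5}", "JJJJJJ")); // false: more than five
        System.out.println(whole("J.*D", "JavaD"));    // true
    }
}
```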
"|" matches either of two alternatives
J|A J or A
Java|Hello Java or Hello
"()" groups part of an expression and captures the text it matches. For example, to extract the link text between <a href=...> and </a> in <a href="index.html">index</a>, you can write <a .*href=".*">(.+?)</a>; the first group then captures index.
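A minimal sketch of the capturing group from the link example, plus alternation inside a group. Without the parentheses, Java|Hello World would mean "Java" or "Hello World". The class and method names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Returns the text captured by group 1, or null when there is no match
    static String linkText(String html) {
        Matcher m = Pattern.compile("<a .*href=\".*\">(.+?)</a>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText("<a href=\"index.html\">index</a>")); // index
        // The group limits the alternation to the first word
        System.out.println(Pattern.matches("(Java|Hello) World", "Hello World")); // true
    }
}
```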
When calling Pattern.compile, you can pass flags that control how the expression is matched:
static Pattern compile(String regex, int flags)
The most commonly used flags are listed below, each followed by its embedded-flag form where one exists. Several flags can be combined with the | operator.
Pattern.CANON_EQ Two characters are considered a match if and only if their full canonical decompositions match. For example, with this flag the expression "a\u030A" matches the string "\u00E5" (å). By default, canonical equivalence is not taken into account. This flag has no embedded form.
Pattern.CASE_INSENSITIVE(?i)
Enables case-insensitive matching. By default this applies only to characters in the US-ASCII character set; to match Unicode characters case-insensitively, combine this flag with UNICODE_CASE.
Pattern.COMMENTS(?x)
Permits whitespace and comments in the pattern. Whitespace in the expression itself (literal spaces, tabs, line breaks, and so on, not the \s construct) is ignored, and text from # to the end of the line is treated as a comment.
Pattern.DOTALL(?s)
In this mode, '.' matches any character, including a line terminator. By default, '.' does not match line terminators.
Pattern.MULTILINE(?m)
In this mode, '^' and '$' also match just after and just before a line terminator, that is, at the beginning and end of each line, in addition to the beginning and end of the input. By default, they match only at the beginning and end of the entire input.
Pattern.UNICODE_CASE(?u)
In this mode, if you also enable the CASE_INSENSITIVE flag, it will match Unicode characters case-insensitively. By default, case-insensitive matching only works with the US-ASCII character set.
Pattern.UNIX_LINES(?d)
In this mode, only '\n' is recognized as a line terminator in the behavior of '.', '^', and '$'.
Setting the abstract descriptions aside, here are a few simple use cases.
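To see how a few of these flags change the outcome of the same expression, here is a minimal sketch. The class and helper names are illustrative; a flags value of 0 means no flags.

```java
import java.util.regex.Pattern;

public class FlagsDemo {
    // True if the whole input matches regex compiled with the given flags
    static boolean matchesWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    // True if regex, compiled with the given flags, is found anywhere in the input
    static boolean findsWith(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "first line\nJava line";
        System.out.println(matchesWith("java", Pattern.CASE_INSENSITIVE, "JAVA")); // true
        System.out.println(matchesWith("(?i)java", 0, "JaVa"));                    // true: embedded flag
        System.out.println(findsWith("^Java", 0, text));                           // false: ^ only at input start
        System.out.println(findsWith("^Java", Pattern.MULTILINE, text));           // true: ^ at each line start
        System.out.println(matchesWith("first.*line", 0, text));                   // false: . stops at \n
        System.out.println(matchesWith("first.*line", Pattern.DOTALL, text));      // true: . crosses \n
        // Flags combine with |; COMMENTS ignores the spaces and the # comment
        System.out.println(matchesWith("j a v a  # spaced out",
                Pattern.COMMENTS | Pattern.CASE_INSENSITIVE, "Java"));             // true
    }
}
```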
◆Checking whether a string matches a pattern

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match a string that starts with Java, followed by anything
Pattern pattern = Pattern.compile("^Java.*");
Matcher matcher = pattern.matcher("Java is not a human");
boolean b = matcher.matches(); // true if the entire input matches, otherwise false
System.out.println(b);
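Because matches() requires the entire input to match, the ^ in "^Java.*" is redundant, and the trailing .* is what lets the rest of the sentence through. Matcher also offers lookingAt(), which only has to match at the start of the input, and find(), which searches anywhere. A small sketch comparing the three; the names are illustrative:

```java
import java.util.regex.Pattern;

public class MatchModeDemo {
    static final Pattern JAVA = Pattern.compile("Java");

    public static void main(String[] args) {
        String input = "Java is not a human";
        System.out.println(JAVA.matcher(input).matches());      // false: the whole input is not "Java"
        System.out.println(JAVA.matcher(input).lookingAt());    // true: the input starts with "Java"
        System.out.println(JAVA.matcher("I like Java").find()); // true: found anywhere
    }
}
```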

◆Splitting a string on multiple delimiters

Pattern pattern = Pattern.compile("[, |]+"); // one or more commas, spaces, or pipes
String[] strs = pattern.split("Java Hello World Java,Hello,,World|Sun");
for (String s : strs) {
    System.out.println(s);
}
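Pattern.split drops trailing empty strings by default, and String.split(regex) behaves the same way; the two-argument split(input, limit) changes that. A sketch with an illustrative comma pattern:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    static final Pattern COMMA = Pattern.compile(",");

    public static void main(String[] args) {
        // Default limit (0): trailing empty strings are dropped
        System.out.println(Arrays.toString(COMMA.split("a,b,,")));     // [a, b]
        // Negative limit: trailing empty strings are kept
        System.out.println(Arrays.toString(COMMA.split("a,b,,", -1))); // [a, b, , ]
        // Positive limit: at most 2 pieces; the remainder stays in the last one
        System.out.println(Arrays.toString(COMMA.split("a,b,c", 2)));  // [a, b,c]
    }
}
```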

◆Replacing the first match

Pattern pattern = Pattern.compile("Java regular expression");
Matcher matcher = pattern.matcher("Java regular expression Hello World, regular expression Hello World");
// Replace only the first match of the pattern
System.out.println(matcher.replaceFirst("Java"));

◆Replacing all matches
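A minimal sketch that mirrors the replaceFirst example above but replaces every match with replaceAll; the pattern, input, and class name are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    static String replaceEvery(String input) {
        Pattern pattern = Pattern.compile("regular expression");
        Matcher matcher = pattern.matcher(input);
        // Replace every match, not just the first one
        return matcher.replaceAll("Java");
    }

    public static void main(String[] args) {
        System.out.println(replaceEvery(
                "Java regular expression Hello World, regular expression Hello World"));
        // Java Java Hello World, Java Hello World
    }
}
```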

◆Replacing matches step by step (appendReplacement and appendTail)

Pattern pattern = Pattern.compile("Java regular expression");
Matcher matcher = pattern.matcher("Java regular expression Hello World, regular expression Hello World ");
StringBuffer sbr = new StringBuffer();
while (matcher.find()) {
    // Append the text before this match, then the replacement
    matcher.appendReplacement(sbr, "Java");
}
// Append the text after the last match
matcher.appendTail(sbr);
System.out.println(sbr.toString());
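The loop is most useful when each replacement is computed from the match itself. A minimal sketch that doubles every number in the input; Matcher.quoteReplacement keeps characters such as $ and \ in the computed text from being read as group references. The names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {
    // Doubles every integer in the input, e.g. "a1b22" becomes "a2b44"
    static String doubleNumbers(String input) {
        Matcher matcher = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            int value = Integer.parseInt(matcher.group());
            // The replacement text is computed from the current match
            matcher.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(value * 2)));
        }
        matcher.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("a1b22c333")); // a2b44c666
    }
}
```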

◆Checking whether a string looks like an email address

String str = "user@example.com"; // hypothetical sample address
Pattern pattern = Pattern.compile("[\\w\\.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.matches());
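For repeated checks, the pattern can be compiled once and kept in a static final field: Pattern objects are immutable and safe to share between threads, while Matcher objects are not. The expression is only a rough shape check, not full address validation. A sketch with illustrative names:

```java
import java.util.regex.Pattern;

public class EmailCheck {
    // Compiled once and reused for every call
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.\\-]+@([\\w\\-]+\\.)+[\\w\\-]+", Pattern.CASE_INSENSITIVE);

    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("user@example.com")); // true
        System.out.println(looksLikeEmail("user@localhost"));   // false: no dot after @
        System.out.println(looksLikeEmail("user example.com")); // false: no @
    }
}
```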

◆Removing HTML tags

Pattern pattern = Pattern.compile("<.+?>", Pattern.DOTALL); // reluctant .+? stops at the nearest >
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
String string = matcher.replaceAll("");
System.out.println(string); // Homepage
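The reluctant .+? matters here: a greedy <.+> would run from the first < to the last > and delete the link text as well. A sketch comparing the two (names are illustrative); for real-world HTML, where > can appear inside attributes or comments, a dedicated HTML parser is the safer choice.

```java
import java.util.regex.Pattern;

public class TagStripDemo {
    static String strip(String regex, String html) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<a href=\"index.html\">Homepage</a>";
        System.out.println(strip("<.+?>", html));          // Homepage: each tag removed separately
        System.out.println(strip("<.+>", html).isEmpty()); // true: greedy .+ spans both tags
    }
}
```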

◆Extracting a value from HTML

Pattern pattern = Pattern.compile("href=\"(.+?)\"");
Matcher matcher = pattern.matcher("<a href=\"index.html\">Homepage</a>");
if (matcher.find()) {
    System.out.println(matcher.group(1)); // index.html
}
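To collect every link rather than only the first, the if becomes a while loop. This sketch also uses a named group, (?<url>...), available since Java 7 and read back with group("url"); the names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefDemo {
    // Named group "url" holds the attribute value
    private static final Pattern HREF = Pattern.compile("href=\"(?<url>.+?)\"");

    static List<String> hrefs(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) { // each find() continues after the previous match
            urls.add(m.group("url"));
        }
        return urls;
    }

    public static void main(String[] args) {
        System.out.println(hrefs("<a href=\"index.html\">Home</a> <a href=\"about.html\">About</a>"));
        // [index.html, about.html]
    }
}
```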

◆Extracting http:// and https:// addresses

// Extract URLs
Pattern pattern = Pattern.compile("(http://|https://)[\\w\\.\\-/:]+");
Matcher matcher = pattern.matcher("dsdsds<http://dsds//gfgffdfd>fdf");
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    buffer.append(matcher.group());
    buffer.append("\r\n");
}
System.out.println(buffer.toString()); // print once, after all matches are collected
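On Java 9 and later, Matcher.results() returns the matches as a stream, which avoids the manual buffer. A sketch with the same character class; the names are illustrative:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class UrlResultsDemo {
    private static final Pattern URL = Pattern.compile("(http://|https://)[\\w.\\-/:]+");

    // Matcher.results() requires Java 9 or later
    static List<String> urls(String text) {
        return URL.matcher(text)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(urls("see http://a.example/x and https://b.example:8080/y!"));
        // [http://a.example/x, https://b.example:8080/y]
    }
}
```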

◆Replacing {n} placeholders in a string

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceDemo { // class name is illustrative
    public static void main(String[] args) {
        String str = "The history of Java so far runs from {0} to {1}";
        // Each entry: { regex for the placeholder, replacement value }
        String[][] object = {new String[]{"\\{0\\}", "1995"}, new String[]{"\\{1\\}", "2007"}};
        System.out.println(replace(str, object)); // ... from 1995 to 2007
    }

    public static String replace(final String sourceString, Object[] object) {
        String temp = sourceString;
        for (int i = 0; i < object.length; i++) {
            String[] result = (String[]) object[i];
            Pattern pattern = Pattern.compile(result[0]);
            Matcher matcher = pattern.matcher(temp);
            temp = matcher.replaceAll(result[1]);
        }
        return temp;
    }
}
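The method above compiles a pattern and rescans the text once per placeholder. A single pattern that captures the index, \{(\d{1,9})\}, can do the job in one pass. A minimal sketch with illustrative names; placeholders without a matching value are left unchanged, and the {1,9} bound keeps the index within int range:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d{1,9})\\}");

    // Replaces {0}, {1}, ... with values[0], values[1], ... in one pass
    static String format(String template, String... values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            String value = index < values.length ? values[index] : m.group(); // unknown: keep as-is
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("The history of Java so far runs from {0} to {1}", "1995", "2007"));
        // The history of Java so far runs from 1995 to 2007
    }
}
```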

◆Listing the files in a directory whose names match a regular expression

import java.io.File;
import java.io.FileFilter;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FilesAnalyze {
    // Caches the matching files
    private final List<File> files = new ArrayList<>();
    // Directory to scan
    private final String _path;
    // File-name pattern, compiled once instead of once per file
    private final Pattern _pattern;

    class MyFileFilter implements FileFilter {
        /**
         * Accepts a file when its whole name matches the pattern
         */
        @Override
        public boolean accept(File file) {
            return _pattern.matcher(file.getName()).matches();
        }
    }

    /**
     * Scans a directory for file names that match a regular expression
     * @param path directory to scan
     * @param regexp expression the whole file name must match
     *               (an invalid expression throws PatternSyntaxException here)
     */
    FilesAnalyze(String path, String regexp) {
        _path = path;
        _pattern = Pattern.compile(regexp);
        getFileName();
    }

    /**
     * Lists the directory and keeps the files accepted by the filter
     */
    private void getFileName() {
        File directory = new File(_path);
        File[] filesFile = directory.listFiles(new MyFileFilter());
        if (filesFile == null) return; // not a directory, or an I/O error occurred
        for (File file : filesFile) {
            files.add(file);
        }
    }

    /**
     * Prints the path of every matching file
     * @param out destination stream
     */
    public void print(PrintStream out) {
        for (File file : files) {
            out.println(file.getPath());
        }
    }

    public static void output(String path, String regexp) {
        FilesAnalyze fileGroup1 = new FilesAnalyze(path, regexp);
        fileGroup1.print(System.out);
    }

    public static void main(String[] args) {
        // File names made only of ASCII letters and dots
        output("C:\\", "[A-Za-z.]*");
    }
}
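For code built on java.nio.file, the default file system can create a PathMatcher from a "regex:" pattern, which could stand in for a hand-written FileFilter. A minimal sketch that checks path names without touching the disk; the names are illustrative, and the whole path string must match the expression:

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class PathMatcherDemo {
    // "regex:" selects java.util.regex syntax for the rest of the string
    static final PathMatcher LETTERS_AND_DOTS =
            FileSystems.getDefault().getPathMatcher("regex:[A-Za-z.]*");

    public static void main(String[] args) {
        System.out.println(LETTERS_AND_DOTS.matches(Paths.get("readme.txt"))); // true
        System.out.println(LETTERS_AND_DOTS.matches(Paths.get("file1.txt")));  // false: contains a digit
    }
}
```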

Java regular expressions cover a wide range of uses. In practice, most everyday character-processing tasks, such as validation, searching, splitting, extraction, and replacement, can be handled with them.