A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace matching substrings, or to extract substrings that satisfy a given condition.
Perl's regular expression support is very powerful, arguably among the strongest of the commonly used languages. Many languages modeled their own regular expression support on Perl's.
Perl regular expressions are used in three forms: matching, substitution (replacement), and transliteration (conversion):
Match: m//
Substitution: s///
Transliteration: tr///
These three forms are generally used with the binding operators =~ or !~. =~ tests whether the string matches the pattern, and !~ tests whether it does not match.
The match operator m// tests a string against a regular expression. When / is used as the delimiter, the leading m can be omitted. For example, to match "run" in the scalar $bar, the code is as follows:
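The listing for this example did not survive in the text. A minimal sketch that is consistent with the output shown below might look like this. The sample sentence assigned to $bar is an assumption, chosen only because it contains "run", and the extra !~ check at the end is added purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text is an assumption; any string containing "run" behaves the same.
my $bar = "I am a runner. welcome to my site.";
my @results;

# m/run/ and /run/ are equivalent; =~ binds the pattern to $bar.
if ($bar =~ m/run/) {
    push @results, "First match";
} else {
    push @results, "First no match";
}

$bar = "run";
if ($bar =~ /run/) {
    push @results, "Second match";
} else {
    push @results, "Second no match";
}

# !~ is the negated form: true when the pattern does NOT match.
my $has_no_walk = ($bar !~ /walk/);

print "$_\n" for @results;
```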
Running the above program produces:
First match
Second match
Pattern matching has several commonly used modifiers, as shown in the following table:
Modifier | Description |
---|---|
i | Ignore case in the pattern |
m | Multi-line mode: "^" and "$" match at the start and end of each line |
o | Compile the pattern only once |
s | Single-line mode: "." also matches "\n" (by default it does not) |
x | Ignore whitespace in the pattern |
g | Global match: find all matches |
cg | After a global match fails, keep the current search position instead of resetting it, so the search can continue from there |
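As a rough illustration of the modifiers in the table, the following sketch applies several of them to one small string. The sample text and all variable names are invented for this example.

```perl
use strict;
use warnings;

my $text = "Perl is fun.\nperl is fast.";

# /i: case-insensitive match.
my $ci = ($text =~ /PERL/i) ? 1 : 0;              # 1

# /g in list context returns every match; with /i both spellings are found.
my @all = ($text =~ /perl/gi);                    # ("Perl", "perl")

# /m: ^ matches at the start of each line, not only at the start of the string.
my @line_starts = ($text =~ /^(\w+)/mg);          # ("Perl", "perl")

# /s: "." also matches a newline, so the pattern can span both lines.
my $spans = ($text =~ /fun\..perl/s) ? 1 : 0;     # 1
my $no_s  = ($text =~ /fun\..perl/)  ? 1 : 0;     # 0

# /x: whitespace inside the pattern is ignored.
my ($verb) = $text =~ / is \s+ (\w+) /x;          # "fun"

# /gc: a failed match leaves the search position where it was.
my $scan = "aab";
my $ok   = $scan =~ /a/g;     # succeeds, position is now 1
my $fail = $scan =~ /x/gc;    # fails, but /c keeps the position
my $kept = pos($scan);        # 1

print "ci=$ci all=@all starts=@line_starts spans=$spans no_s=$no_s verb=$verb kept=$kept\n";
```

Without the c in the last step, the failed match would reset the search position, and pos($scan) would be undefined.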
After a successful match, Perl sets three special variables:
$`: the part of the string before the match
$&: the matched string
$': the part of the string after the match
Concatenating these three variables in that order gives back the original string.
Examples are as follows:
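The listing is missing here, so the following is a minimal sketch rather than the original program. The input string is reconstructed by joining the three parts that appear in the output below, and the pattern "run" is taken from the matched part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Input reconstructed from the three output parts (an assumption).
my $string = "welcome to runoob site.";
my ($before, $matched, $after);

# The special variables are only meaningful after a successful match.
if ($string =~ m/run/) {
    ($before, $matched, $after) = ($`, $&, $');
    print "String before matching: $before\n";
    print "Matched string: $matched\n";
    print "String after matching: $after\n";
}
```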
Running the above program produces:
String before matching: welcome to
Matched string: run
String after matching: oob site.
The substitution operator s/// extends the match operator: it replaces the text matched by a pattern with a new string. The basic format is as follows:
s/PATTERN/REPLACEMENT/;
PATTERN is the matching pattern, REPLACEMENT is the replacement string.
For example, we replace "google" in the following string with "codercto":
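The listing for this step is missing. A minimal sketch, assuming the input string is the output shown below with "google" in place of "codercto":

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Input string is an assumption derived from the expected output.
my $string = "welcome to google site.";

# Replace the first occurrence of "google" with "codercto".
$string =~ s/google/codercto/;

print "$string\n";
```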
Running the above program produces:
welcome to codercto site.
The modifiers for the substitution operator are shown in the following table:
Modifier | Description |
---|---|
i | Ignore case when matching, so "a" and "A" are treated as the same. |
m | By default "^" and "$" match only at the start and end of the whole string. With "m", they also match at the start and end of each line within the string. |
o | Compile the pattern only once. |
s | By default "." matches any character except a newline. With "s", "." matches any character, including a newline. |
x | Ignore whitespace in the pattern unless it is escaped. |
g | Replace all matches, not just the first. |
e | Evaluate the replacement string as an expression. |
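The following sketch shows how a few of these modifiers change the result of s///. All strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Without /g only the first occurrence is replaced.
my $first = "a-b-c-d";
$first =~ s/-/+/;                      # "a+b-c-d"

# /g: replace every occurrence.
my $all = "a-b-c-d";
$all =~ s/-/+/g;                       # "a+b+c+d"

# /i: match regardless of case.
my $greeting = "Hello WORLD";
$greeting =~ s/world/Perl/i;           # "Hello Perl"

# /e: the replacement is evaluated as Perl code; here each number is doubled.
my $fruit = "3 apples and 10 pears";
$fruit =~ s/(\d+)/$1 * 2/ge;           # "6 apples and 20 pears"

print "$first\n$all\n$greeting\n$fruit\n";
```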
The transliteration operator tr/// accepts the following modifiers:
Modifier | Description |
---|---|
c | Complement the search list: transliterate all characters that are not specified |
d | Delete specified characters that have no corresponding replacement |
s | Squeeze runs of the same transliterated character into a single character |
The following example converts all lowercase letters in the variable $string to uppercase letters:
#!/usr/bin/perl
$string = 'welcome to codercto site.';
$string =~ tr/a-z/A-Z/;   # map each lowercase letter to its uppercase counterpart
print "$string\n";
Running the above program produces:
WELCOME TO CODERCTO SITE.
The following example uses /s to remove repeated characters from the variable $string:
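The listing is missing here as well. A minimal sketch, assuming the input is 'runoob' (inferred from the output shown below):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Input is an assumption inferred from the expected output.
my $string = 'runoob';

# An empty replacement list maps each character to itself;
# /s then squeezes runs of the same character into one.
$string =~ tr/a-z//s;

print "$string\n";
```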
Running the above program produces:
runob
More examples:
$string =~ tr/0-9/ /c;    # Replace all non-digit characters with spaces
$string =~ tr/\t //d;     # Delete tabs and spaces
$string =~ tr/0-9/ /cs;   # Replace each run of non-digit characters with a single space
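To see the effect of these three operations, here is a small runnable sketch. The input strings are made up for illustration. Note that tr/// works on character lists, not regular expressions, so a digit range is written as 0-9 rather than \d.

```perl
use strict;
use warnings;

# /c: every character NOT in 0-9 becomes a space.
my $spaced = "a1b2c3";
$spaced =~ tr/0-9/ /c;         # " 1 2 3"

# /d: tabs and spaces are deleted.
my $packed = "x \t y\tz";
$packed =~ tr/\t //d;          # "xyz"

# /cs: each run of non-digits collapses into a single space.
my $squeezed = "12abc34--56";
$squeezed =~ tr/0-9/ /cs;      # "12 34 56"

print "[$spaced] [$packed] [$squeezed]\n";
```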
Commonly used regular expression syntax is summarized in the following table:
Expression | Description |
---|---|
. | Matches any single character except a newline |
x? | Matches x zero or one time |
x* | Matches x zero or more times, as many times as possible (x*? matches as few times as possible) |
x+ | Matches x one or more times, as many times as possible (x+? matches as few times as possible) |
.* | Matches any character zero or more times |
.+ | Matches any character one or more times |
{m} | Matches the preceding item exactly m times |
{m,n} | Matches the preceding item at least m and at most n times |
{m,} | Matches the preceding item m or more times |
[] | Matches any one of the characters inside [] |
[^] | Matches any one character that is not inside [] |
[0-9] | Matches any digit |
[a-z] | Matches any lowercase letter |
[^0-9] | Matches any non-digit character |
[^a-z] | Matches any character that is not a lowercase letter |
^ | Matches at the beginning of the string |
$ | Matches at the end of the string |
\d | Matches a digit; same as [0-9] for ASCII text |
\d+ | Matches one or more digits; same as [0-9]+ |
\D | Matches a non-digit; same as [^0-9] |
\D+ | Matches one or more non-digits; same as [^0-9]+ |
\w | Matches a letter, digit, or underscore; same as [a-zA-Z0-9_] for ASCII text |
\w+ | Same as [a-zA-Z0-9_]+ |
\W | Matches any character other than a letter, digit, or underscore; same as [^a-zA-Z0-9_] |
\W+ | Same as [^a-zA-Z0-9_]+ |
\s | Matches a whitespace character; roughly the same as [ \n\t\r\f] |
\s+ | Roughly the same as [ \n\t\r\f]+ |
\S | Matches a non-whitespace character; roughly the same as [^ \n\t\r\f] |
\S+ | Roughly the same as [^ \n\t\r\f]+ |
\b | Matches a word boundary: the position between a \w character and a \W character, or at the start or end of the string next to a \w character |
\B | Matches a position that is not a word boundary |
a|b|c | Matches a, b, or c |
abc | Matches a string containing abc |
(pattern) | Parentheses remember the text they match, which is very useful. The text matched by the first () is available as $1 (or as \1 inside the pattern), the text matched by the second () as $2 (or \2), and so on. |
/pattern/i | The i modifier ignores case when matching. |
\ | To match a special character such as "*" literally, put a backslash before it to cancel its special meaning. |
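A few entries from the table can be tried out with the following sketch. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

# Capture groups: $1, $2, $3 hold the text matched by each pair of parentheses.
my $date = "2024-05-17";
my ($year, $month, $day);
if ($date =~ /^(\d{4})-(\d{2})-(\d{2})$/) {
    ($year, $month, $day) = ($1, $2, $3);
}

# Greedy vs. non-greedy: .* takes as much as possible, .*? as little as possible.
my $html = "<b>one</b><b>two</b>";
my ($greedy) = $html =~ /<b>(.*)<\/b>/;      # "one</b><b>two"
my ($lazy)   = $html =~ /<b>(.*?)<\/b>/;     # "one"

# \1 refers back to the first group inside the same pattern (a doubled word).
my ($doubled) = "this is is a test" =~ /\b(\w+) \1\b/;   # "is"

# \w+ with /g in list context collects every run of word characters.
my @words = "foo_1, bar-2" =~ /\w+/g;        # ("foo_1", "bar", "2")

# Alternation, and a backslash to match a literal "*".
my $alt     = ("cat" =~ /^(cat|dog|bird)$/) ? 1 : 0;   # 1
my $literal = ("2*3" =~ /\d\*\d/) ? 1 : 0;             # 1

print "$year/$month/$day greedy=$greedy lazy=$lazy doubled=$doubled\n";
print "words=@words alt=$alt literal=$literal\n";
```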