PHP programming skills: learn regular expressions through examples

Author：Eve Cole Update Time：2009-06-01 18:21:13

First, let's look at two special characters: '^' and '$'. They match the beginning and the end of the string respectively. Here are some examples:

"^The": matches strings starting with "The";

"of despair$": matches strings ending with "of despair";

"^abc$": matches a string that both starts and ends with "abc"; in fact, only "abc" itself matches;

"notice": matches strings containing "notice".

As you can see, if you use neither of these two characters (as in the last example), the pattern (regular expression) can appear anywhere in the string being checked; it is not anchored to either end.
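The anchor examples above can be tried out with a minimal sketch like the following. It assumes a current PHP version: ereg() was removed in PHP 7.0, so the PCRE function preg_match() is swapped in here, and the patterns are wrapped in '/' delimiters. The helper name matches_pattern() is hypothetical.

```php
<?php
// Hypothetical helper: true if $pattern (given without delimiters) matches $subject.
// preg_match() returns 1 on a match, 0 on no match, and false on error.
// The pattern must not contain '/', because that is used as the delimiter.
function matches_pattern(string $pattern, string $subject): bool
{
    return preg_match('/' . $pattern . '/', $subject) === 1;
}

var_dump(matches_pattern('^The', 'The end'));             // anchored at the start
var_dump(matches_pattern('^The', 'At The end'));          // "The" is not at the start
var_dump(matches_pattern('of despair$', 'pit of despair'));
var_dump(matches_pattern('^abc$', 'abcabc'));             // extra characters, so no match
var_dump(matches_pattern('notice', 'take notice now'));   // unanchored: found anywhere
```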

The characters '*', '+', and '?' specify how many times the preceding character may occur. They mean "zero or more", "one or more", and "zero or one" respectively. Here are some examples:

"ab*": matches an a followed by zero or more b's ("a", "ab", "abbb", etc.);

"ab+": same as above, but with at least one b ("ab", "abbb", etc.);

"ab?": matches an a followed by zero or one b ("a" or "ab");

"a?b+$": matches zero or one a followed by one or more b's at the end of the string.

You can also set bounds on the number of occurrences by putting them in curly brackets:

"ab{2}": matches an a followed by exactly two b's ("abb");

"ab{2,}": at least two b's ("abb", "abbbb", etc.);

"ab{3,5}": three to five b's ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Note also that '*', '+', and '?' are equivalent to the bounds "{0,}", "{1,}", and "{0,1}" respectively.
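One way to check the quantifiers and bounds described above is to count how many b's each one accepts after a single a. The sketch below assumes a current PHP version, where preg_match() stands in for the removed ereg(); the helper accepted_counts() and the upper limit of six b's are hypothetical choices made for the demonstration.

```php
<?php
// Hypothetical helper: for "a" followed by 0..$max b's, list the b-counts
// that the pattern ab<quantifier> accepts as a whole string.
function accepted_counts(string $quantifier, int $max = 6): array
{
    $ok = [];
    for ($n = 0; $n <= $max; $n++) {
        $subject = 'a' . str_repeat('b', $n);
        // ^ and $ are added so that the entire subject has to match.
        if (preg_match('/^ab' . $quantifier . '$/', $subject) === 1) {
            $ok[] = $n;
        }
    }
    return $ok;
}

echo implode(',', accepted_counts('{2}')), "\n";    // 2
echo implode(',', accepted_counts('{2,}')), "\n";   // 2,3,4,5,6
echo implode(',', accepted_counts('{3,5}')), "\n";  // 3,4,5

// '*', '+' and '?' accept the same counts as '{0,}', '{1,}' and '{0,1}'.
var_dump(accepted_counts('*') === accepted_counts('{0,}'));
var_dump(accepted_counts('+') === accepted_counts('{1,}'));
var_dump(accepted_counts('?') === accepted_counts('{0,1}'));
```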

To quantify a sequence of characters, put it in parentheses:

"a(bc)*": matches an a followed by zero or more copies of "bc";

"a(bc){1,5}": matches an a followed by one to five copies of "bc".

There is also the character '|', which works as an OR operator:

"hi|hello": matches a string containing "hi" or "hello";

"(b|cd)ef": matches a string containing "bef" or "cdef";

"(a|b)*c": matches a string containing a sequence of zero or more a's and b's followed by a c.

A dot ('.') stands for any single character:

"a.[0-9]": an a followed by any one character and then a digit (strings containing such a sequence match; from here on this parenthetical is omitted);

"^.{3}$": a string of exactly three characters.

A bracket expression (content enclosed in square brackets) matches a single character:

"[ab]": matches a single a or b (same as "a|b");

"[a-d]": matches a single character from 'a' to 'd' (same as "a|b|c|d" and "[abcd]");

"^[a-zA-Z]": matches a string starting with a letter;

"[0-9]%": matches a string containing a digit followed by a percent sign;

",[a-zA-Z0-9]$": matches a string ending with a comma followed by a digit or letter.

You can also list the characters you do not want: just put '^' as the first character inside the brackets (i.e., "%[^a-zA-Z]%" matches a string containing two percent signs with a single non-letter between them).
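The grouping, alternation, dot, and bracket examples above can be checked in one pass. This is only a sketch under the assumption of a current PHP version, with preg_match() swapped in for ereg() (removed in PHP 7.0); the sample subjects are made up for illustration.

```php
<?php
// [pattern, subject] pairs. Like the patterns in the text, they are searched
// for anywhere in the subject unless they carry their own ^ or $ anchors.
$cases = [
    ['a(bc){1,5}',    'xabcbcx'],   // contains a followed by two "bc"
    ['(b|cd)ef',      'abcdef'],    // contains "cdef"
    ['^(a|b)*c$',     'abbac'],     // only a's and b's, then a c
    ['a.[0-9]',       'a-7'],       // a, any character, a digit
    ['^.{3}$',        'four'],      // four characters, so no match
    ['^[a-zA-Z]',     '9 lives'],   // starts with a digit, so no match
    [',[a-zA-Z0-9]$', 'x,y'],       // ends with a comma plus a letter
    ['%[^a-zA-Z]%',   '%a%'],       // a letter between the signs, so no match
];

$outcomes = [];
foreach ($cases as [$pattern, $subject]) {
    // '~' is used as the delimiter because none of the patterns contain it.
    $hit = preg_match('~' . $pattern . '~', $subject);
    $outcomes[] = $hit;
    printf("%-14s %-8s %s\n", $pattern, $subject, $hit === 1 ? 'match' : 'no match');
}
```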

To be taken literally, the characters "^.[$()|*+?{\", which have special meaning, must be preceded by a backslash ('\'). In PHP3 strings the backslash itself must also be escaped. For example, the regular expression "(\$|¥)[0-9]+" should be called as ereg("(\\$|¥)[0-9]+", $str) (I don't know whether the same applies in PHP4).

Remember that bracket expressions are an exception to this rule: inside square brackets, all special characters, including the backslash ('\'), lose their special meaning (i.e., "[*+?{}.]" matches a string containing any of these characters). Also, as the regex manual tells us, if the list contains ']', it is best to make it the first character in the list (possibly following a '^'). If it contains '-', it is best to put it first or last, or as the second endpoint of a range (i.e., in "[a-d-0-9]" the '-' in the middle is taken literally).
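The escaping rules above can be sketched as follows. The example assumes a current PHP version and therefore uses the PCRE functions preg_match() and preg_quote() rather than ereg(); the sample strings are invented for the demonstration.

```php
<?php
// Escaped by hand: the backslashes make '$' and '.' literal.
// In single-quoted PHP strings the backslash reaches the regex engine unchanged.
var_dump(preg_match('/^\$[0-9]+\.[0-9]{2}$/', '$19.99'));   // int(1)
var_dump(preg_match('/^\$[0-9]+\.[0-9]{2}$/', '$19x99'));   // int(0)

// preg_quote() inserts the backslashes for you. The second argument names
// the delimiter so that it gets escaped as well.
$literal = preg_quote('a.b*c', '/');
echo $literal, "\n";                                         // a\.b\*c
var_dump(preg_match('/' . $literal . '/', 'xa.b*cx'));       // int(1)
var_dump(preg_match('/' . $literal . '/', 'aXbbc'));         // int(0)

// Inside square brackets these metacharacters are literal without escaping.
var_dump(preg_match('/^[*+?{}.]+$/', '?.*'));                // int(1)
```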

For completeness, I should also cover collating sequences, character classes, and equivalence classes, but I don't want to go into detail on them, and they are not needed for the rest of this article. You can find more information in the regex man pages.

How to build a pattern to match input of a currency amount

Now we are going to use what we have learned to do something useful: build a pattern that checks whether the input is a number representing an amount of money. We accept four ways of writing the amount: "10000.00" and "10,000.00", or, without a decimal part, "10000" and "10,000". Let's start building the pattern:

^[1-9][0-9]*$

This means every value must start with a digit other than 0. But it also means that a single "0" cannot pass the test. Here is the solution:

^(0|[1-9][0-9]*)$

This reads: "only 0, or a number that does not start with 0, matches". We can also allow a negative sign before the number:

^(0|-?[1-9][0-9]*)$

This reads: "0, or a number that does not start with 0 and may have a negative sign in front of it". Now let's be less strict and allow a leading 0. Let's also drop the negative sign, because we don't need it for amounts of money. We now extend the pattern to match the decimal part:

^[0-9]+(\.[0-9]+)?$

This implies that the string must start with at least one digit. Note that "10." does not match the pattern above; only "10" and "10.2" do. (Do you know why?)

^[0-9]+(\.[0-9]{2})?$

Here we require exactly two digits after the decimal point. If you think this is too strict, you can change it to:

^[0-9]+(\.[0-9]{1,2})?$

This allows one or two digits after the decimal point. Now we add commas (every third digit) for readability:

^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$

Note that this pattern requires the commas: "10000" no longer matches it, so to accept both forms you would combine the two patterns with '|'. Don't forget that the plus sign '+' can be replaced by an asterisk '*' if you want to allow empty strings (why?). Also don't forget that the backslash '\' can cause errors in PHP strings (a very common mistake). Now that we can validate the string, we remove all commas with str_replace(",", "", $money), then treat the result as a double so we can do arithmetic with it.
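The validate-then-convert steps above can be sketched as a single function. Several things here are assumptions rather than part of the article: the function name parse_money() is hypothetical, preg_match() replaces ereg() (which no longer exists in PHP 7.0 and later), and the plain-digit and comma-grouped patterns are joined with '|' so that all four formats listed earlier are accepted.

```php
<?php
// Hypothetical helper: returns the amount as a float, or null if $money
// is not written in one of the accepted formats.
function parse_money(string $money): ?float
{
    // Either plain digits, or groups of three digits separated by commas,
    // followed by an optional decimal part of one or two digits.
    $pattern = '/^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$/';
    if (preg_match($pattern, $money) !== 1) {
        return null;
    }
    // Strip the thousands separators, then convert to a number.
    return (float) str_replace(',', '', $money);
}

var_dump(parse_money('10,000.00'));  // float(10000)
var_dump(parse_money('10000'));      // float(10000)
var_dump(parse_money('10.'));        // NULL: a decimal point needs digits after it
var_dump(parse_money('1,00'));       // NULL: a comma must be followed by three digits
```

Floating-point values are adequate for a demonstration, but they cannot represent every decimal amount exactly, so code that handles real money often keeps amounts as integer cents instead.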

Constructing a regular expression for checking email

Let us go on to validating an email address. A complete email address has three parts: the POP3 username (everything to the left of the '@'), the '@' itself, and the server name (the rest). The username can contain uppercase and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). Server names follow the same rule, except that underscores are not allowed.

Usernames cannot start or end with a period, and neither can server names. You also cannot have two consecutive periods: there must be at least one character between them. Now let's see how to write a pattern for the username:

^[_a-zA-Z0-9-]+$

Periods are not allowed yet. Let's add them:

^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$

This means: "start with at least one valid character (anything allowed except the period), followed by zero or more groups that each start with a period."

To simplify it, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we don't need to specify both ranges "a-z" and "A-Z"; one is enough:

^[_a-z0-9-]+(\.[_a-z0-9-]+)*$

The server name is the same, but without the underscore:

^[a-z0-9-]+(\.[a-z0-9-]+)*$

Done. Now just join the two parts with "@":

^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$

This is the complete pattern for validating an email address. You only need to call

eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)

to find out whether a string is an email address.
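On PHP 7.0 and later eregi() is no longer available, so the same check could be written roughly as below. This is a sketch, not the article's original code: preg_match() with the "i" (case-insensitive) modifier is swapped in for eregi(), and the function name looks_like_email() is hypothetical.

```php
<?php
// Hypothetical helper built on the pattern developed above.
function looks_like_email(string $email): bool
{
    // The trailing "i" makes the match case-insensitive, as eregi() was.
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(looks_like_email('John.Doe@example.com'));    // true
var_dump(looks_like_email('.john@example.com'));       // false: leading period
var_dump(looks_like_email('john..doe@example.com'));   // false: consecutive periods
var_dump(looks_like_email('john@exa_mple.com'));       // false: underscore in server name
```

The pattern follows the simplified rules given in the text, so it is stricter in some places and looser in others than real mail systems. PHP also provides filter_var() with the FILTER_VALIDATE_EMAIL filter as a built-in alternative.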

Other uses of regular expressions

Extracting strings

ereg() and eregi() have a feature that lets you extract parts of a string with a regular expression (see the manual for details). For example, to extract the filename from a path or URL, the following code is what you need:

ereg("([^\/]*)$", $pathOrUrl, $regs);
echo $regs[1];
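For PHP versions without ereg(), the same extraction might look like the sketch below, with preg_match() swapped in. The sample URL is a made-up value; the third argument receives the captured groups just as $regs did with ereg().

```php
<?php
$pathOrUrl = 'http://example.com/docs/manual.pdf';  // hypothetical input

// '#' is used as the delimiter so the '/' inside the brackets needs no escaping.
// Group 1 captures everything after the last slash.
if (preg_match('#([^/]*)$#', $pathOrUrl, $regs) === 1) {
    echo $regs[1], "\n";            // manual.pdf
}

// For this common case, basename() gives the same result without a regex.
echo basename($pathOrUrl), "\n";    // manual.pdf
```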

Advanced substitutions

ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace all separating whitespace with commas:

ereg_replace("[ \n\r\t]+", ",", trim($str));
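A minimal sketch of the same substitution for current PHP versions, where preg_replace() takes the place of the removed ereg_replace(). The input string is invented for the demonstration.

```php
<?php
$str = "  apples \t pears\r\noranges  ";  // hypothetical input

// trim() drops the leading and trailing whitespace first; each remaining
// run of spaces, newlines, carriage returns, or tabs becomes one comma.
$csv = preg_replace('/[ \n\r\t]+/', ',', trim($str));

echo $csv, "\n";  // apples,pears,oranges
```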