A backreference reuses the text captured by a group elsewhere in the same expression. For example, when matching HTML tags, once we have matched an opening tag such as <a>, we need to refer back to the captured name a in order to find the matching </a>. This is exactly what a backreference is for.
Syntax
a. To back-reference a numbered group, the syntax is \number, for example \1
b. To back-reference a named group, the syntax is \k<name>
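A minimal sketch of both forms side by side, assuming .NET 5+ with C# 9 top-level statements. Each pattern finds a word that is immediately repeated; the sample sentence and the group name word are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample text with two doubled words.
string text = "it is is a test test";

// Numbered form: (\w+) is group 1, so \1 must repeat whatever it captured.
var numbered = new Regex(@"\b(\w+)\s+\1\b");

// Named form: (?<word>\w+) is referenced by name with \k<word>.
var named = new Regex(@"\b(?<word>\w+)\s+\k<word>\b");

foreach (Match m in numbered.Matches(text))
    Console.WriteLine(m.Value);                  // "is is", then "test test"

foreach (Match m in named.Matches(text))
    Console.WriteLine(m.Groups["word"].Value);   // "is", then "test"
```

The two patterns are equivalent; the named form simply stays readable when an expression has many groups.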
Example
a. Match paired HTML tags
@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>"
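A short sketch that runs the pattern above, assuming .NET 5+ with C# 9 top-level statements; the two input strings are hypothetical. It shows that the closing tag must repeat the captured name.

```csharp
using System;
using System.Text.RegularExpressions;

// The tag-pair pattern: the opening tag name is captured in the group "tag"
// and \k<tag> requires the same name in the closing tag.
var tagRegex = new Regex(@"<(?<tag>[^\s>]+)[^>]*>.*</\k<tag>>");

// Hypothetical input with matching tags.
Match ok = tagRegex.Match("<b class=\"x\">bold</b>");
Console.WriteLine(ok.Success);               // True
Console.WriteLine(ok.Groups["tag"].Value);   // b

// Hypothetical input whose closing tag does not repeat the name: no match.
Console.WriteLine(tagRegex.IsMatch("<b>bold</i>"));  // False
```

Because .* is greedy, in a string containing several <b> elements a single match runs to the last </b>; .*? would stop at the first one.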
b. Match two consecutive identical characters
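using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        string s = "aabbc11asd";
        // (\w) captures one word character; \1 requires that same character again.
        Regex reg = new Regex(@"(\w)\1");
        MatchCollection matches = reg.Matches(s);
        foreach (Match m in matches)
            Console.WriteLine(m.Value);
        Console.ReadLine();
    }
}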
The output is aa, bb and 11, one per line (\w matches digits as well as letters, which is why 11 is found).
Auxiliary matching groups (zero-width assertions)
In the following group constructs, the pattern inside the parentheses is not saved as part of the match result.
1. Positive lookahead (?=): the pattern inside the parentheses must appear to the right of the current position, but it is not part of the match.
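using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        string s = "C#.net,VB.net,PHP,Java,JScript.net";
        // [\w#]+ must be followed by ".net", but ".net" itself is not consumed.
        Regex reg = new Regex(@"[\w#]+(?=\.net)", RegexOptions.Compiled);
        MatchCollection mc = reg.Matches(s);
        foreach (Match m in mc)
            Console.WriteLine(m.Value);
        // Output: C# VB JScript
        Console.ReadLine();
    }
}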
As you can see, the engine requires .net to follow each match, but .net itself is not included in the results.
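To make the difference concrete, here is a minimal sketch, assuming .NET 5+ with C# 9 top-level statements, that runs the same input through a consuming pattern and through the lookahead pattern from the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "C#.net,VB.net,PHP,Java,JScript.net";

// Consuming version: \.net is part of every match.
var consuming = Regex.Matches(s, @"[\w#]+\.net")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Lookahead version: (?=\.net) only checks that .net follows.
var lookahead = Regex.Matches(s, @"[\w#]+(?=\.net)")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(" ", consuming));  // C#.net VB.net JScript.net
Console.WriteLine(string.Join(" ", lookahead));  // C# VB JScript
```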
2. Negative lookahead (?!)
Meaning: the pattern inside the parentheses must not appear to the right of the current position. The following example shows how to get all the content of an <a> tag pair, even if it contains other HTML tags.
using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        string newsContent = @"url:<a href=""1.html""><img src=""1.gif"">test<span style=""color:red;"">Regex</span> </a>.";
        // Inside the pair, accept any non-'<' character, or a '<' that does not start "/a".
        Regex regEnd = new Regex(@"<\s*a[^>]*>([^<]|<(?!/a))*<\s*/a\s*>", RegexOptions.Multiline);
        Console.WriteLine(regEnd.Match(newsContent).Value);
        // Result: <a href="1.html"><img src="1.gif">test<span style="color:red;">Regex</span> </a>
        Console.ReadLine();
    }
}
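A smaller sketch of negative lookahead on its own, assuming .NET 5+ with C# 9 top-level statements; the input string is hypothetical. Java(?!Script) matches "Java" only when it is not followed by "Script".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input.
string text = "Java JavaScript Java8";

// "Java" is accepted only when the next characters are not "Script".
foreach (Match m in Regex.Matches(text, @"Java(?!Script)"))
    Console.WriteLine($"{m.Value} at index {m.Index}");
// Java at index 0
// Java at index 16
```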
3. Positive lookbehind (?<=): the pattern inside the parentheses must appear to the left of the current position, but it is not part of the match.
4. Negative lookbehind (?<!): the pattern inside the parentheses must not appear to the left of the current position.
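The two lookbehind forms have no example above, so here is a minimal sketch, assuming .NET 5+ with C# 9 top-level statements; the price string is made up. It separates numbers that follow a "$" from those that do not.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: some amounts carry a "$" prefix, some do not.
string prices = "USD 30, $45, qty 7, $120";

// (?<=\$)\d+ : digits preceded by "$"; the "$" is not part of the match.
var withDollar = Regex.Matches(prices, @"(?<=\$)\d+");

// (?<![$\d])\d+ : digits not preceded by "$" or by another digit.
// The \d in the class stops a match starting mid-number (e.g. "5" of "$45").
var withoutDollar = Regex.Matches(prices, @"(?<![$\d])\d+");

foreach (Match m in withDollar) Console.WriteLine(m.Value);     // 45, then 120
foreach (Match m in withoutDollar) Console.WriteLine(m.Value);  // 30, then 7
```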
Non-backtracking matching
Syntax: (?>)
Meaning: once this group has matched, the characters it consumed cannot be given back through backtracking to help the rest of the expression match. Haha, this sentence alone is hard to understand; it took me a long time to figure out, so let's illustrate it with an example:
"www.csdn.net" can be matched by @"\w+\.(.*)\.\w+", but not by @"\w+\.(?>.*)\.\w+". Why?
The reason is that quantifiers are greedy: they match as many characters as possible. So in both expressions, .* first consumes all of csdn.net, leaving nothing for \.\w+ to match. The first expression then backtracks: it gives back characters from the end of what .* matched, one at a time, and tries \.\w+ on the released characters until \.\w+ succeeds, at which point the whole expression returns a successful match. The second expression uses non-backtracking matching, so once .* has matched it cannot give any characters back to \.\w+, and the whole expression fails.
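A minimal sketch that reproduces the two outcomes, assuming .NET 5+ with C# 9 top-level statements. Group 1 of the ordinary pattern shows where backtracking left .* after giving characters back.

```csharp
using System;
using System.Text.RegularExpressions;

string host = "www.csdn.net";

// Ordinary group: .* may give characters back so that \.\w+ can match.
var backtracking = new Regex(@"\w+\.(.*)\.\w+");

// Atomic group: once (?>.*) has consumed "csdn.net", it keeps it.
var atomic = new Regex(@"\w+\.(?>.*)\.\w+");

Match m1 = backtracking.Match(host);
Console.WriteLine(m1.Success);            // True
Console.WriteLine(m1.Groups[1].Value);    // csdn
Console.WriteLine(atomic.IsMatch(host));  // False
```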
Note that backtracking costs time, so where possible write expressions that do not have to backtrack in order to succeed. For the example above, you can use @"\w+\.([^.]+\.)+\w+" instead.
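A sketch of the suggested rewrite, assuming .NET 5+ with C# 9 top-level statements. As an informal check (my own addition, not from the original), it also wraps the repeated group in (?> ): the pattern still matches, which suggests the rest of the expression never needs that group to give characters back.

```csharp
using System;
using System.Text.RegularExpressions;

string host = "www.csdn.net";

// [^.]+ can never run past a dot, so each repetition stops exactly at the
// next "." and the final \w+ is left with "net".
var rewritten = new Regex(@"\w+\.([^.]+\.)+\w+");

// Check: make all repetitions atomic; the match must still succeed.
var rewrittenAtomic = new Regex(@"\w+\.(?>([^.]+\.)+)\w+");

Console.WriteLine(rewritten.Match(host).Value);    // www.csdn.net
Console.WriteLine(rewrittenAtomic.IsMatch(host));  // True
```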