Truncatable date-pattern URLs with URL rewriting: a powerful application of regular expressions

Author: Eve Cole    Update time: 2009-07-01 16:25:38

Recently I have been busy writing my own blog program, so naturally I ran into URL rewriting, a topic that comes up often around blogs. One reason it matters is appearance: in a multi-user blog system, every user's address should look presentable. I originally wanted to use my CSDN blog, http://blog.csdn.net/joshualang, as the example, but on reflection I used my own site (http://www.tyoo.net), because that is where I will move once my blog program is finished.

Without URL rewriting, instead of an address like http://www.tyoo.net/blog/joshualang you get a blog address like http://www.tyoo.net/blog/default.aspx?Bloger=joshualang. Most netizens regard a blog as a second home built on the Internet. In real life, people consider how far away a home is and how good the transport links are; similarly, a home on the Internet needs a house number that is easy to remember. That house number should not be too long (it does not look decent), and a string of parameters tacked onto the end can put people off. If reading an article means facing something like http://blog.tyoo.net/Articles/Default.aspx?Bloger=joshualang&ArticleID=20070118234530, how does that feel? Now look at what many blog programs offer today: http://blog.tyoo.net/joshua/Articles/2007/01/18/. The benefit is obvious at a glance, and it brings us to the key point of this article.

Yes: we want to reach our goal through exactly this kind of highly regular string.

A viable URL should meet the following criteria:

• Short.

• Easy to type.

• The structure of the site can be seen.

• "Truncatable," which allows users to browse the site by removing components of the URL.

I don't need to say more about these points; it all comes down to simplicity and practicality.
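To make the "truncatable" criterion concrete, here is a minimal Python sketch that lists the URLs a visitor reaches by removing trailing path components one at a time. The function name `truncations` is made up for this illustration; each URL it produces is one the site would ideally answer with a sensible page.

```python
from urllib.parse import urlsplit, urlunsplit


def truncations(url):
    """Yield the URLs obtained by chopping trailing path segments off `url`."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Drop one trailing segment at a time, down to the site root.
    for n in range(len(segments) - 1, -1, -1):
        path = "/" + "/".join(segments[:n])
        if n:
            path += "/"
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


for u in truncations("http://blog.tyoo.net/joshua/Articles/2007/01/18/"):
    print(u)
```

For the example address this walks from the day, to the month, to the year, to the article index, to the blog home, and finally to the site root.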

Note: at this point it is worth reading Scott Mitchell's article on the MSDN website: http://www.microsoft.com/china/msdn/library/webservices/asp.net/URLRewriting.mspx?pf=true#top

The experts have already explained the principles of URL rewriting clearly enough there. If anything is unclear, you can download the source code that accompanies that article and study it.

For the sake of efficiency (everyone's time is precious, and a programmer's even more so), I used the URLRewriter.net component directly. I had no prior experience with URL rewriting, so I skimmed the source a few times and got started. At the beginning nothing sophisticated is needed; as long as a rewrite succeeds, it feels good. Then, through repeated rewriting, problems turn up and new ideas and discoveries follow. Hence this article.

Let's get started. The focus this time is the URL rewriting of the date pattern just mentioned.

http://blog.tyoo.net/Articles/2007/01/18/233030/joshualang.aspx is the final result to be achieved here.

Prerequisites: the UrlRewriter.net component (you can of course write your own) and an understanding of regular expressions.

Parameters:

ArticleID // article number [type: string, length: 14 (format: yyyymmddhhmmss); meaningful and never repeated]

Bloger // blog owner's user name [type: string starting with a letter]
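As an illustration of these two parameters, the following Python sketch builds and validates a 14-digit ArticleID and checks that a Bloger name starts with a letter. The helper names are invented for this example, and the user-name check is deliberately loose (a letter followed by word characters), since this part of the article gives no length limit.

```python
import re
from datetime import datetime

ARTICLE_ID = re.compile(r"\d{14}", re.ASCII)   # yyyymmddhhmmss
BLOGER = re.compile(r"[A-Za-z]\w*", re.ASCII)  # must start with a letter


def make_article_id(published):
    """Format a publication time as a 14-digit ArticleID."""
    return published.strftime("%Y%m%d%H%M%S")


def parse_article_id(article_id):
    """Turn a 14-digit ArticleID back into a datetime, rejecting bad input."""
    if not ARTICLE_ID.fullmatch(article_id):
        raise ValueError("ArticleID must be exactly 14 digits: %r" % article_id)
    return datetime.strptime(article_id, "%Y%m%d%H%M%S")


def is_valid_bloger(name):
    """Check that a user name starts with a letter."""
    return BLOGER.fullmatch(name) is not None


print(make_article_id(datetime(2007, 1, 18, 23, 45, 30)))
```

An ID built this way is only unique as long as no two articles are stored within the same second, which is the assumption behind "never repeated" here.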

One problem encountered with URL rewriting is a 404 error when the request is for a directory or file that does not exist. The MSDN article suggests creating the necessary folders and empty pages in the application directory. That is troublesome indeed: thousands of directories would have to be created.

Since we cannot request directories and files that do not exist, we simply avoid requesting them and request a file that does exist. (All my URLs point to a Default.aspx page in the blog's root directory, which dynamically loads a group of controls to build the different views.) So this time, too, the target is this page: ~/Default.aspx.

The next task is passing parameters, and of course they are passed through the URL. That is where rewriting comes in.
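The single-page idea can be sketched as a small dispatcher: one entry point inspects the query-string parameters and decides which view to build. This is only an illustration in Python; the view names are hypothetical, and treating a shorter ArticleID as a request for a list is an assumption made for this sketch.

```python
from urllib.parse import parse_qs


def choose_view(query_string):
    """Pick a (hypothetical) view name from the rewritten query string."""
    params = parse_qs(query_string)
    article_id = params.get("ArticleID", [""])[0]
    bloger = params.get("Bloger", [""])[0]
    if len(article_id) == 14 and article_id.isdigit():
        return "ArticleView"        # a complete ID identifies one article
    if article_id:
        return "ArticleListView"    # anything shorter selects a list
    if bloger:
        return "BlogHomeView"       # only a user name: that user's blog
    return "SiteHomeView"           # no parameters at all


print(choose_view("ArticleID=20070118234530&Bloger=joshualang"))
```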

The protagonist is about to appear again: regular expressions.

Regular expressions really shine here. For a URL such as http://blog.tyoo.net/joshualang/Articles/2007/01/18/Default.aspx, you might write a rewriting rule like the following:

<RewriterRule>
<LookFor>~/(\w{6,16})/Articles/(\d{4})/(\d{2})/(\d{2})/Default\.aspx</LookFor>
<SendTo>~/Default.aspx?Bloger=$1&amp;year=$2&amp;month=$3&amp;day=$4</SendTo>
</RewriterRule>
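To see what such a rule does, here is a rough Python emulation of a single LookFor/SendTo pair. It assumes the pattern is written with the \w and \d escapes, that the application root is "/", and that the pattern is anchored with ^ and $ and matched case-insensitively; Python's `re` module stands in for the .NET regex engine, so treat it as a sketch rather than the component's actual code.

```python
import re

LOOK_FOR = r"~/(\w{6,16})/Articles/(\d{4})/(\d{2})/(\d{2})/Default\.aspx"
SEND_TO = "~/Default.aspx?Bloger=$1&year=$2&month=$3&day=$4"


def apply_rule(look_for, send_to, path, app_root="/"):
    """Return the rewritten path, or None when the rule does not match."""
    # "~/" stands for the application root; the whole path must match.
    pattern = "^" + look_for.replace("~/", app_root, 1) + "$"
    m = re.match(pattern, path, re.IGNORECASE)
    if m is None:
        return None
    # Translate .NET-style $1 references into Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", send_to.replace("~/", app_root, 1))
    return m.expand(template)


print(apply_rule(LOOK_FOR, SEND_TO, "/joshualang/Articles/2007/01/18/Default.aspx"))
print(apply_rule(LOOK_FOR, SEND_TO, "/joshualang/Articles/2007/01/"))
```

The full address is rewritten to the real page with four parameters, while the shortened address yields None: no rule applies, so the request falls through to whatever the server does for a path that does not exist.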

With a rule written this way, truncating the URL will definitely produce a 404 error, because the page being requested does not exist: the server follows the directory path in the URL all the way down. And if you create a directory for every further level, the project becomes far too big. Take a look at the following rule:

<RewriterRule>
<LookFor>~/(\w{6,16})/Articles/(\d{4})/(\d{2})/(\d{2})/(\d{6})/Default\.aspx</LookFor>
<SendTo>~/Default.aspx?Bloger=$1&amp;year=$2&amp;month=$3&amp;day=$4&amp;time=$5</SendTo>
</RewriterRule>

Now that there is an extra time segment, nobody would choose to create directories. Instead, make full use of the existing file to do the job.

In fact, many people may already have thought of using the file name in place of part of the directory structure. Obvious once you think of it, isn't it? Of course, it requires some understanding of regular expressions.

Okay, let’s see how it works.

<RewriterRule>
<LookFor>~/(\w{5,16})/Articles/(\d{4})/(\d{2})/(\d{2})/(\d{6})\.aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$2$3$4$5&amp;Bloger=$1</SendTo>
</RewriterRule>

You can easily see that I used "\" as the escape character in the pattern and moved the time segment into the file name. The resulting structure looks like this:

http://blog.tyoo.net/joshualang/Articles/2007/01/18/233030.aspx

Obviously, my ArticleID is a string built from the year, month, day, hour, minute and second. It is meaningful, inserting records does not require worrying about duplicates, and using the time also makes querying convenient. The 14-digit combination produced by $2$3$4$5 is my ArticleID, so a record is easily found from its publication date and article number. This last benefit is especially obvious when the URL is truncated.

Now let's truncate the time part:
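The way the captured groups combine into the 14-digit ArticleID can be sketched in Python as follows. The pattern assumes the \w and \d escapes and an application root of "/", and the example path reuses the time 233030 from the target URL mentioned earlier; Python's `re` is only a stand-in for the .NET engine.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

LOOK_FOR = re.compile(
    r"^/(\w{5,16})/Articles/(\d{4})/(\d{2})/(\d{2})/(\d{6})\.aspx$",
    re.IGNORECASE,
)


def rewrite(path):
    """Map a date-style article path onto Default.aspx, or return None."""
    m = LOOK_FOR.match(path)
    if m is None:
        return None
    bloger, year, month, day, time = m.groups()
    # Equivalent of ArticleID=$2$3$4$5 and Bloger=$1 in the SendTo template.
    return f"/Default.aspx?ArticleID={year}{month}{day}{time}&Bloger={bloger}"


target = rewrite("/joshualang/Articles/2007/01/18/233030.aspx")
params = parse_qs(urlsplit(target).query)
article_id = params["ArticleID"][0]
print(article_id, datetime.strptime(article_id, "%Y%m%d%H%M%S"))
```

Because the ID is just the date and time written without separators, the page behind the rewrite can turn it back into a timestamp whenever it needs one.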

<RewriterRule>
<LookFor>~/(\w{5,16})/Articles/(\d{4})/(\d{2})/(\d{2})\.aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$2$3$4&amp;Bloger=$1</SendTo>
</RewriterRule>

Continuing in the same way, we can truncate the URL to http://blog.tyoo.net/joshualang/Articles/2007.aspx or even http://blog.tyoo.net/joshualang/Articles/Default.aspx.

Very simple, isn't it? But there is one thing to watch out for: not every variation works automatically. There is more to consider, for example:

What is the difference between http://blog.tyoo.net/joshualang/Articles/2007.aspx and http://blog.tyoo.net/joshualang/Articles/2007/.aspx? Does the latter still work under the rules above?

It does not. Likewise, http://blog.tyoo.net/joshualang/Articles/2007/01/08/.aspx is not accepted. Additional rules need to be defined to rewrite such URLs properly.
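The difference between the two addresses is easy to check. The Python sketch below compares a hypothetical year-only pattern with and without an optional trailing slash before the extension; both patterns are written for this illustration (with an application root of "/") and are not taken from the rule set.

```python
import re

# Year-only pattern with no allowance for a slash before ".aspx".
strict = re.compile(r"^/(\w{5,16})/Articles/(\d{4})\.aspx$", re.IGNORECASE)
# Same pattern, but "/?" tolerates one optional slash before ".aspx".
lenient = re.compile(r"^/(\w{5,16})/Articles/(\d{4})/?\.aspx$", re.IGNORECASE)

for path in ("/joshualang/Articles/2007.aspx", "/joshualang/Articles/2007/.aspx"):
    print(path, bool(strict.match(path)), bool(lenient.match(path)))
```

Making the slash optional lets one rule cover both spellings instead of requiring a second rule for the variant with the stray slash.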

Okay, that is roughly the effect; here is the complete set of rules:

<RewriterRule>
<LookFor>~/([A-Za-z]w{5,16})/Default.aspx</LookFor>
<SendTo>~/Default.aspx?Bloger=$1</SendTo>
</RewriterRule>

<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})/(d{2})/(d{1,6})/([A-Za-z]w {5,16}).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2$3$4&Bloger=$5</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})/(d{2})/(d{1,6})((/)?).aspx</ LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2$3$4</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})/(d{2})/([A-Za-z]w{5,16}).aspx</ LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2$3&Bloger=$4</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})/(d{2})((/)?).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2$3</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})/([A-Za-z]w{5,16}).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2&Bloger=$3</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/(d{2})((/)?).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1$2</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})/([A-Za-z]w{5,16}).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1&Bloger=$2</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/(d{4})((/)?).aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=$1</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/([A-Za-z]w{5,16}).aspx</LookFor>
<SendTo>~/Default.aspx?Bloger=$1</SendTo>
</RewriterRule>
<RewriterRule>
<LookFor>~/Articles/Default.aspx</LookFor>
<SendTo>~/Default.aspx?ArticleID=-1</SendTo>
</RewriterRule>
Note: to keep users from accidentally deleting the important .aspx extension, I use the user name as the virtual file name.

The final URL obtained: http://blog.tyoo.net/Articles/2007/01/18/015000/joshualang.aspx
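A rule list of this kind can be tried out with a rough Python emulation. The sketch assumes the rules are tested from top to bottom with the first match winning, that each pattern is anchored and case-insensitive, and that the application root is "/"; the patterns are written with their \w and \d escapes, with the literal ~/Articles/Default.aspx rule placed first. Python's `re` stands in for the .NET engine, so this is an approximation rather than the component's own code.

```python
import re

# (LookFor, SendTo) pairs; "~/" stands for the application root.
RULES = [
    (r"~/Articles/Default\.aspx", "~/Default.aspx?ArticleID=-1"),
    (r"~/([A-Za-z]\w{5,16})/Default\.aspx", "~/Default.aspx?Bloger=$1"),
    (r"~/Articles/(\d{4})/(\d{2})/(\d{2})/(\d{1,6})/([A-Za-z]\w{5,16})\.aspx",
     "~/Default.aspx?ArticleID=$1$2$3$4&Bloger=$5"),
    (r"~/Articles/(\d{4})/(\d{2})/(\d{2})/(\d{1,6})((/)?)\.aspx",
     "~/Default.aspx?ArticleID=$1$2$3$4"),
    (r"~/Articles/(\d{4})/(\d{2})/(\d{2})/([A-Za-z]\w{5,16})\.aspx",
     "~/Default.aspx?ArticleID=$1$2$3&Bloger=$4"),
    (r"~/Articles/(\d{4})/(\d{2})/(\d{2})((/)?)\.aspx",
     "~/Default.aspx?ArticleID=$1$2$3"),
    (r"~/Articles/(\d{4})/(\d{2})/([A-Za-z]\w{5,16})\.aspx",
     "~/Default.aspx?ArticleID=$1$2&Bloger=$3"),
    (r"~/Articles/(\d{4})/(\d{2})((/)?)\.aspx",
     "~/Default.aspx?ArticleID=$1$2"),
    (r"~/Articles/(\d{4})/([A-Za-z]\w{5,16})\.aspx",
     "~/Default.aspx?ArticleID=$1&Bloger=$2"),
    (r"~/Articles/(\d{4})((/)?)\.aspx", "~/Default.aspx?ArticleID=$1"),
    (r"~/Articles/([A-Za-z]\w{5,16})\.aspx", "~/Default.aspx?Bloger=$1"),
]


def rewrite(path, rules=RULES, app_root="/"):
    """Apply the first matching rule; return None if no rule matches."""
    for look_for, send_to in rules:
        pattern = "^" + look_for.replace("~/", app_root, 1) + "$"
        m = re.match(pattern, path, re.IGNORECASE)
        if m:
            # Translate .NET-style $1 references into Python's \g<1> form.
            template = re.sub(r"\$(\d+)", r"\\g<\1>",
                              send_to.replace("~/", app_root, 1))
            return m.expand(template)
    return None


for p in ("/Articles/2007/01/18/015000/joshualang.aspx",
          "/Articles/2007/01/18/joshualang.aspx",
          "/Articles/2007/01/.aspx",
          "/Articles/Default.aspx"):
    print(p, "->", rewrite(p))
```

Moving the first pair to the end of the list changes the result for /Articles/Default.aspx, because "Articles" itself fits the user-name pattern of the second rule; that is one concrete reason the order of the rules matters.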

At the same time, the time segment at the end (at most 6 digits) can be truncated or shortened. Even if part of it is lost, the remaining digits are still the most convenient way to find the list of articles closest to the publication time.

Removing the user name does not break anything: you still quickly get the most suitable list of articles from the detailed time format. If the user name is kept, truncating the date finds that author's articles for a specific period.

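Also note two things: the format of the user name and the order in which the rules are evaluated. The pattern [A-Za-z]\w{5,16} requires a leading letter and accepts 6 to 17 characters in total (use \w{5,15} for a strict 6 to 16). The rules are tried from top to bottom and the first match wins; since "Articles" and "Default" themselves fit the user-name pattern, a literal rule such as ~/Articles/Default.aspx only takes effect when it is listed before the user-name rules.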

Summary:
After this regular-expression URL rewriting, the URLs have a strict format, as if the directories really existed, yet the structure is more convenient and flexible, which gives key improvements in functionality and user experience. Let's stop here for now. If you have any questions, please reply and join the discussion. If you know a better way to do the rewriting, I would be glad to hear about it.
http://www.cnblogs.com/Joshualang/archive/2007/01/19/624302.html