A robots file is a "gentleman's agreement" between a website and spider programs. A well-written robots file not only saves server resources, it also helps spiders crawl the site more efficiently, which can help rankings.
1: Only allow Googlebot
If you want to block all crawlers except Googlebot:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
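To sanity-check a rule set like this before publishing it, you can run it through Python's standard-library urllib.robotparser. The snippet below is only an illustrative sketch; the page path and the Baiduspider comparison are made-up examples:

from urllib.robotparser import RobotFileParser

# The rule set above: block every spider, then allow Googlebot.
rules = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "/some-page.html"))    # True  -> Googlebot may crawl
print(parser.can_fetch("Baiduspider", "/some-page.html"))  # False -> every other spider is blocked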
2: The difference between "/folder/" and "/folder"
For example:
User-agent: *
Disallow: /folder/
Disallow: /folder
"Disallow:/folder/" means that a directory is intercepted. All files in this directory are not allowed to be crawled, but folder.hlml is allowed to be crawled.
"Disallow:/folder": All files and folder.html under /folder/ cannot be crawled.
3: "*" matches any character
User-agent: *
means the rules that follow apply to every spider. After pseudo-static processing (URL rewriting), a site often serves the same content at both a dynamic URL and a static URL; the two versions are duplicate (mirror) pages, so the dynamic URLs should be blocked. The * wildcard makes this easy:
User-agent: *
Disallow: /*?*
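For spiders that support wildcards, * matches any sequence of characters in the URL path. The Python sketch below is not a real robots.txt parser; it is only a minimal illustration of that matching rule, and the example URLs are invented:

import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    # Translate a robots.txt path pattern: "*" matches any characters,
    # a trailing "$" anchors the end of the URL, everything else is literal.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

dynamic_rule = robots_pattern_to_regex("/*?*")

print(bool(dynamic_rule.match("/news.php?id=12")))  # True  -> dynamic URL is blocked
print(bool(dynamic_rule.match("/news/12.html")))    # False -> static version stays crawlable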
4: "$" matches the end of a URL
To block URLs that end with a specific string, use $. For example, to block all URLs ending in .asp:
User-agent: *
Disallow: /*.asp$
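As a rough check of what the $ anchor does, here is the same pattern written as a regular expression; the example paths are invented:

import re

# "/*.asp$": "*" becomes ".*" and the trailing "$" anchors the end of the URL.
asp_rule = re.compile(r"^/.*\.asp$")

print(bool(asp_rule.match("/news/detail.asp")))       # True  -> blocked
print(bool(asp_rule.match("/news/detail.aspx")))      # False -> .aspx does not end in .asp
print(bool(asp_rule.match("/news/detail.asp?id=1")))  # False -> the query string changes the ending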
You can also look at how well-run websites write their robots files and adapt them to your own needs. A good robots file lets spiders spend more of their time on the content you actually want crawled, so optimizing it is worth the effort.
This article comes from Dongyang Gaofu: http://mygaofu.com. Please include the link when reprinting.