Robots.txt is a simple text file, but SEOs who focus on site building and site optimization all know its importance. It can block pages you do not want search engines to crawl, and, like a map, it can also guide spiders on their way. When a spider arrives at a site, the first thing it checks is whether a Robots.txt file exists; if it does, the spider indexes the site according to the rules in that file, and if it does not, the spider simply follows the links on the pages in order. We can therefore use Robots.txt to block directories that do not need to be indexed by search engines, or list the sitemap in it to guide spiders as they crawl. This is very useful for website security, for saving server bandwidth, and for guiding indexing, letting a site play to its strengths and avoid its weaknesses. Let's look at each use in detail:
1. Use Robots.txt to save server bandwidth
Generally speaking, webmasters rarely bother with this setting. However, when a server handles heavy traffic and a large amount of content, it becomes worthwhile to configure Robots.txt to save bandwidth, for example by blocking image folders: indexing them has no practical benefit for search engines and wastes a great deal of bandwidth. For an image-heavy website the waste is even more staggering, so Robots.txt can fully solve this problem.
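A minimal sketch of such a rule, assuming the site keeps its pictures in a directory named /images/ (the path is an illustrative assumption, not from the article):

```
User-agent: *
Disallow: /images/
```

Any compliant spider that reads this file will skip everything under /images/, so image bytes are never spent on crawler traffic.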
2. Protect website security directory
Generally speaking, when setting up Robots.txt, the admin directory, database directory, and backup directory should all be disallowed so that spiders do not crawl them; otherwise data may leak and compromise the security of the website. Of course, any other directories the administrator does not want spiders to index can be set the same way, and compliant search engines will then follow these rules when indexing.
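A minimal sketch of such a setup, with hypothetical directory names standing in for the admin, database, and backup paths (use your site's actual paths):

```
User-agent: *
Disallow: /admin/
Disallow: /data/
Disallow: /backup/
```

Note that Robots.txt is publicly readable, so it only keeps well-behaved spiders out; sensitive directories should also be protected by server-side access control.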
3. Prohibit search engines from indexing the page
There are always pages on a website that we don't want the public to see, and Robots.txt can be set to keep spiders from indexing them. For example, a few days ago my connection was slow while I was updating an article, and it ended up being published several times in a row; all three duplicates were indexed by search engines. What to do? Duplicate content is certainly bad for website optimization, and in that situation you can set Robots.txt to block the redundant pages.
4. Robots.txt link site map
Since the first thing a spider looks at when visiting a website is the Robots.txt file, we can list the sitemap there, which helps spiders index the latest content and avoids many unnecessary detours. For example, the sitemap of the professional website-building company Pilot Technology is at http://www.****.net.cn/sitemap.xml. Adding it to Robots.txt makes it even easier for search engines to index the site, and you no longer have to go to each search engine every day to submit the map file. Isn't that convenient?
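The standard way to do this is the Sitemap directive, sketched below with a placeholder URL (substitute your own sitemap address):

```
User-agent: *
Sitemap: http://example.com/sitemap.xml
```

The Sitemap line takes a full absolute URL and may appear anywhere in the file; major search engines pick it up automatically on their next crawl.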
5. Writing methods and precautions
The syntax of Robots.txt must be written correctly, yet many people are careless with it. First of all, the line User-agent: * must be present; the * means the rules apply to all search engines. A Disallow: line followed by a file or directory path means search engines are forbidden from indexing that path. A few examples illustrate this:
Example 1:
User-agent: *
Disallow: /
Disallows all search engines from accessing and indexing the entire site.
Example 2:
User-agent: *
Disallow: /seebk
Disallows all search engines from accessing the /seebk directory.
Example 3:
User-agent: baiduspider
Disallow:
User-agent: *
Disallow: /
Allows only Baidu's spider to access and index the site (the empty Disallow line permits everything for that agent; without it, consecutive User-agent lines would be grouped together and Baidu would be blocked too). Common spider names: Baidu: baiduspider, Google: googlebot, Soso: sosospider, Alexa: ia_archiver, Yahoo: Yahoo! Slurp.
Example 4:
User-agent: *
Disallow: /*.jpg$
Prevents spiders from fetching .jpg images, which helps against hotlinking; you need not set it if your bandwidth is sufficient. Note that the * and $ wildcards are an extension honored by major engines such as Google and Baidu, not part of the original robots.txt standard.
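Rules like those in Example 2 can be checked before going live. A minimal sketch using Python's standard-library urllib.robotparser (the example.com URLs are placeholders):

```python
# Parse a small robots.txt and ask how a crawler would interpret it.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file's lines directly, so no live site is needed.
rp.parse([
    "User-agent: *",
    "Disallow: /seebk",
])

# Anything under /seebk is blocked; other paths remain crawlable.
print(rp.can_fetch("*", "http://example.com/seebk/page.html"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))       # True
```

One caveat: this standard-library parser does not understand the non-standard * and $ wildcards from Example 4, so those rules must be tested with the search engines' own checking tools.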
Postscript: a well-tuned Robots.txt lets a site play to its strengths and avoid its weaknesses, and getting it right helps the website's optimization proceed smoothly. This article was originally published by www.joyweb.net.cn!
Author: Binary Network (personal space)