I have always emphasized the optimization of details. Baidu's current requirement for websites is, in effect, to see whether your details are done well: code, tags and so on are details, and robots.txt is also part of a website's details, so doing it well is of great help to the site. Many new webmasters may not know what robots.txt is, so here are a few points about how to work with it.
1. The origin of Robots.txt
We must first understand that robots.txt is not a command or an instruction. It is an agreement between a website and the search engines, and the content of that agreement is what is written in robots.txt. In the early days it was used for privacy protection on websites. It exists as a txt file in the root directory of the website.
2. The role of Robots.txt
When our website goes live, search engines may release many pages affected by factors beyond our control, which lowers the overall quality of our pages and leaves search engines with a poor impression of the site. The role of robots.txt is to block these pages so that spiders do not release (index) them. So which pages should we block? (A sample set of rules is sketched after the list below.)
1. Block pages without content. To make it clear with examples: the registration page, login page, shopping page, posting page, message page, and site search page; if you have a 404 error page, you should block that too.
2. Block duplicate pages. If we find that our website has two pages with the same content but different paths, we should use robots.txt to block one of them. The spider will still crawl it but will not release (index) it. In Google Webmaster Tools we can directly check the number of blocked pages.
3. Block dead-link pages. We only need to block pages whose paths share a common characteristic. The fact that a spider is blocked from crawling a page does not mean it cannot reach the address; being able to reach the address and being allowed to crawl the page are two different concepts. Of course, dead links that we can fix ourselves do not need to be blocked; what we do need to block are dead links caused by path changes that we cannot fix.
4. Block overly long paths. We can use robots.txt to block long paths that exceed the URL input box.
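For illustration, here is a minimal sketch of what such blocking rules might look like. The paths /register/, /login/, /cart/, /search/ and /404.html are hypothetical placeholders, not paths from any particular program, so replace them with your own directories.
User-agent: * (applies to all search engine spiders)
Disallow: /register/ (block the registration page)
Disallow: /login/ (block the login page)
Disallow: /cart/ (block the shopping page)
Disallow: /search/ (block the site search pages)
Disallow: /404.html (block the 404 error page)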
3. Use of Robots.txt
1. Creation of Robots.txt
Create a new Notepad file locally, name it robots.txt, and then upload this file to the root directory of our website; with that, our robots.txt is created. Some open-source programs (such as DedeCMS) already come with a robots.txt; when we want to modify it, we only need to download it from the root directory, edit it, and upload it again.
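As a starting point, a minimal robots.txt that allows everything to be crawled might look like the sketch below (the file would then sit at a URL such as www.example.com/robots.txt; example.com is a placeholder domain):
User-agent: * (applies to all search engine spiders)
Disallow: (an empty Disallow means nothing is blocked)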
2. Common syntax
The User-agent line defines which search engine crawler the rules apply to. Disallow means forbidden; Allow means allowed.
Let's first get to know the search engine crawlers, also called spiders or robots. For Baidu's spider we write Baiduspider in robots.txt, and for Google's robot we write Googlebot.
Now for how to write the file. The first line defines the search engine:
User-agent: Baiduspider (pay special attention: when writing robots.txt there must be a space after the colon; and if we want to address all search engines, we use * instead of Baiduspider)
Disallow: /admin/
The meaning of this line is to tell Baiduspider not to index the web pages inside the admin folder of my website. If we remove the slash after admin, the meaning changes completely: it tells Baiduspider not to index any page in my root directory whose path begins with admin, not just the pages inside the admin folder.
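To make the difference concrete, here is a hedged sketch; the file names are hypothetical examples:
Disallow: /admin/ (blocks /admin/index.html, /admin/login.php and everything else inside the admin folder)
Disallow: /admin (additionally blocks /admin.html, /administrator/ and any other path that begins with /admin)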
Allow means allowed, or not forbidden. Generally it is not used alone; it is used together with Disallow. The purpose of using them together is to make blocking directories more flexible and to reduce the amount of code. For example, suppose the /seo/ folder contains 100,000 files and only two of them need to be crawled. We cannot write tens of thousands of lines of rules; that would be exhausting. Used together, the two directives need only a few lines:
User-agent: * (defines all search engines)
Disallow: /seo/ (block the seo folder)
Allow: /seo/ccc.php
Allow: /seo/ab.html
These two Allow lines let those two files be crawled and indexed, so the whole problem is solved with four lines of code. Some people may ask whether it is more standard to put Disallow first or Allow first.
This article was originally written by http://www.51diaoche.net. Reprints are welcome; please credit the original author.
Editor-in-Chief: Yangyang  Author: Longfeng Hoisting Machinery