robots.txt is the first file that search engine spiders look at when they crawl your site. It tells the spider program which files on your site may be crawled and which may not. Mainstream search engines still abide by this convention today, and Hefei SEO has experienced this first-hand. This site unblocked its robots.txt file at around 11 pm on the 20th to let search engine spiders in. When I checked on the morning of the 21st, Baidu had already indexed the site, and on the second day Google had indexed it as well.
Enough said, let's get down to business. robots.txt is a text file. It must be named "robots.txt" and uploaded to the root directory of the site. Uploading it to a subdirectory has no effect, because search engine robots only look for this file in the root directory of your domain. Again, Hefei SEO will not waste everyone's time on the basics here. If you lack them, you can go to the Baidu Search Help Center and the Google Chinese Administrator Blog. Here we mainly introduce, from an SEO perspective, the role robots.txt plays in the website optimization process.
1. Tips for using robots.txt that are beneficial to website optimization
1. It makes building a website online more convenient. Once the domain name resolves to the server, the site can be accessed, but at this point the layout is unfinished and the meta tags are still a mess. If search engine spiders crawl and index the site in this state, changing it afterwards is very detrimental to SEO. During this period you can use the robots.txt file to forbid all search engine spiders from crawling any content on the site. The syntax is:
User-agent: *
Disallow: /
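Before relying on these two lines, it is worth checking them locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name is_allowed and the URL www.example.com are placeholders of mine, not something from this site.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from the text: every crawler, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    # Both crawlers should be refused while the site is under construction.
    for bot in ("Baiduspider", "Googlebot"):
        print(bot, is_allowed(BLOCK_ALL, bot, "http://www.example.com/index.html"))
```

Once the site is ready, remove or relax the rule, otherwise the spiders will keep staying away.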
2. Customize which search engine spiders may crawl which content, so that you can decide how to treat each search engine according to your site's situation. This has two meanings.
(1) Customize by search engine. If you disapprove of what Baidu ("Du Niang") does, you can shut her out so that all she can do is stare at your site from outside. The syntax is:
User-agent: baiduspider
Disallow: /
Note: common search engine robot names.
Robot name        Search engine
Baiduspider       http://www.baidu.com
Scooter           http://www.altaVista.com
ia_archiver       http://www.Alexa.com
Googlebot         http://www.google.com
FAST-WebCrawler   http://www.alltheweb.com
Slurp             http://www.inktomi.com
MSNBOT            http://search.msn.com
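If you want to shut out more than one of the robots listed above, the per-crawler rule can be generated rather than typed by hand. This is only a sketch: block_crawlers and is_allowed are helper names I made up, and www.example.com is a placeholder. The check again relies on Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

def block_crawlers(names):
    """Build robots.txt text that shuts out only the named crawlers."""
    groups = [f"User-agent: {name}\nDisallow: /\n" for name in names]
    # A blank line separates one User-agent group from the next.
    return "\n".join(groups)

def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    rules = block_crawlers(["Baiduspider", "Slurp"])
    print(rules)
    for bot in ("Baiduspider", "Slurp", "Googlebot"):
        print(bot, is_allowed(rules, bot, "http://www.example.com/"))
```

Crawlers that are not named in any group, and for which no "User-agent: *" group exists, remain free to crawl everything.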
(2) Customize by site content. That is, you can specify which directories spiders may crawl and which they may not. For example, to allow all search engine spiders to crawl the content under the directory abc but prohibit them from crawling the content under the directory def, the syntax is:
User-agent: *
Allow: /abc/
Disallow: /def/
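The directory rules above can be checked path by path. A minimal sketch, again with Python's standard urllib.robotparser; the page names and www.example.com are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# The directory rules from the text.
RULES = """\
User-agent: *
Allow: /abc/
Disallow: /def/
"""

def allowed_paths(robots_txt, user_agent, paths):
    """Map each path to True/False depending on whether user_agent may crawl it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    base = "http://www.example.com"  # placeholder host
    return {path: parser.can_fetch(user_agent, base + path) for path in paths}

if __name__ == "__main__":
    result = allowed_paths(RULES, "Googlebot", ["/abc/page.html", "/def/page.html", "/other.html"])
    for path, ok in result.items():
        print(path, "allowed" if ok else "blocked")
```

Note that a path matched by no rule, such as /other.html here, is crawlable by default, so the Allow line mainly serves to make the intent explicit.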
3. Guide search engines in crawling website content. The most typical uses are:
(1) Guide spiders to crawl your site map. Its syntax format is:
User-agent: *
Sitemap: sitemap-url
(2) Prevent spiders from crawling duplicate content on your website.
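To confirm that a sitemap line is actually picked up, you can read it back with the site_maps() method of Python's standard urllib.robotparser (available from Python 3.8). This is just a sketch; the sitemap URL below is a made-up placeholder standing in for sitemap-url.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; the sitemap URL is hypothetical.
ROBOTS = """\
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
"""

def sitemap_urls(robots_txt: str) -> list:
    """Return every sitemap URL declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []

if __name__ == "__main__":
    print(sitemap_urls(ROBOTS))
```

The sitemap line does not belong to any User-agent group, so it can sit anywhere in the file.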
4. The 404 error page problem. If your server uses a custom 404 error page and there is no robots.txt file in the root directory of your site, search engine spiders will treat the error page as the robots.txt file, which will affect how search engines index your pages.
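One rough way to spot this problem is to look at what the server returns for /robots.txt. The sketch below assumes the trouble comes from the custom error page being served with an HTTP 200 status and an HTML body; that is my reading of the situation, and classify_robots_response is a hypothetical helper, not part of any library. It takes the status code, Content-Type header and body you fetched by whatever means you prefer.

```python
def classify_robots_response(status: int, content_type: str, body: str) -> str:
    """Rough heuristic for what a crawler received when it asked for /robots.txt."""
    if status != 200:
        # A real error status: the server honestly says there is no robots.txt.
        return "missing"
    start = body.lstrip().lower()
    looks_html = "html" in content_type.lower() or start.startswith(("<!doctype", "<html"))
    if looks_html:
        # HTTP 200 with an HTML body: most likely the custom error page.
        return "error-page"
    return "ok"

if __name__ == "__main__":
    print(classify_robots_response(200, "text/plain", "User-agent: *\nDisallow:\n"))
    print(classify_robots_response(200, "text/html", "<html><body>Page not found</body></html>"))
    print(classify_robots_response(404, "text/html", "<html>Not found</html>"))
```

If the result is "error-page", uploading a robots.txt to the root directory, even an empty one, removes the ambiguity.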
2. How to write robots.txt for sites built with specific programs. These are only general examples; you have to adjust them to your specific situation.
1. How to write the robots.txt file for a site built with DedeCMS
User-agent: *
Disallow: /plus/feedback_js.php
Disallow: /plus/feedback.php
Disallow: /plus/mytag_js.php
Disallow: /plus/rss.php
Disallow: /plus/search.php
Disallow: /plus/recommend.php
Disallow: /plus/stow.php
Disallow: /plus/count.php
Disallow: /include
Disallow: /templets
Disallow: /member
2. How to write the robots.txt file for a site built with WordPress
User-agent: *
Disallow: /wp-admin
Disallow: /wp-content/plugins
Disallow: /wp-content/themes
Disallow: /wp-includes
Disallow: /?s=
Sitemap: http://www.***.com/sitemap.xml
3. How to write the robots.txt file for a site built with phpcms
User-agent: *
Disallow: /admin
Disallow: /data
Disallow: /templates
Disallow: /include
Disallow: /languages
Disallow: /api
Disallow: /fckeditor
Disallow: /install
Disallow: /count.php
Disallow: /comment
Disallow: /guestbook
Disallow: /announce
Disallow: /member
Disallow: /message
Disallow: /spider
Disallow: /yp
Disallow: /vote
Disallow: /video
4. How to write the robots.txt file for a discuz forum
User-agent: *
Allow: /redirect.php
Allow: /viewthread.php
Allow: /forumdisplay.php
Disallow: /?
Disallow: /*.php
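The discuz rules mix Allow lines with a wildcard Disallow, so it helps to see how they interact. The sketch below is a small hand-written matcher, not any search engine's actual code. It assumes the matching described in the Robots Exclusion Protocol standard (RFC 9309): "*" matches any run of characters, the longest matching pattern wins, and Allow wins a tie. As far as I know, Python's standard urllib.robotparser does not expand "*" inside a path, which is why it is not used here. The sample paths are invented.

```python
import re

# The discuz rules from the text, in file order.
DISCUZ_RULES = [
    ("allow", "/redirect.php"),
    ("allow", "/viewthread.php"),
    ("allow", "/forumdisplay.php"),
    ("disallow", "/?"),
    ("disallow", "/*.php"),
]

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern: '*' matches any run, a final '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, path):
    """Longest matching pattern wins; on a tie, allow wins; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(pattern), kind == "allow"
    return allowed

if __name__ == "__main__":
    for path in ("/viewthread.php?tid=1", "/index.php", "/?mod=x", "/"):
        print(path, "allowed" if is_allowed(DISCUZ_RULES, path) else "blocked")
```

Under these assumptions the three named scripts stay crawlable while every other .php page and every query string on the home page is blocked. A spider that does not support wildcards may read the last line differently.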
Although the topic is an old one, there is still a lot to learn from it. Some people say that setting up a robots.txt file brings the risk of being attacked by people with ill intent, since the file lists your directories. As a grassroots webmaster, you have nothing to fear. If someone really is determined to attack you, hiding the file will not stop them, because security depends not only on the website building program itself but also on server security and so on. From Hefei SEO: http://www.anhuiseo.org. Please indicate the source when reprinting.
Thanks to qhpf298 for his contribution