Everyone who does SEO knows that the key is to get the website included in the search engines; if your site is not included, SEO is out of the question. Therefore, if you want your website to be discovered by search engines and your web pages to be properly indexed, you must first make the site easy for spiders to crawl. The tools search engines use to crawl web pages are called spiders or crawlers (in English, "robots"). These spiders follow hyperlinks to crawl many of our pages, but some pages cannot be crawled. Sometimes this is because the website itself has problems that hinder spider crawling, which makes it hard for the affected pages to be indexed; these problems form what we call "spider traps". A spider trap generally refers to a website production technique that is unfriendly to search engines and prevents spiders from crawling our pages. To help you avoid this situation, Xiaowuming has summarized the following spider traps and how to stay out of them.
1: Pages that use session IDs. Some sales sites use session IDs to track visitors in order to analyze user behaviour: every visitor gets a session ID appended to the URL. A spider is treated as a new user on every visit, so each time it requests the page a new session ID is added to the URL. The result is the same page appearing under many different URLs, which produces highly duplicated content; this is one of the most common spider traps. For example, to improve sales performance, some sites use this tracking to pop up a chat greeting such as "Hello, friend from XXX".
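One common way to limit the damage, offered here only as a minimal sketch and not as part of the original article, is to declare a single canonical URL on the page so that all session-ID variants are consolidated; the path /product/123 below is a hypothetical example:

    <!-- Tell search engines which URL is the "real" one, regardless of any
         session-ID parameter that was appended to the address for tracking. -->
    <link rel="canonical" href="http://www.example.com/product/123" />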
2: Pages that force registration or login before the content can be viewed. This is a common-sense spider trap: a spider cannot submit a registration form and cannot enter a username and password to log in, so it can never see the protected content. As a rule of thumb, the content a visitor can open directly, without logging in, is the only content a spider can see.
3: Sites that love to use Flash. I say "love to use" because many small and medium-sized enterprise sites rely on Flash heavily: Flash can produce all kinds of effects, and when used for navigation it is visually striking, so many corporate websites like to use it to show off the company's strength, culture, products and so on. Some corporate sites even make the entire homepage a Flash movie, either jumping to another page after a certain period of time or letting users click links inside the Flash to reach new pages. The problem is that spiders have great difficulty reading the content inside Flash, and it is just as difficult for them to follow links embedded in it.
4: Dynamic URLs, that is, URLs stuffed with too many symbols or parameters. I have already covered this kind of spider trap in the article on URL optimization. As search engine technology has advanced, dynamic URLs have become less and less of a crawling problem, but in terms of search engine friendliness, static or even pseudo-static URLs are still preferable to dynamic ones; just look at how most SEO colleagues handle their URLs. A small example of a pseudo-static rewrite follows.
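As an illustration only (a minimal sketch, assuming an Apache server with mod_rewrite enabled and a hypothetical script product.php), a rewrite rule like the following lets spiders crawl a clean, static-looking URL instead of a parameterized one:

    # Map the pseudo-static URL /product/123.html to the real dynamic script
    # /product.php?id=123, so visitors and spiders only ever see the clean form.
    RewriteEngine On
    RewriteRule ^product/([0-9]+)\.html$ /product.php?id=$1 [L]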
5: Frames. In the early days frames were used everywhere, but nowadays few websites still build pages with them. First, with the development of the major CMS systems, maintaining a website has become much easier; the main reason early websites used frames was to make page maintenance convenient, and that is no longer necessary. Second, frames are bad for search engine inclusion, which is another reason they are used less and less.
6: JavaScript links. Although search engines can now track, and even try to parse and analyze, links inside JavaScript, we had better not expect search engines to overcome these difficulties on their own. Although some attractive navigation effects can be built with JS, CSS can usually achieve the same thing; to improve the site's friendliness to search engines and make pages easier for spiders to crawl, try not to use JS for navigation. Of course, in SEO, JS has one useful side effect: links to pages the webmaster does not want included, or certain friendly links, can be placed in JS. Another way out of the JavaScript spider trap is the <noscript> tag, which provides alternative code for browsers that do not support JavaScript; since spiders do not execute JavaScript, they process the <noscript> content instead.
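A minimal sketch of that idea, with hypothetical file names and URLs, is a script-driven menu that keeps a plain HTML fallback inside <noscript>, so a crawler that never runs the script can still follow the links:

    <!-- The menu is normally rendered by menu.js; the <noscript> block gives
         non-JavaScript clients (including spiders) ordinary crawlable links. -->
    <script src="menu.js"></script>
    <noscript>
      <a href="/about.html">About Us</a>
      <a href="/products.html">Products</a>
    </noscript>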
7: Deep pages. Pages that have almost no entry links and sit far away from the homepage are relatively hard for spiders to reach, although high-authority sites may be a different story. For a page to be included it must first accumulate a basic amount of weight. The homepage generally carries the most weight, and that weight is passed down to the internal pages; when an internal page's weight rises above the inclusion threshold, the page gets indexed. According to this theory, the weight passed between pages decreases with each hop, so the fewer clicks between an internal page and the homepage, the more weight it can receive from the homepage. A good site structure lets more of the site's pages be included.
8: Forced use of cookies. For a search engine this is equivalent to having cookies disabled outright. Some websites force cookies in order to implement certain functions, such as tracking the user's path through the site, remembering user information, or even harvesting user privacy. If a visitor does not have cookies enabled on such a site, the page displays abnormally, so the same page cannot be accessed normally by spiders either.
9: Jumps of every kind. I believe many SEO folks are already very familiar with the 301 redirect. Spiders, however, strongly dislike the other kinds of jumps, such as 302, meta refresh, JavaScript and Flash redirects, and even a 301 should not be used unless there is no other choice. Any jump creates some obstacle to spider crawling, so act accordingly. A small example of the preferred server-side 301 follows.
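As a sketch only (assuming an Apache server and a hypothetical page that has moved from /old-page.html to /new-page.html), a clean server-side 301 in .htaccess looks like this, and is far friendlier to spiders than a meta refresh or a JavaScript jump:

    # Permanently redirect the old URL to the new one with a proper 301 status,
    # so spiders can transfer the old page's signals to the new address.
    Redirect 301 /old-page.html http://www.example.com/new-page.html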
10: Robots.txt writing errors, plus the various cheating techniques such as hidden text and hidden links, cloaked pages that check whether the visitor is a spider or an ordinary browser and serve different content to each, and misused default 404 error pages, all of which also create crawling obstacles for spiders. A small example of a correctly written robots.txt follows.
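For illustration (a minimal sketch with a hypothetical /admin/ directory), a correctly written robots.txt lets spiders crawl the public site while excluding only what you intend to exclude:

    # Allow all spiders to crawl the site, keeping only the back-end out.
    # Note: "Disallow: /" on its own would block the entire site, which is
    # a common robots.txt writing error.
    User-agent: *
    Disallow: /admin/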
Author: Xiao Wuming. Source of this article: Shenzhen SEO, website: http://www.zhsem.com/. Please respect the original and indicate the source when reprinting, thank you!