URL canonicalization has long been a headache for both webmasters and search engines. By some estimates, 10% to 30% of URLs on the Internet are non-canonical, meaning different URLs that serve the same content. This causes several problems, for example:
For webmasters, having multiple URLs for the same page splits that page's link weight, which works against ranking.
For search engines, crawling and storing duplicate URLs wastes resources and bandwidth.
When search engines find multiple URLs with the same content, they do not penalize the site. Instead, they do their best to work out which URL should be the canonical one. But a program is only a program: it can get this wrong, and the URL it picks may not be the canonical URL the webmaster wants.
If a site's URL canonicalization problems are severe, they can also hurt indexing. A domain with low authority can only get a limited total number of pages indexed. Every resource a search engine spends indexing non-canonical URLs leaves less for pages with genuinely different content.
There are many ways to tackle URL canonicalization, for example:
Use Google Webmaster Tools to specify whether the www or the non-www version of the domain is canonical.
Use 301 redirects to send every non-canonical URL to its canonical URL.
Make sure the CMS generates only canonical URLs.
Make sure every internal link on the site points to a canonical URL.
List only canonical URLs in the sitemaps submitted to search engines.
Each of these methods, however, has its limitations.
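To make the 301 option a little more concrete, here is a minimal Python sketch of the www vs non-www case, assuming www.example.com is the canonical host and example.com is the variant to redirect. The function name canonical_redirect and its (status, location) return value are hypothetical; on a real site this rule would normally live in the web server or framework configuration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: www.example.com is the canonical host and
# example.com is the bare-domain variant that should redirect to it.
CANONICAL_HOST = "www.example.com"
BARE_HOST = "example.com"


def canonical_redirect(url: str) -> Optional[Tuple[int, str]]:
    """Return (301, location) for a bare-domain URL, otherwise None.

    Only the host is changed; scheme, path, query and fragment are kept.
    Any explicit port in the original URL is dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host != BARE_HOST:
        # Already canonical, or a host this rule does not cover.
        return None
    location = urlunsplit(
        (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
    return (301, location)


print(canonical_redirect("http://example.com/product.php?item=swedish-fish"))
print(canonical_redirect("http://www.example.com/product.php?item=swedish-fish"))
```

The first call yields a 301 to the www URL; the second returns None because the URL is already on the canonical host.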
Google Webmaster Tools settings do not apply to other search engines. Some webmasters cannot set up 301 redirects for one reason or another. In many cases the CMS is not under the webmaster's control. Webmasters can control their own internal links, but not the links other sites make to them. In short, although workarounds exist, URL canonicalization remains a major problem to this day.
A few days ago, Google, Yahoo, and Microsoft jointly announced a new tag, the canonical tag, to address URL canonicalization.
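Put simply, you add the following tag to the <head> section of the HTML page:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />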
This tells search engines that the canonical URL of this page should be:
http://www.example.com/product.php?item=swedish-fish
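If you control the page templates, emitting the tag is straightforward. The sketch below uses a hypothetical helper, canonical_link_tag, that is not part of any particular CMS. It HTML-escapes the URL because query strings often contain &, which should be written as &amp; inside an attribute value.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Build a <link rel="canonical"> element for a page's <head>.

    escape(..., quote=True) turns &, <, > and quotes into entities so the
    URL is safe inside the double-quoted href attribute.
    """
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}" />'


print(canonical_link_tag("http://www.example.com/product.php?item=swedish-fish"))
```

For the URL in the article this prints exactly the tag shown above, since that URL contains no characters that need escaping.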
The tag can be placed on pages at URLs such as:
http://www.example.com/product.php?item=swedish-fish&category=gummy-candy
http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678
For each of these URLs, the canonical URL then becomes:
http://www.example.com/product.php?item=swedish-fish
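Working out the canonical URL in cases like these comes down to knowing which query parameters actually change the content. Here is a minimal sketch, assuming that on product.php only item matters and that category, trackingid, sessionid and similar parameters can be dropped. The whitelist CONTENT_PARAMS and the function canonical_url are illustrative assumptions, not part of the standard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption for illustration: on product.php only "item" selects the content;
# category, trackingid, sessionid and similar parameters do not.
CONTENT_PARAMS = {"item"}


def canonical_url(url: str) -> str:
    """Keep only content-defining query parameters and drop the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in CONTENT_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


for u in (
    "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy",
    "http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678",
):
    print(canonical_url(u))  # both print http://www.example.com/product.php?item=swedish-fish
```

Deciding which parameters belong in the whitelist is a per-site judgment; the code only applies whatever decision the webmaster has made.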
Put simply, the tag works like a 301 redirect inside the page. The difference is that the user is not redirected and stays on the same URL, while search engines treat the tag as a 301, which means the page's link weight is consolidated onto the canonical URL specified in the tag.
There are also a few details webmasters should keep in mind:
The tag is a suggestion or hint, not a directive like robots.txt. Search engines will give it substantial weight, but they will not follow it 100% of the time; they also weigh other signals when determining the canonical URL. This also protects webmasters who specify the wrong URL by mistake.
The href can be an absolute or a relative address. Absolute addresses are generally recommended as the safer choice.
The content at the canonical URL does not have to be identical to the content on the non-canonical URLs that carry the tag; small differences are acceptable. For example, many e-commerce sites let users sort products by price, color, or size in ascending or descending order. Each sort order produces a different URL, but the content is essentially the same apart from minor differences, so the tag can be used.
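To see why absolute addresses are safer, here is a sketch of how a crawler might read the tag and resolve a relative href against the URL of the page carrying it, using only Python's standard library. The sort parameter and the /shop/ path are hypothetical, and the sketch ignores any <base> element, which a real parser would also have to honor.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Tag and attribute names arrive lowercased; a self-closing <link ... />
        # also lands here via HTMLParser's default handle_startendtag.
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "canonical" in rel_values and href:
            self.href = href


def resolve_canonical(page_url: str, html: str) -> Optional[str]:
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return None
    # A relative href is resolved against the URL of the page that carries it.
    return urljoin(page_url, finder.href)


head = '<head><link rel="canonical" href="product.php?item=swedish-fish" /></head>'
print(resolve_canonical("http://www.example.com/product.php?item=swedish-fish&sort=price-asc", head))
print(resolve_canonical("http://www.example.com/shop/list.php?sort=color", head))
```

The first page resolves to http://www.example.com/product.php?item=swedish-fish as intended, but the same relative href on a page under /shop/ resolves to http://www.example.com/shop/product.php?item=swedish-fish. An absolute href avoids that kind of surprise.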
The specified canonical URL can technically be a page that does not exist and returns 404, or a page that has not been indexed, but this is not recommended. Don't go looking for trouble.
The tag works within the same domain, including subdomains, but not across different domains; this prevents others from using it to hijack pages.
Don't treat the tag as a lifeline. First and foremost, build a sound site structure and avoid URL canonicalization problems in the first place. The tag is only a last resort.
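The same-domain rule can be illustrated with a simple check. This is a naive sketch, assuming the registrable domain is just the last two labels of the hostname; that assumption breaks for suffixes such as .co.uk, where a real implementation would consult the Public Suffix List. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def base_domain(url: str) -> str:
    """Naively take the last two hostname labels, e.g. shop.example.com -> example.com.

    Assumption for illustration only: this is wrong for suffixes such as
    .co.uk; a real check would use the Public Suffix List.
    """
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def canonical_allowed(page_url: str, canonical: str) -> bool:
    """True if the canonical URL stays on the same domain (subdomains allowed)."""
    page_domain = base_domain(page_url)
    return bool(page_domain) and page_domain == base_domain(canonical)


print(canonical_allowed("http://shop.example.com/p?id=1", "http://www.example.com/p?id=1"))
print(canonical_allowed("http://www.example.com/p?id=1", "http://www.example.org/p?id=1"))
```

The first call returns True (two subdomains of example.com); the second returns False because the canonical URL points at a different domain.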
Sharp-eyed readers can probably already see an opportunity in this new standard for building large numbers of external links. Finally, the standard is supported by the three major search engines: Google, Yahoo, and Microsoft. Why is Baidu not mentioned? I remember reading reports that Baidu is the world's second-largest search engine by search volume. Why didn't they invite Baidu to join in?
Author: Zac@SEO One post a day
Original: Dianshi Interactive Search Engine Optimization Blog