I had never paid much attention to how Baidu segments words, but while optimizing a site I stumbled on a case where a one-character difference in a keyword produced very different rankings.
The keyword I was targeting was "second-hand house", but the keyword set on my page was "second-hand housing". Some friends may ask what is wrong with that: doesn't "second-hand housing" already contain the keyword "second-hand house"? Unless you have studied Baidu's word segmentation carefully, you may not see any difference between the two. Look closely at the search results, though, and the clue is there: Baidu segments the two terms differently. Baidu maintains its own vocabulary, so it treats "second-hand house" as a single word, whereas it splits "second-hand housing" into two words, "second-hand" and "housing". Naturally, people searching for "second-hand house" could not find my page. This small detail convinced me that Baidu's word segmentation deserved a closer study. Here is a rough summary of what I found:
1. Baidu segments a keyword according to which keyword-related word appears first in the content. Take the keyword "Today's newly opened hot-blooded Jianghu sf": if "today" is the related word that appears first in your text, the keyword on your page is split into two parts, "today" and "newly opened hot-blooded Jianghu sf". The title must contain the keyword, although it does not have to match completely; the keyword in the content, however, must match Baidu's segmentation completely. Among complete matches, pages are sorted by the depth of the URL path: a directory takes priority over a file, and a file in the root directory takes priority over a file in a second-level directory. Complete matches are ranked first, followed by partial matches.
2. When the keyword does not match completely, segmentation decides the order. Take the keyword "browser download". On one page the first related word to appear is "browser", with a high density, but the page does not contain "download" at all. On another page the first related word to appear is "download", and the keyword for that page is split into the two words "browser" and "download". Although the second page contains both "browser" and "download", the first page still ranks ahead of it. This shows that the first part of the keyword is the most important.
3. The frequency of the first part of the keyword is the key to ranking. With "browser download", for example, if two pages both match only partially and both contain the two segments, the page with the higher density of the segment "browser" ranks ahead.
4. If a page matches completely but the keyword first appears only near the end of the page content, the page ranks lower than in the cases above. So it is very important for the keyword to appear as early as possible in the content.
5. Baidu cuts the keyword at the first related word that appears. If that word is the tail of the keyword, cutting starts from the back; if it is the front part, cutting starts from the front. In other words, the page content is segmented both in order and in reverse order: in order, the first half of the keyword is the starting point, and in reverse order, the second half is the starting point. For example, with the keyword "Today's newly opened hot-blooded Jianghu sf", if the first related word to appear on your page is "hot-blooded Jianghu sf", the keyword on your page is split into "hot-blooded Jianghu sf" and "today's newly opened".
6. Following Baidu's segmentation principle, you can choose whichever keyword head suits you best, that is, adjust which part of the keyword appears first in your page content. When you segment the keyword by hand, keep in mind that Baidu judges from front to back and also cuts from back to front.
7. If the first half and the second half of the segmentation overlap (repeat the same words), the page ranks lower than one whose halves do not overlap. Among pages that do overlap, the ranking is decided by the density of the first half. For example, if "Today's newly opened hot-blooded Jianghu sf" is cut into "Today's newly opened hot-blooded Jianghu | newly opened hot-blooded Jianghu sf" (the first half cut this way is too long, which hurts ranking), the page will certainly rank worse than one where the keyword is cut into "Today's newly opened hot-blooded Jianghu | sf".
8. If the keyword does not match completely and part of it is missing, the cut starts from the first part that is present. For example, with the keyword "Today's newly opened hot-blooded Jianghu sf", if the page content does not contain the word "today", the cut starts from "newly opened". Such pages rank relatively low, however, because the first part of the keyword is missing.
9. Pages with missing words are still compared with complete pages by the density of the front part of the segmentation. That is, following the segmentation order, the key figure is the ratio of the density of the front part to the density of the back part. For example, if the ratio of front-part to back-part occurrences is 1:2 on one page and 1:4 on another, the former page should of course rank higher. Also, when words are missing, a short front segment is an advantage in ranking.
10. If no words are missing but the back part of the keyword appears before the front part, for example the back part of "Today's newly opened hot-blooded Jianghu sf" appears first while the density of "Today's newly opened hot-blooded Jianghu" is not high, then the page will rank below pages with missing words.
11. Even when the back part appears first, the ratio of front-part words to back-part words is what matters. For example, suppose one page contains the two segments "sf" and "Today's newly opened hot-blooded Jianghu" in a ratio of 1:1, and another page contains "newly opened hot-blooded Jianghu", "sf", and "today's newly opened" in a ratio of 2:1:1; the former page has the ranking advantage. The raw number of keywords is not the key. The position where they appear and the ratio between the segments are critical: the larger the share taken by the later segments, the worse it is for ranking.
12. It also hurts ranking when the keyword appears too late in the text, or when the density of the main word is too low. The page comparisons above all assume that the density of the main word is similar.
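To make points 1, 5, 9 and 11 more concrete, here is a minimal Python sketch of the behavior I observed. It is only a toy model of my own observations, not Baidu's actual algorithm. The function names (find_first, split_keyword, front_back_ratio) are hypothetical, and both the keyword and the page content are represented as lists of English stand-in words, which is an assumption made purely for illustration.

```python
from typing import List, Optional, Tuple


def find_first(content: List[str], fragment: List[str]) -> Optional[int]:
    """Return the index of the first occurrence of `fragment` in `content`, or None."""
    n = len(fragment)
    for i in range(len(content) - n + 1):
        if content[i:i + n] == fragment:
            return i
    return None


def count_occurrences(content: List[str], fragment: List[str]) -> int:
    """Count how many times `fragment` occurs in `content`."""
    n = len(fragment)
    return sum(1 for i in range(len(content) - n + 1) if content[i:i + n] == fragment)


def split_keyword(
    keyword: List[str], content: List[str]
) -> Optional[Tuple[str, List[str], List[str]]]:
    """Split `keyword` at whichever prefix or suffix shows up earliest in `content`.

    Returns (direction, anchor_part, remaining_part), where direction is
    "forward" if a prefix of the keyword appeared first and "reverse" if a
    suffix did. Returns None if no prefix or suffix occurs in the content.
    Ties on position are broken in favor of the longer fragment (an assumption).
    """
    best = None  # (position, -fragment length, direction, cut index)
    for cut in range(1, len(keyword)):
        for direction, fragment in (("forward", keyword[:cut]), ("reverse", keyword[cut:])):
            pos = find_first(content, fragment)
            if pos is None:
                continue
            candidate = (pos, -len(fragment), direction, cut)
            if best is None or candidate < best:
                best = candidate
    if best is None:
        return None
    _, _, direction, cut = best
    head, tail = keyword[:cut], keyword[cut:]
    return (direction, head, tail) if direction == "forward" else (direction, tail, head)


def front_back_ratio(
    content: List[str], front: List[str], back: List[str]
) -> Optional[float]:
    """Occurrences of the front part divided by occurrences of the back part."""
    back_count = count_occurrences(content, back)
    if back_count == 0:
        return None
    return count_occurrences(content, front) / back_count


# English stand-ins for the example keyword used in this article.
keyword = ["today", "newly-opened", "hot-blooded", "jianghu", "sf"]

# Page A mentions the tail of the keyword first, so the cut starts from the back.
page_a = "hot-blooded jianghu sf guide : today newly-opened servers".split()
# Page B mentions "today" first, so the cut starts from the front.
page_b = "today we list newly-opened hot-blooded jianghu sf servers".split()

print(split_keyword(keyword, page_a))
print(split_keyword(keyword, page_b))
print(front_back_ratio("browser download download".split(), ["browser"], ["download"]))
```

For page A the sketch reports a reverse cut with "hot-blooded jianghu sf" as the anchor and "today newly-opened" as the remainder, which mirrors point 5. For page B it reports a forward cut starting at "today", which mirrors point 1. The last line gives a front-to-back ratio of 1:2 (0.5), the kind of figure compared in points 9 and 11.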
That is all I have worked out so far, and I am not sure how clearly it comes across. If it makes sense to you, you can adjust the keyword weighting of your pages to fit Baidu's word segmentation, so that you avoid head-on competition for hot keywords while still ranking for them. I hope you will exchange ideas with Xiaotuo. Finally, thank you for sharing.
If you reprint this article, please credit the source: Internet promotion plan-website optimization method-Xiaotuo website promotion blog
Original address of this article: http://www.xiaotuo.net/seoyouhua/24/
Thanks for Xiaotuo’s contribution