A brief analysis of search engine principles: file matching and initial subset screening

Author: Eve Cole  Update Time: 2011-03-22 18:21:49

File matching and initial subset screening are two very important steps in a search engine's ranking process. Today I will give a basic summary of both. Although they may seem to have little to do with us, everyone should learn some of the basic principles, because they offer useful guidance for future website construction and optimization. Of course, these are only my own summaries. If there are any mistakes, please point them out.

After the first few stages of preprocessing, what the search engine has is a set of keywords, with the word as the basic unit. At this point each file corresponds to multiple keywords (a forward index). Querying that structure directly would be too inefficient to be practical, so the search engine inverts the mapping, and the result is one keyword corresponding to multiple files (an inverted index). In this way, when a user searches for a keyword, calculation and matching are performed only among the files that correspond to that keyword, and the best results are returned to the user. With this general process in mind, let’s move on to today’s two main topics.
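The inversion step described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine implements it; the function name `build_inverted_index` and the sample file and keyword names are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(forward_index):
    """Turn a file -> keywords mapping into a keyword -> files mapping."""
    inverted = defaultdict(set)
    for file_id, keywords in forward_index.items():
        for keyword in keywords:
            # Record that this keyword occurs in this file.
            inverted[keyword].add(file_id)
    return dict(inverted)


# Hypothetical forward index: one file corresponds to multiple keywords.
forward_index = {
    "file1": ["keyword1", "keyword2", "keyword16"],
    "file2": ["keyword1", "keyword7"],
    "file3": ["keyword2", "keyword16"],
}

# Inverted index: one keyword corresponds to multiple files.
inverted_index = build_inverted_index(forward_index)
```

With the inverted form, looking up `inverted_index["keyword1"]` gives the files containing that keyword directly, without scanning every file.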

The first is file matching. Search engine spiders crawl and fetch pages all the time, and the engine continuously sorts, summarizes, and stores the captured data. These steps are not performed when the user searches; they are done in advance, in the background. When a user searches for a keyword, the search engine only searches its own database rather than searching every website on the Internet in real time. To make this clearer, I will explain it with a simple diagram:

This picture is a typical inverted index table for fast file matching. When the user searches for "keyword 1 keyword 16", the search engine performs simple calculation and matching among all files corresponding to these two words and finds every page that contains both keyword 1 and keyword 16.
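As a rough sketch of this matching step, the lookup can be treated as a set intersection over the file lists of each query word. The table contents, the function name `match_all`, and the choice to split the query on spaces are all assumptions made for illustration; real engines handle word segmentation and matching in far more elaborate ways.

```python
def match_all(inverted_index, query):
    """Return the files that contain every keyword in the query."""
    terms = query.split()
    if not terms:
        return set()
    # Look up the file set for each keyword; unknown keywords match nothing.
    file_sets = [inverted_index.get(term, set()) for term in terms]
    # Start from the smallest set so the intersection stays cheap.
    file_sets.sort(key=len)
    result = set(file_sets[0])
    for files in file_sets[1:]:
        result &= files
    return result


# Hypothetical inverted index table: keyword -> files containing it.
sample_index = {
    "keyword1": {"file1", "file2", "file5"},
    "keyword16": {"file2", "file3", "file5"},
    "keyword7": {"file4"},
}

matched = match_all(sample_index, "keyword1 keyword16")
```

Here `matched` holds only the files listed under both keywords; a query containing a word that is not in the table matches no files at all.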

The second is initial subset screening, which exists to serve users more quickly. From all the relevant pages, the search engine selects only those with relatively high weight, performs the ranking calculation on them, and returns them to the user. This process is usually called initial subset screening. You can imagine that when we search for a keyword, the number of pages containing it is often huge, hundreds of thousands or even millions. If the search engine ran its matching calculation over that much data, it would obviously take much longer. So in practice, search engines select only high-weight pages for matching. But what kind of page has high weight and meets the search engine's conditions? That depends on many content and page-related elements, including both external and internal factors. This question is not the focus of this article; I will cover it gradually in future articles.
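The idea of keeping only the higher-weight pages before the detailed calculation can be sketched as a simple top-N selection. How weight is actually computed is not covered here, so the `page_weight` values below are invented numbers, and the function name `select_initial_subset` and its `limit` parameter are hypothetical.

```python
import heapq


def select_initial_subset(matched_pages, page_weight, limit=1000):
    """Keep only the `limit` highest-weight pages for detailed ranking."""
    # Pages with no known weight are treated as weight 0.0 (an assumption).
    return heapq.nlargest(
        limit, matched_pages, key=lambda page: page_weight.get(page, 0.0)
    )


# Hypothetical matched pages and precomputed weights.
pages = ["page_a", "page_b", "page_c", "page_d"]
weights = {"page_a": 0.2, "page_b": 0.9, "page_c": 0.5}

subset = select_initial_subset(pages, weights, limit=2)
```

Only the pages in `subset` would go on to the more expensive relevance calculation, which is why the size of the full match set matters less than it first appears.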

When we search, it is usually impossible to look at every result one by one. Generally we only look at the first few pages, or even just the top few results. Although a search engine returns many related results, they are still only a small part of the many qualifying web pages on the Internet. Users’ search habits are also changing, so search engines face great challenges. How to better help users find the information they need is what search engines are always working on.

At this point, I have shared some basic principles of search engines through file matching and initial subset screening. Of course, far more technology is involved, and the factors a real search engine has to consider are more comprehensive and complex. These are only the general principles as I have summarized them. Understanding the various aspects of search engines can provide some guidance for building our websites and for search engine optimization.

Okay, that’s it for this article. I will continue to summarize and share more in the future. This article comes from Beijing SEO, website: http://www.seostudy.org/ . If you reprint it, please retain the copyright notice. Thank you!

Thanks to Beijing SEO for the contribution.