Problem restatement: There is a word list of about 400,000 common words. Given an article, use this word list to count how many times each common word occurs in the article, then sort those words by occurrence count from highest to lowest.
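As a baseline for the task, here is a minimal Python sketch. It is not the author's method. It swaps in a hash set for the indexed word list, and it assumes that words are runs of ASCII letters compared case-insensitively, with ties broken alphabetically. `count_common_words` is a hypothetical name. The sketch loops over the article's words rather than over all 400,000 entries.

```python
import re
from collections import Counter


def count_common_words(article: str, word_list: set[str]) -> list[tuple[str, int]]:
    """Count word-list entries in `article`, most frequent first.

    Assumptions (not from the original): a word is a run of ASCII letters,
    matching is case-insensitive, and equal counts are ordered alphabetically.
    """
    # Tokenize the article once; each membership test on a set is O(1) on average.
    tokens = re.findall(r"[a-z]+", article.lower())
    counts = Counter(t for t in tokens if t in word_list)
    # Sort by count descending, then alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    common = {"the", "forest", "tree"}
    text = "The tree in the forest. The forest is old."
    print(count_common_words(text, common))  # [('the', 3), ('forest', 2), ('tree', 1)]
```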
Ideas for improving the algorithm:
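1. An article usually contains far fewer words than the 400,000 in the word list, so it is cheaper to look up each word of the article in the list than to search the article for every entry;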
2. Once the word list is sorted (indexed), binary search can locate each word quickly: at most 19 comparisons for 400,000 entries, since log2(400,000) ≈ 18.6;
3. Narrow the search range character by character. If the range becomes empty at some character, no word in the list begins with the characters read so far, so the lookup can stop immediately without reading the rest of the word. (For example, if the range becomes empty partway through looking up forest, there is no matching word and the search ends there.)
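The binary-search and character-by-character ideas above can be combined roughly as follows. This is a minimal sketch, assuming the word list is a Python list sorted in ascending order and compared with ordinary string ordering; `prefix_bound` and `contains_word` are hypothetical names. After each character, the binary search runs only inside the range left by the previous character, and the lookup returns as soon as that range is empty.

```python
def prefix_bound(words: list[str], prefix: str, lo: int, hi: int, strict: bool) -> int:
    """Binary search in words[lo:hi] on each entry's first len(prefix) characters.

    Returns the first index whose head is >= prefix (or > prefix if strict).
    Truncating sorted strings to a fixed length keeps them in sorted order,
    so this search is valid on any sorted list.
    """
    n = len(prefix)
    while lo < hi:
        mid = (lo + hi) // 2
        head = words[mid][:n]
        if head < prefix or (strict and head == prefix):
            lo = mid + 1
        else:
            hi = mid
    return lo


def contains_word(sorted_words: list[str], word: str) -> bool:
    """Look up `word`, narrowing the candidate range one character at a time.

    Assumption: `sorted_words` is sorted ascending. Invariant: after step i,
    sorted_words[lo:hi] holds exactly the entries starting with word[:i + 1].
    """
    lo, hi = 0, len(sorted_words)
    for i in range(len(word)):
        prefix = word[: i + 1]
        lo = prefix_bound(sorted_words, prefix, lo, hi, strict=False)
        hi = prefix_bound(sorted_words, prefix, lo, hi, strict=True)
        if lo == hi:
            # Empty range: no entry starts with this prefix, so stop early
            # without examining the remaining characters.
            return False
    # The word itself, if present, sorts first among entries sharing its prefix.
    return lo < hi and sorted_words[lo] == word


if __name__ == "__main__":
    words = sorted(["for", "forest", "forge", "tree"])
    print(contains_word(words, "forest"))  # True
    print(contains_word(words, "forx"))    # False (stops at the 4th character)
```

A trie (prefix tree) is the more common structure for this kind of character-by-character narrowing; the sorted-list version here stays with the "sorted word list plus binary search" framing of the list.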