Performs word segmentation on mixed Chinese and English sentences, including text that mixes full-width and half-width punctuation. You can set the maximum word length, the minimum length of a punctuation-delimited clause, whether to keep single characters in the segmentation result, whether to keep punctuation marks, and other options. For more detailed instructions, see the Readme.txt in the download package.
An SQLite dictionary file is provided by default. If your virtual host does not support SQLite, you can import the dictionary into MySQL or build another dictionary yourself.
My abilities are limited, so the efficiency may not satisfy everyone; please bear with me.
Loading, unloading, and querying the dictionary are handled by separate functions, so they should be easy to modify. The core word segmentation algorithm only needs findinDict to return true or false to tell it whether a word is in the dictionary.
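To illustrate how little the core needs from the dictionary, here is a minimal sketch. It assumes forward maximum matching, which may not be the strategy the package actually uses, and everything in it (the in-memory $dict, the closure standing in for findinDict, the segment function and its $maxLen parameter) is hypothetical and only for illustration.

```php
<?php
// Hypothetical in-memory dictionary; the real package ships an SQLite file.
$dict = ['中文' => true, '分词' => true, '程序' => true];

// Stand-in for findinDict: it only has to answer true or false.
$findinDict = function (string $word) use ($dict): bool {
    return isset($dict[$word]);
};

// Forward maximum matching (an assumed strategy, not confirmed by the source).
function segment(string $text, callable $findinDict, int $maxLen = 4): array
{
    $words = [];
    $pos = 0;
    $total = mb_strlen($text, 'UTF-8');
    while ($pos < $total) {
        $len = min($maxLen, $total - $pos);
        // Shrink the candidate until it is in the dictionary or is one character long.
        while ($len > 1 && !$findinDict(mb_substr($text, $pos, $len, 'UTF-8'))) {
            $len--;
        }
        $words[] = mb_substr($text, $pos, $len, 'UTF-8');
        $pos += $len;
    }
    return $words;
}

print_r(segment('中文分词程序', $findinDict)); // 中文, 分词, 程序
```

Swapping the dictionary backend (SQLite, MySQL, or a plain array) then only means replacing the lookup function; the segmentation loop stays untouched.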
Also note that the mbstring extension is required. There is no way around it: when Chinese, English, full-width, and half-width characters are mixed in the text being segmented, it is very hard to calculate sentence length without mbstring.
The program is provided as a ThinkPHP extension by default, but you can remove the "extends Base" and use the word segmentation class directly. It is released under the Apache License 2.0, so using it in commercial closed-source projects is fine, as long as you don't mind the program's rough edges.