This is a Java implementation of Chinese word segmentation based on n-Gram + CRF + HMM.
Segmentation speed reaches about 2 million words per second (tested on a MacBook Air), and accuracy can exceed 96%.
It currently implements Chinese word segmentation, Chinese name recognition, user-defined dictionaries, keyword extraction, automatic summarization, and keyword tagging.
It can be used for natural language processing and related tasks, and suits projects that need high segmentation quality.
<dependency>
    <groupId>org.ansj</groupId>
    <artifactId>ansj_seg</artifactId>
    <version>5.1.1</version>
</dependency>
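If you are using the library for the first time and just want to try out the segmentation, you can call this simple interface:
// requires: import org.ansj.splitWord.analysis.ToAnalysis;
String str = "欢迎使用ansj_seg,(ansj中文分词)在这里如果你遇到什么问题都可以联系我.我一定尽我所能.帮助大家.ansj_seg更快,更准,更自由!";
// Prints each term followed by its part-of-speech tag, e.g. 欢迎/v
System.out.println(ToAnalysis.parse(str));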
欢迎/v,使用/v,ansj/en,_,seg/en,,,(,ansj/en,中文/nz,分词/n,),在/p,这里/r,如果/c,你/r,遇到/v,什么/r,问题/n,都/d,可以/v,联系/v,我/r,./m,我/r,一定/d,尽我所能/l,./m,帮助/v,大家/r,./m,ansj/en,_,seg/en,更快/d,,,更/d,准/a,,,更/d,自由/a,!
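The printed result above is a comma-separated list of `word/nature` tokens, where punctuation appears without a tag. If you only have that text (for example in a log) and want the word and tag pairs back, the following is a minimal sketch using only the Java standard library. The class and method names (`TaggedOutputParser`, `splitToken`, `parseLine`) are hypothetical and not part of ansj_seg, and the format rules are assumptions inferred from the sample output. When you have the library available, working with the objects it returns is likely the better route than re-parsing text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ansj_seg: parses the printed
// "word/nature,word/nature,..." text back into {word, nature} pairs.
public class TaggedOutputParser {

    // Splits one token at its last slash; nature is "" when no tag is present.
    static String[] splitToken(String token) {
        int slash = token.lastIndexOf('/');
        if (slash <= 0) {
            return new String[] {token, ""};
        }
        return new String[] {token.substring(0, slash), token.substring(slash + 1)};
    }

    // Assumption: tokens are joined with "," and a bare "," token carries no tag,
    // so it shows up as ",,," in the text (separator, token, separator).
    static List<String[]> parseLine(String line) {
        String[] parts = line.split(",", -1);
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].isEmpty()) {
                // Two consecutive empty parts are what a bare "," token leaves behind.
                if (i + 1 < parts.length && parts[i + 1].isEmpty()) {
                    result.add(new String[] {",", ""});
                    i++;
                }
                continue;
            }
            result.add(splitToken(parts[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "欢迎/v,使用/v,ansj/en,_,seg/en,,,(,ansj/en,中文/nz";
        for (String[] pair : parseLine(line)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Splitting at the last slash rather than the first keeps a word that itself contains `/` intact, at the cost of assuming tags never contain one.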
I have wanted to build the following for a long time, and I will write them whether or not anyone helps. If you are interested or enthusiastic, please contact me.
Time recognition, IP address recognition, email address recognition, URL recognition, part-of-speech recognition, etc.