# nlp-lang
## Maven
```xml
<dependencies>
    <dependency>
        <groupId>org.nlpcn</groupId>
        <artifactId>nlp-lang</artifactId>
        <version>1.7.6</version>
    </dependency>
</dependencies>
```
This project is a foundational package that bundles the tools most commonly used in NLP projects.
## Tools
- √ Word normalization
- √ Trie tree structure
- √ Double-array trie
- √ Text segmentation
- √ HTML tag cleaning
- √ Viterbi algorithm
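Tries are typically used for dictionary lookup during text segmentation. The following is a minimal sketch of that idea, not the nlp-lang API: the class `SimpleTrie` and its methods are hypothetical, it uses a plain `HashMap`-based trie rather than a double-array trie, and it segments text with forward maximum matching, one common dictionary-based strategy (an assumption, not necessarily what the library does).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal trie for illustration only; NOT the nlp-lang API. */
class SimpleTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if a dictionary word ends at this node
    }

    private final Node root = new Node();

    /** Adds a dictionary word, creating one node per character as needed. */
    void insert(String word) {
        Node cur = root;
        for (char c : word.toCharArray()) {
            cur = cur.children.computeIfAbsent(c, k -> new Node());
        }
        cur.isWord = true;
    }

    /**
     * Forward maximum matching: at each position take the longest dictionary
     * word that starts there; fall back to a single character if none matches.
     */
    List<String> segment(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Node cur = root;
            int end = -1;
            // Walk as far as the trie allows, remembering the last word ending.
            for (int j = i; j < text.length(); j++) {
                cur = cur.children.get(text.charAt(j));
                if (cur == null) break;
                if (cur.isWord) end = j + 1;
            }
            if (end == -1) end = i + 1; // unknown character: emit it alone
            out.add(text.substring(i, end));
            i = end;
        }
        return out;
    }
}
```

For example, after inserting 中华, 中华人民 and 共和国, `segment("中华人民共和国")` returns `[中华人民, 共和国]`; characters not covered by the dictionary are emitted one at a time. A double-array trie encodes the same transitions in two flat integer arrays (`base` and `check`), trading slower construction for a more compact layout and fast lookups.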
## Components
- √ Convert Chinese characters to Pinyin
- √ Conversion between Simplified and Traditional Chinese
- √ Bloom filter
- √ Fingerprint-based deduplication
- √ SimHash document similarity calculation
- √ Word co-occurrence statistics
- √ In-memory search suggestions
- √ WordWeight: word frequency statistics, word IDF statistics, and word-category correlation statistics
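SimHash turns a document into a short fingerprint so that similar documents get fingerprints that differ in only a few bits. Below is a minimal sketch of the technique under stated assumptions: the text is already tokenized, every token has weight 1 (a frequency or IDF weight, such as the statistics WordWeight computes, could be substituted), and 64-bit FNV-1a is used as the token hash. `SimHashSketch` and its methods are hypothetical names, not the nlp-lang API.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical SimHash sketch for illustration only; NOT the nlp-lang API. */
class SimHashSketch {
    /** 64-bit FNV-1a over the token's UTF-8 bytes (assumed hash choice). */
    static long fnv1a64(String token) {
        long h = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L; // FNV-1a 64-bit prime (wraps on overflow)
        }
        return h;
    }

    /**
     * Each token votes +1 on bits set in its hash and -1 on bits that are
     * clear; a fingerprint bit is 1 when its tally is positive.
     */
    static long simHash(List<String> tokens) {
        int[] tally = new int[64];
        for (String t : tokens) {
            long h = fnv1a64(t);
            for (int bit = 0; bit < 64; bit++) {
                tally[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fp = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (tally[bit] > 0) fp |= 1L << bit;
        }
        return fp;
    }

    /** Number of differing bits between two fingerprints. */
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Two documents are commonly treated as near-duplicates when the Hamming distance between their fingerprints falls below a small threshold (3 out of 64 bits is an often-cited value, but it is a tuning choice). Because the tally ignores token order, reordering the same tokens yields the same fingerprint.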