Notice
Because this repository currently contains too many projects, it has become difficult to maintain. The different tasks will therefore be split into separate repositories in the future, and many comparative experiments will be added to make it easier for users to choose a model. If needed, you can jump to the relevant repository via the links below:
- Text classification repository
- Semantic matching repository
- Text generation repository
- Other repositories are being updated...
Table of contents
- Chatbot
- 1. Bert_chatbot: Similar to UniLM
- 2. seq2seq_luong: The encoder is a two-layer gru network, and the decoder is a one-layer gru network. Luong attention is added between the encoder and the decoder.
- 3. transformer_chatbot: standard transformer model
- Distillation
- 1. DynaBert: work from Huawei; it mainly uses pruning to cut away parts of the bert structure.
- 2. rnn_distill_bert: Use a layer of lstm network to distill the bert model, and only add soft label loss.
- 3. three_layer_self-attention_to_distill_bert: As the name suggests, this simply implements a three-layer transformer encoder and uses it to distill the bert model.
- 4. tiny_bert: work from Huawei; tiny_bert's distillation adds a mean-squared-error loss on the intermediate layers on top of the soft-label loss.
- Embedding
- 1. skipgram-word2vec: Use skipgram to get word vectors
- 2. bert: train bert directly from scratch, or use this code for retraining
- 3. albert: train albert directly from scratch, or use this code for retraining
- 4. NPLM: Traditional approach
- NER
- 1. Bert_CRF_Ner: Bert model plus conditional random field for sequence labeling tasks
- 2. Bert_Softmax_Ner: directly use the bert model for sequence labeling
- 3. BiLSTM_CRF_Ner: Use bidirectional lstm network and crf for sequence labeling tasks
- NMT
- 1. GRU_attention: The encoder and decoder are both gru networks, with an ordinary attention mechanism (a direct weighted sum) in between.
- 2. Transformers_NMT: Standard transformer structure for machine translation
- Pretrain_Model
- 1. bert-pretrain: To retrain the bert model, first preprocess the data with get_train_data.py (which, among other steps, masks 15% of the tokens), then train.
- 2. wobert-pretrain: The wobert pre-trained model is released by Su Jianlin (Su Shen). For retraining here, you can add your own word list and then modify bert's tokenization accordingly.
- Reading_comprehension
- 1. BERT_MRC: Use bert for machine reading comprehension tasks; a direct truncation approach is expected to be used here.
- 2. BiDAF: Machine reading comprehension model with bidirectional attention flow mechanism
- 3. DocQA: traditional model
- 4. Match_LSTM: traditional model, simple rnn structure.
- 5. QANet: It is also a relatively traditional model, but this model is the first mrc model to abandon the rnn structure. This model is also the first to introduce the self-attention mechanism into the mrc task.
- 6. RNet: traditional model
- 7. Recurrence-hotpot-baseline: The first baseline here to use an rnn structure for multi-hop reasoning. On the hotpotqa dataset, in addition to predicting the answer, supporting facts and related paragraphs are also predicted.
- 8. albert_mrc: Use albert pre-trained model to do mrc tasks
- 9. electra_bert: Use electra pre-trained model to do mrc tasks
- 10. mrc_baseline: If you are doing mrc tasks, it is recommended to read this code first. It contains various details that mrc pays attention to, such as long text processing (sliding window), answer sorting, adversarial training, etc.
- 11. roberta_mrc: Use roberta pre-trained model to do mrc tasks
- 12. transformer+rnn+attention: Generative reading comprehension with a straightforward seq2seq approach: a transformer encoder, a gru decoder, and an ordinary attention layer in between.
- 13. transformer_reading: This project is also about generative reading comprehension, using the standard transformer structure.
- Slot_Filling
- 1. JointBert: Involves intent classification and slot classification. Directly use bert to encode the input and use "CLS" vectors for intent classification. The final encoding vector of each token is used for slot classification.
- Text_Classification
- 1. DPCNN: The deep convolutional network + residual connection makes this model better than previous CNN structures, and its complexity is not high.
- 2. FastBert: uses self-distillation to speed up model inference; mainly used for classification tasks.
- 3. FastText: Proposed by Facebook, it is an efficient text classification model.
- 4. XLNet: 1) Learn bidirectional contextual information by maximizing the log-likelihood of all possible factorization orders; 2) Use the characteristics of autoregression itself to overcome the shortcomings of BERT. In addition, XLNet also incorporates the ideas of the current optimal autoregressive model Transformer-XL.
- 5. all_layer_out_concat: As you can see from the name, this project encodes the text through the bert-style model, then takes out the cls vector of each layer, performs an attention calculation, and then performs classification.
- 6. bert+bceloss+average_checkpoint: This project changes the classification loss to BCELoss and adds weight averaging (averaging multiple checkpoints).
- 7. capsule_text_classification: GRU+Capsule for text classification
- 8. longformer_classification: Use the pre-trained model longformer for text classification. For classification of long texts, you can try this model.
- 9. multi_label_classify_bert: Use the bert model for multi-label classification. It contains three models: bert (model.py), bert's last two layers of pooling (model2.py), and bert+TextCNN (model3.py).
- 10. roberta_classification: Use roberta pre-trained model for text classification.
- 11. transformer_xl: Use transformer_xl directly for text classification. For long text classification, you can try this model.
- 12. wobert+focal_loss: The wobert pre-trained model is provided by Su Jianlin (Su Shen); focal loss is added to the classification task to address class imbalance.
- 13. TextCNN: Convolve the text at different scales and then concatenate it for text classification.
- 14. BILSTM+Attention: Bidirectional LSTM network plus ordinary attention for text classification.
- Text_Clustering
- 1. LDA clustering
- 2. DBSCAN
- 3. Kmeans
- Text_Corrector
- 1. bert_for_correction: It is just a simple attempt: retrain on the relevant corpus, input a sentence containing typos, and then classify the encoded vector of each token.
- Text_Generation
- 1. GPT2_SummaryGen: Use GPT2 to generate summary
- 2. GPT2_TitleGen: Generation of article titles
- 3. Simple-GPT2: Self-implemented GPT2 model
- Text_Ranking
- 1. BM25: Calculate the BM25 value of the query and all texts to be sorted, and then sort based on this value.
- 2. DC_Bert_Ranking: two-tower + interaction. The query and context are first encoded separately with unshared weights; the two encodings are then mixed and passed through several interactive transformer-encoder layers.
- 3. DPR_Ranking: Facebook’s text ranking model
- 4. MT_Ranking: Use bert-style model for encoding, then use cls for classification, and sort by the score of the positive sample
- 5. ReRank: including model distillation
- Text_Similarity
- 1. ABCNN: First embed the words of the two sentences, then pool to obtain a vector for each sentence, compute the difference between the two vectors, apply convolutions at different scales to the difference vector, and finally classify.
- 2. BiMPM: This model is divided into four parts: word embedding, context encoding (bilstm), four types of Matching, and Aggregation Layer.
- 3. DecomposableAttention: The core of this paper is alignment, i.e. the correspondence between words. Alignment is used in two places: in the attend part, to compute the attention between the two sentences, and in the compare part, where the words of the two sentences are compared. All processing is word-level, and a feed-forward network makes the final prediction. Notably, the model does not use the temporal order of words within a sentence; it relies on the correspondence (alignment) between the words of the two sentences.
- 4. ESIM: Short text matching tool. The strength of ESIM is its inter-sentence attention (soft_align_attention in the code), where the two sentences being compared interact with each other. Earlier Siamese-style architectures often had no interaction in between and only computed a cosine (or other) distance at the final layer.
- 5. RE2: The name RE2 comes from the combination of three important parts of the network: Residual vectors; Embedding vectors; Encoded vectors.
- 6. SiaGRU: Twin-tower structure, use GRU to encode two sentences respectively, then calculate the difference between the two sentence encoding vectors, and finally use this difference vector for classification.
- 7. SimCSE: Contrastive learning; trick: encode the same sample twice with different dropout masks to form a positive pair.
- 8. BM25: Directly calculate the BM25 value of two texts, representing their degree of similarity.
- 9. TF_IDF: Directly calculate the TF_IDF value of two texts, representing their similarity.
- 10. NEZHA_Coattention: A two-tower structure: the two sentences are fed into the NEZHA model separately, their representations are differenced, the difference is concatenated with the original representations, and the result is passed to a fully connected network for classification. There is a second variant in which, after obtaining the two sentence representations, a transformer-encoder layer fuses them before classification.
- 11. Bert_Whitening: The method proposed by Su Jianlin; it requires no training and directly maps the bert output of each sentence onto a standard orthonormal (whitened) basis.
- data_augmentation
- 1. eda: Use the nlpcda toolkit for data augmentation. Such as: equivalent entity replacement, random synonym replacement, random deletion of characters, position exchange, homophone replacement.
- 2. Back translation-Baidu: Use Baidu Translate for back translation of text.
- 3. Back-translation-google: Use Google Translate for back-translation of text.
- relation_extraction
- 1. lstm_cnn_information_extract: lstm+cnn
- 2. relation_classification: relation classification, bilstm+ordinary attention
NLP_pytorch_project
Chatbot
1. Bert_chatbot: Similar to UniLM
- python train.py # training code
- python infernece.py # Model inference
2. seq2seq_luong: The encoder is a two-layer gru network and the decoder a one-layer gru network, with Luong attention between them (a minimal attention sketch follows the commands below).
- python train.py # training code
- python inference.py # Model inference
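For orientation, here is a minimal sketch of a Luong-style ("general") attention step between a decoder state and the encoder outputs; the class and variable names are illustrative and are not taken from this project's code.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Luong 'general' attention: score(h_t, h_s) = h_t^T W h_s."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(encoder_outputs, self.W(decoder_state).unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)                                 # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)
        return context, weights
```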
3. transformer_chatbot: standard transformer model
- python train.py # training code
- python chat.py # Chat with the model; the training data is the Qingyun dialogue corpus
Distillation
1. DynaBert: work from Huawei; it mainly uses pruning to cut away parts of the bert structure.
- python train_teacher_model.py # Train teacher model
- python train_tailor_model.py # Prune the teacher model
2. rnn_distill_bert: Use a layer of lstm network to distill the bert model, and only add soft label loss.
- python train_bert.py # Train teacher model bert
- python train_distill.py # Distillation uses lstm to learn the output of bert
3. three_layer_self-attention_to_distill_bert: As the name suggests, this simply implements a three-layer transformer encoder and uses it to distill the bert model.
- python train_bert.py # Train teacher model bert
- python train_distill.py # Distillation
4. tiny_bert: work from Huawei; tiny_bert's distillation adds a mean-squared-error loss on the intermediate layers on top of the soft-label loss (a loss sketch follows the commands below).
- python train.py # Train teacher model bert
- python train_distill_v2.py # Distillation
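As a rough illustration of this kind of objective (not the project's actual code), a combined distillation loss can be written as follows; the layer lists and the temperature T are assumptions.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, student_hiddens, teacher_hiddens, T=2.0):
    """Soft-label KL loss plus MSE between matched intermediate layers (illustrative)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # student_hiddens / teacher_hiddens: lists of hidden states chosen for layer matching
    mse = sum(F.mse_loss(s, t) for s, t in zip(student_hiddens, teacher_hiddens))
    return soft + mse
```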
Embedding
1. skipgram-word2vec: Use skipgram to obtain word vectors (a minimal model sketch follows the command below)
- python 001-skipgram-word2vec.py
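A minimal skip-gram model looks roughly like this (full-softmax variant; the script here may instead use negative sampling):

```python
import torch.nn as nn

class SkipGram(nn.Module):
    """Center-word embedding -> logits over the vocabulary for its context words."""
    def __init__(self, vocab_size, embed_dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # the word vectors we keep
        self.out_proj = nn.Linear(embed_dim, vocab_size, bias=False)

    def forward(self, center_ids):
        return self.out_proj(self.in_embed(center_ids))       # train with nn.CrossEntropyLoss
```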
2. bert: train bert directly from scratch, or use this code for retraining
3. albert: train albert directly from scratch, or use this code for retraining
4. NPLM: Traditional approach
NER
1. Bert_CRF_Ner: Bert model plus conditional random field for sequence labeling tasks
- python run_ner_crf.py # Model training
- python inference.py # Model inference
2. Bert_Softmax_Ner: directly use the bert model for sequence labeling
- python train.py # Model training
- python inference.py # Model inference
3. BiLSTM_CRF_Ner: Use bidirectional lstm network and crf for sequence labeling tasks
- python train.py # Model training
NMT
1. GRU_attention: The encoder and decoder are both gru networks, with an ordinary attention mechanism (a direct weighted sum) in between.
- python train.py # Model training
2. Transformers_NMT: Standard transformer structure for machine translation
- python train.py # Model training
Pretrain_Model
1. bert-pretrain: To retrain the bert model, first preprocess the data with get_train_data.py (which, among other steps, masks 15% of the tokens), then train (a masking sketch follows the commands below).
- python get_train_data.py # Data preprocessing
- python run_pretrain.py # Retraining
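For orientation, standard BERT-style masking of ~15% of the tokens (80% [MASK], 10% random, 10% unchanged) can be sketched as below; the actual preprocessing in get_train_data.py may differ in its details.

```python
import random

def mask_tokens(token_ids, vocab_size, mask_id, mask_prob=0.15):
    """Pick ~15% of positions as MLM targets; -100 marks positions ignored by the loss."""
    masked, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                masked[i] = mask_id                        # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.randrange(vocab_size)   # 10%: random token
            # otherwise: keep the original token (10%)
    return masked, labels
```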
2. wobert-pretrain: The wobert pre-trained model is released by Su Jianlin (Su Shen). For retraining here, you can add your own word list and then modify bert's tokenization accordingly.
- python process_pretrain_data.py # Data preprocessing
- python run_pretrain.py # Retraining
Reading_comprehension
1. BERT_MRC: Use bert for machine reading comprehension tasks; a direct truncation approach is expected to be used here.
- python train.py # Model training
2. BiDAF: Machine reading comprehension model with bidirectional attention flow mechanism
- python data_process.py # First preprocess the data
- python train_bidaf.py # Model training
3. DocQA: traditional model
- python data_process.py # First preprocess the data
- python train_DocQA.py # Model training
4. Match_LSTM: traditional model, simple rnn structure.
- python data_process.py # First preprocess the data
- python train_Match_Lstm.py # Model training
5. QANet: It is also a relatively traditional model, but this model is the first mrc model to abandon the rnn structure. This model is also the first to introduce the self-attention mechanism into the mrc task.
- python data_process.py # Data preprocessing
- python train.py # Model training
6. RNet: traditional model
- python data_process.py # Data preprocessing
- python train_RNet.py # Model training
7. Recurrence-hotpot-baseline: The first baseline here to use an rnn structure for multi-hop reasoning. On the hotpotqa dataset, in addition to predicting the answer, supporting facts and related paragraphs are also predicted.
- python data_process.py # Data preprocessing
- python train.py # Model training
8. albert_mrc: Use albert pre-trained model to do mrc tasks
- python train_update.py # Model training
- python inference.py # Inference on a single example
- python inference_all.py # Inference on all data
9. electra_bert: Use electra pre-trained model to do mrc tasks
- python run_cail.py # Model training
- python evaluate.py # Model evaluation
10. mrc_baseline: If you are doing mrc tasks, it is recommended to read this code first. It contains various details that mrc pays attention to, such as long text processing (sliding window), answer sorting, adversarial training, etc.
- python train.py # Model training
11. roberta_mrc: Use roberta pre-trained model to do mrc tasks
- python train.py # Model training
12. transformer+rnn+attention: Generative reading comprehension with a straightforward seq2seq approach: a transformer encoder, a gru decoder, and an ordinary attention layer in between.
- python train.py # Model training
- python inference.py # Model inference
13. transformer_reading: This project is also about generative reading comprehension, using the standard transformer structure.
- python train.py # Model training
- python inference.py # Model inference
Slot_Filling
1. JointBert: Involves intent classification and slot classification. bert encodes the input directly; the "CLS" vector is used for intent classification and the final encoding vector of each token for slot classification (a minimal model sketch follows the command below).
- python train.py # Training of the model
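A minimal sketch of such a joint model (class, head, and model names are illustrative, not the project's actual code):

```python
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """bert encoder with a CLS-based intent head and a per-token slot head."""
    def __init__(self, num_intents, num_slots, model_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slots)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)     # "CLS" vector -> intent
        slot_logits = self.slot_head(out.last_hidden_state)     # each token -> slot label
        return intent_logits, slot_logits
```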
Text_Classification
1. DPCNN: The deep convolutional network + residual connection makes this model better than previous CNN structures, and its complexity is not high.
- python get_data_to_examples.py # Preprocess data
- python examples_to_features.py # Convert the corresponding examples to features
- python train.py # Model training
2. FastBert: uses self-distillation to speed up model inference; mainly used for classification tasks.
- sh train_stage0.sh # Train teacher model bert
- sh train_stage1.sh # self-distillation
- sh infer_sigle.sh # Adaptive inference on a single sample
3. FastText: Proposed by Facebook, it is an efficient text classification model.
- python step1_get_data_to_examples.py # Get data
- python step2_examples_to_features.py # Convert text data into id sequence
- python train.py # Model training
4. XLNet: 1) Learn bidirectional contextual information by maximizing the log-likelihood of all possible factorization orders; 2) Use the characteristics of autoregression itself to overcome the shortcomings of BERT. In addition, XLNet also incorporates the ideas of the current optimal autoregressive model Transformer-XL.
- python train.py # Model training
5. all_layer_out_concat: As you can see from the name, this project encodes the text through the bert-style model, then takes out the cls vector of each layer, performs an attention calculation, and then performs classification.
- python train.py # Model training
- python inference.py # Model inference
6. bert+bceloss+average_checkpoint: This project changes the classification loss to BCELoss and adds weight averaging (averaging multiple checkpoints; a sketch follows the commands below).
- python run_classify.py # Model training
- python run_average_checkpoints.py # Weight average
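Checkpoint averaging itself is straightforward; a sketch assuming plain state_dict checkpoints:

```python
import torch

def average_checkpoints(paths):
    """Element-wise mean of parameter tensors across several saved state_dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}
```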
7. capsule_text_classification: GRU+Capsule for text classification
- python train.py # Model training
8. longformer_classification: Use the pre-trained model longformer for text classification. For classification of long texts, you can try this model.
- python train.py # Model training
9. multi_label_classify_bert: Use the bert model for multi-label classification. It contains three models: bert (model.py), bert's last two layers of pooling (model2.py), and bert+TextCNN (model3.py).
- python train.py # Model training
- python inference.py # Model prediction
10. roberta_classification: Use roberta pre-trained model for text classification.
- python train.py # Model training
11. transformer_xl: Use transformer_xl directly for text classification. For long text classification, you can try this model.
- python train.py # Model training
12. wobert+focal_loss: The wobert pre-trained model is provided by Su Jianlin (Su Shen); focal loss is added to the classification task to address class imbalance (a focal-loss sketch follows the command below).
- python run_classify.py # Model training
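For reference, a multi-class focal loss can be written as below (the alpha and gamma values are common defaults, not necessarily the ones used here):

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Down-weight easy examples by (1 - p_t)^gamma on top of cross-entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of the true class
    pt = log_pt.exp()
    return (-alpha * (1 - pt) ** gamma * log_pt).mean()
```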
13. TextCNN: Convolve the text at different scales and concatenate the resulting features for classification (a minimal model sketch follows the command below).
- python 001-TextCNN.py # Model training
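A minimal TextCNN sketch (kernel sizes and filter counts are illustrative):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Convolutions with several kernel sizes, global max-pooling, concat, classify."""
    def __init__(self, vocab_size, embed_dim, num_classes, kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList([nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)        # (batch, embed_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))
```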
14. BILSTM+Attention: Bidirectional LSTM network plus ordinary attention for text classification.
- python 002-BILSTM+Attention.py # Model training
Text_Clustering
1. LDA clustering
- python train_LDA_cluster.py # Clustering
2. DBSCAN
- python train_dbscan_cluster.py # Clustering
3. Kmeans
- python train_kmeans_cluster.py # Clustering
Text_Corrector
1. bert_for_correction: It is just a simple attempt: retrain on the relevant corpus, input a sentence containing typos, and then classify the encoded vector of each token.
- python run_pretrain_bert.py # Retraining
- python bert_corrector.py # Error correction
Text_Generation
1. GPT2_SummaryGen: Use GPT2 to generate summary
- python train.py # Model training
- python inferface.py # Model inference
2. GPT2_TitleGen: Generation of article titles
- python train.py # Model training
- python inference.py # Model inference
3. Simple-GPT2: Self-implemented GPT2 model
- python train.py # Model training
- python inference.py # Model inference
Text_Ranking
1. BM25: Calculate the BM25 value of the query and all texts to be sorted, and then sort based on this value.
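The scoring follows the standard BM25 formula; a self-contained sketch (tokenization is left to the caller) looks like this:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query; sort documents by this score."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter(t for d in docs_tokens for t in set(d))   # document frequency of each term
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for q in query_tokens:
            if q not in tf:
                continue
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```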
2. DC_Bert_Ranking: two-tower + interaction. The query and context are first encoded separately with unshared weights; the two encodings are then mixed and passed through several interactive transformer-encoder layers.
- python train.py # Model training
- python inference.py # Model inference
3. DPR_Ranking: Facebook’s text ranking model
- python train.py # Model training
4. MT_Ranking: Use bert-style model for encoding, then use cls for classification, and sort by the score of the positive sample
- python train.py # Model training
- python inference.py # Model inference
5. ReRank: including model distillation
- python train.py # Model training
- python train_distill.py # Model distillation
Text_Similarity
1. ABCNN: First embed the words of the two sentences, then pool to obtain a vector for each sentence, compute the difference between the two vectors, apply convolutions at different scales to the difference vector, and finally classify.
- python train.py # Model training
2. BiMPM: This model is divided into four parts: word embedding, context encoding (bilstm), four types of Matching, and Aggregation Layer.
- python train.py # Model training
3. DecomposableAttention: The core of this paper is alignment, i.e. the correspondence between words. Alignment is used in two places: in the attend part, to compute the attention between the two sentences, and in the compare part, where the words of the two sentences are compared. All processing is word-level, and a feed-forward network makes the final prediction. Notably, the model does not use the temporal order of words within a sentence; it relies on the correspondence (alignment) between the words of the two sentences.
- python train.py # Model training
4. ESIM: Short text matching tool. The strength of ESIM is its inter-sentence attention (soft_align_attention in the code), where the two sentences being compared interact with each other. Earlier Siamese-style architectures often had no interaction in between and only computed a cosine (or other) distance at the final layer.
- python train.py # Model training
5. RE2: The name RE2 comes from the combination of three important parts of the network: Residual vectors; Embedding vectors; Encoded vectors.
- python train.py # Model training
6. SiaGRU: Twin-tower structure, use GRU to encode two sentences respectively, then calculate the difference between the two sentence encoding vectors, and finally use this difference vector for classification.
- python train.py # Model training
7. SimCSE: Contrastive learning; trick: encode the same sample twice with different dropout masks to form a positive pair (a loss sketch follows the command below).
- python train.py # Model training
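The core of unsupervised SimCSE is an in-batch contrastive loss over two dropout-noised encodings of the same batch of sentences; a sketch, where z1 and z2 are the two encoding passes:

```python
import torch
import torch.nn.functional as F

def simcse_loss(z1, z2, temperature=0.05):
    """In-batch InfoNCE: row i of z1 and row i of z2 form a positive pair."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / temperature                 # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)
```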
8. BM25: Directly calculate the BM25 value of two texts, representing their degree of similarity.
9. TF_IDF: Directly calculate the TF_IDF value of two texts, representing their degree of similarity.
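This amounts to the cosine similarity of the two TF-IDF vectors; a sketch using scikit-learn (assuming the texts are already whitespace-tokenized, e.g. with jieba):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_similarity(text_a, text_b):
    """Cosine similarity between the TF-IDF vectors of two texts."""
    vectors = TfidfVectorizer().fit_transform([text_a, text_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])
```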
10. NEZHA_Coattention: A two-tower structure: the two sentences are fed into the NEZHA model separately, their representations are differenced, the difference is concatenated with the original representations, and the result is passed to a fully connected network for classification. There is a second variant in which, after obtaining the two sentence representations, a transformer-encoder layer (implemented here) fuses them before classification.
11. Bert_Whitening: The method proposed by Su Jianlin; it requires no training and directly maps the bert output of each sentence onto a standard orthonormal (whitened) basis (a sketch of the transform follows the command below).
- python run_bert_whitening.py # Evaluate directly on the dataset and compute the Spearman coefficient
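The whitening transform itself needs no training: compute the mean and covariance of the sentence vectors, then map them with W = U diag(1/sqrt(S)) from the SVD of the covariance. A sketch following Su Jianlin's recipe (array names are illustrative):

```python
import numpy as np

def compute_whitening(sentence_vecs):
    """Return (W, mu) so that (vecs - mu) @ W has zero mean and identity covariance."""
    mu = sentence_vecs.mean(axis=0, keepdims=True)
    cov = np.cov((sentence_vecs - mu).T)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S))
    return W, mu

# usage: whitened = (sentence_vecs - mu) @ W, then compare sentences with cosine similarity
```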
data_augmentation
1. eda: Use the nlpcda toolkit for data augmentation. Such as: equivalent entity replacement, random synonym replacement, random deletion of characters, position exchange, homophone replacement.
2. Back translation-Baidu: Use Baidu Translate for back translation of text.
- python 002-run_contrslate_data_aug.py
3. Back-translation-google: Use Google Translate for back-translation of text.
- python 003-google_trans_data_aug.py
relation_extraction
1. lstm_cnn_information_extract: lstm+cnn
- python train.py # Model training
- python inference.py # Model inference
2. relation_classification: relation classification, bilstm+ordinary attention
- python data_helper.py # Data preprocessing
- python train.py # Model training
Star History