Notice
Because this repository currently contains too many projects, it has become difficult to maintain. The different tasks will therefore be split into separate repositories in the future, and many comparative experiments will be added to make it easier for users to choose a model. If needed, you can jump to the relevant repository via the links below:
- Text classification repository
- Semantic matching repository
- Text generation repository
- Other repositories are being updated...
Table of contents
- Chatbot
- 1. Bert_chatbot: Similar to UniLM
- 2. seq2seq_luong: The encoder is a two-layer gru network, and the decoder is a one-layer gru network. Luong attention is added between the encoder and the decoder.
- 3. transformer_chatbot: standard transformer model
- Distillation
- 1. DynaBert: work from Huawei; it mainly uses pruning to cut away parts of the bert structure.
- 2. rnn_distill_bert: Use a layer of lstm network to distill the bert model, and only add soft label loss.
- 3. three_layer_self-attention_to_distill_bert: As the name suggests, this simply implements a three-layer transformer encoder and uses it to distill the bert model.
- 4. tiny_bert: work from Huawei; tiny_bert's distillation adds a mean-squared-error loss on the intermediate layers on top of the soft-label loss.
- Embedding
- 1. skipgram-word2vec: Use skipgram to get word vectors
- 2. bert: train bert directly from scratch, or use this code for retraining
- 3. albert: train albert directly from scratch, or use this code for retraining
- 4. NPLM: Traditional approach
- NER
- 1. Bert_CRF_Ner: Bert model plus conditional random field for sequence labeling tasks
- 2. Bert_Softmax_Ner: directly use the bert model for sequence labeling
- 3. BiLSTM_CRF_Ner: Use bidirectional lstm network and crf for sequence labeling tasks
- NMT
- 1. GRU_attention: The encoder and decoder are both gru networks, with an ordinary attention mechanism (a direct weighted sum) in between.
- 2. Transformers_NMT: Standard transformer structure for machine translation
- Pretrain_Model
- 1. bert-pretrain: To retrain the bert model, first preprocess the data with get_train_data.py (which, among other steps, masks 15% of the tokens), then train.
- 2. wobert-pretrain: The wobert pre-trained model is released by Su Jianlin (Su Shen). For retraining here, you can add your own word list and then modify bert's tokenization accordingly.
- Reading_comprehension
- 1. BERT_MRC: Use bert for machine reading comprehension tasks; a direct truncation approach is expected to be used here.
- 2. BiDAF: Machine reading comprehension model with bidirectional attention flow mechanism
- 3. DocQA: traditional model
- 4. Match_LSTM: traditional model, simple rnn structure.
- 5. QANet: It is also a relatively traditional model, but this model is the first mrc model to abandon the rnn structure. This model is also the first to introduce the self-attention mechanism into the mrc task.
- 6. RNet: traditional model
- 7. Recurrence-hotpot-baseline: The first baseline here to use an rnn structure for multi-hop reasoning. On the hotpotqa dataset, in addition to predicting the answer, supporting facts and related paragraphs are also predicted.
- 8. albert_mrc: Use albert pre-trained model to do mrc tasks
- 9. electra_bert: Use electra pre-trained model to do mrc tasks
- 10. mrc_baseline: If you are doing mrc tasks, it is recommended to read this code first. It contains various details that mrc pays attention to, such as long text processing (sliding window), answer sorting, adversarial training, etc.
- 11. roberta_mrc: Use roberta pre-trained model to do mrc tasks
- 12. transformer+rnn+attention: Generative reading comprehension with a straightforward seq2seq approach: a transformer encoder, a gru decoder, and an ordinary attention layer in between.
- 13. transformer_reading: This project is also about generative reading comprehension, using the standard transformer structure.
- Slot_Filling
- 1. JointBert: Involves intent classification and slot classification. Directly use bert to encode the input and use "CLS" vectors for intent classification. The final encoding vector of each token is used for slot classification.
- Text_Classification
- 1. DPCNN: The deep convolutional network + residual connection makes this model better than previous CNN structures, and its complexity is not high.
- 2. FastBert: uses self-distillation to speed up model inference; mainly used for classification tasks.
- 3. FastText: Proposed by Facebook, it is an efficient text classification model.
- 4. XLNet: 1) Learn bidirectional contextual information by maximizing the log-likelihood of all possible factorization orders; 2) Use the characteristics of autoregression itself to overcome the shortcomings of BERT. In addition, XLNet also incorporates the ideas of the current optimal autoregressive model Transformer-XL.
- 5. all_layer_out_concat: As you can see from the name, this project encodes the text through the bert-style model, then takes out the cls vector of each layer, performs an attention calculation, and then performs classification.
- 6. bert+bceloss+average_checkpoint: This project changes the classification loss to BCELoss and adds weight averaging (averaging multiple checkpoints).
- 7. capsule_text_classification: GRU+Capsule for text classification
- 8. longformer_classification: Use the pre-trained model longformer for text classification. For classification of long texts, you can try this model.
- 9. multi_label_classify_bert: Use the bert model for multi-label classification. It contains three models: bert (model.py), bert's last two layers of pooling (model2.py), and bert+TextCNN (model3.py).
- 10. roberta_classification: Use roberta pre-trained model for text classification.
- 11. transformer_xl: Use transformer_xl directly for text classification. For long text classification, you can try this model.
- 12. wobert+focal_loss: The wobert pre-trained model is provided by Su Jianlin (Su Shen); focal loss is added to the classification task to address class imbalance.
- 13. TextCNN: Convolve the text at different scales and then concatenate it for text classification.
- 14. BILSTM+Attention: Bidirectional LSTM network plus ordinary attention for text classification.
- Text_Clustering
- 1. LDA clustering
- 2. DBSCAN
- 3. Kmeans
- Text_Corrector
- 1. bert_for_correction: It is just a simple attempt: retrain on the relevant corpus, input a sentence containing typos, and then classify the encoded vector of each token.
- Text_Generation
- 1. GPT2_SummaryGen: Use GPT2 to generate summary
- 2. GPT2_TitleGen: Generation of article titles
- 3. Simple-GPT2: Self-implemented GPT2 model
- Text_Ranking
- 1. BM25: Calculate the BM25 value of the query and all texts to be sorted, and then sort based on this value.
- 2. DC_Bert_Ranking: two-tower + interaction. The query and context are first encoded separately with unshared weights; the two encodings are then mixed and passed through several interactive transformer-encoder layers.
- 3. DPR_Ranking: Facebook’s text ranking model
- 4. MT_Ranking: Use bert-style model for encoding, then use cls for classification, and sort by the score of the positive sample
- 5. ReRank: including model distillation
- Text_Similarity
- 1. ABCNN: First embed the words of the two sentences, then pool to obtain a vector for each sentence, compute the difference between the two vectors, apply convolutions at different scales to the difference vector, and finally classify.
- 2. BiMPM: This model is divided into four parts: word embedding, context encoding (bilstm), four types of Matching, and Aggregation Layer.
- 3. DecomposableAttention: The core of this paper is alignment, i.e. the correspondence between words. Alignment is used in two places: in the attend part, to compute the attention between the two sentences, and in the compare part, where the words of the two sentences are compared. All processing is word-level, and a feed-forward network makes the final prediction. Notably, the model does not use the temporal order of words within a sentence; it relies on the correspondence (alignment) between the words of the two sentences.
- 4. ESIM: Short text matching tool. The strength of ESIM is its inter-sentence attention (soft_align_attention in the code), where the two sentences being compared interact with each other. Earlier Siamese-style architectures often had no interaction in between and only computed a cosine (or other) distance at the final layer.
- 5. RE2: The name RE2 comes from the combination of three important parts of the network: Residual vectors; Embedding vectors; Encoded vectors.
- 6. SiaGRU: Twin-tower structure, use GRU to encode two sentences respectively, then calculate the difference between the two sentence encoding vectors, and finally use this difference vector for classification.
- 7. SimCSE: Contrastive learning; trick: encode the same sample twice with different dropout masks to form a positive pair.
- 8. BM25: Directly calculate the BM25 value of two texts, representing their degree of similarity.
- 9. TF_IDF: Directly calculate the TF_IDF value of two texts, representing their similarity.
- 10. NEZHA_Coattention: A two-tower structure: the two sentences are fed into the NEZHA model separately, their representations are differenced, the difference is concatenated with the original representations, and the result is passed to a fully connected network for classification. There is a second variant in which, after obtaining the two sentence representations, a transformer-encoder layer fuses them before classification.
- 11. Bert_Whitening: The method proposed by Su Jianlin; it requires no training and directly maps the bert output of each sentence onto a standard orthonormal (whitened) basis.
- data_augmentation
- 1. eda: Use the nlpcda toolkit for data augmentation. Such as: equivalent entity replacement, random synonym replacement, random deletion of characters, position exchange, homophone replacement.
- 2. Back translation-Baidu: Use Baidu Translate for back translation of text.
- 3. Back-translation-google: Use Google Translate for back-translation of text.
- relation_extraction
- 1. lstm_cnn_information_extract: lstm+cnn
- 2. relation_classification: relation classification, bilstm+ordinary attention
NLP_pytorch_project
Chatbot
1. Bert_chatbot: Similar to UniLM
- python train.py # training code
- python infernece.py # Model inference
2. seq2seq_luong: The encoder is a two-layer gru network and the decoder a one-layer gru network, with Luong attention between them (a minimal attention sketch follows the commands below).
- python train.py # training code
- python inference.py # Model inference
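For orientation, here is a minimal sketch of a Luong-style ("general") attention step between a decoder state and the encoder outputs; the class and variable names are illustrative and are not taken from this project's code.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Luong 'general' attention: score(h_t, h_s) = h_t^T W h_s."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, decoder_state, encoder_outputs):
        # decoder_state: (batch, hidden); encoder_outputs: (batch, src_len, hidden)
        scores = torch.bmm(encoder_outputs, self.W(decoder_state).unsqueeze(2)).squeeze(2)
        weights = torch.softmax(scores, dim=1)                                 # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)
        return context, weights
```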
3. transformer_chatbot: standard transformer model
- python train.py # training code
- python chat.py # Chat with the model; the training data is the Qingyun dialogue corpus
Distillation
1. DynaBert: work from Huawei; it mainly uses pruning to cut away parts of the bert structure.
- python train_teacher_model.py # Train teacher model
- python train_tailor_model.py # Prune the teacher model
2. rnn_distill_bert: Use a layer of lstm network to distill the bert model, and only add soft label loss.
- python train_bert.py # Train teacher model bert
- python train_distill.py # Distillation uses lstm to learn the output of bert
3. three_layer_self-attention_to_distill_bert: As the name suggests, this simply implements a three-layer transformer encoder and uses it to distill the bert model.
- python train_bert.py # Train teacher model bert
- python train_distill.py # Distillation
4. tiny_bert: work from Huawei; tiny_bert's distillation adds a mean-squared-error loss on the intermediate layers on top of the soft-label loss (a loss sketch follows the commands below).
- python train.py # Train teacher model bert
- python train_distill_v2.py # Distillation
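As a rough illustration of this kind of objective (not the project's actual code), a combined distillation loss can be written as follows; the layer lists and the temperature T are assumptions.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, student_hiddens, teacher_hiddens, T=2.0):
    """Soft-label KL loss plus MSE between matched intermediate layers (illustrative)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # student_hiddens / teacher_hiddens: lists of hidden states chosen for layer matching
    mse = sum(F.mse_loss(s, t) for s, t in zip(student_hiddens, teacher_hiddens))
    return soft + mse
```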
Embedding
1. skipgram-word2vec: Use skipgram to obtain word vectors (a minimal model sketch follows the command below)
- python 001-skipgram-word2vec.py
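A minimal skip-gram model looks roughly like this (full-softmax variant; the script here may instead use negative sampling):

```python
import torch.nn as nn

class SkipGram(nn.Module):
    """Center-word embedding -> logits over the vocabulary for its context words."""
    def __init__(self, vocab_size, embed_dim=100):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # the word vectors we keep
        self.out_proj = nn.Linear(embed_dim, vocab_size, bias=False)

    def forward(self, center_ids):
        return self.out_proj(self.in_embed(center_ids))       # train with nn.CrossEntropyLoss
```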
2. bert: train bert directly from scratch, or use this code for retraining
3. albert: train albert directly from scratch, or use this code for retraining
4. NPLM: Traditional approach
NER
1. Bert_CRF_Ner: Bert model plus conditional random field for sequence labeling tasks
- python run_ner_crf.py # Model training
- python inference.py # Model inference
2. Bert_Softmax_Ner: directly use the bert model for sequence labeling
- python train.py # Model training
- python inference.py # Model inference
3. BiLSTM_CRF_Ner: Use bidirectional lstm network and crf for sequence labeling tasks
- python train.py # Model training
NMT
1. GRU_attention: The encoder and decoder are both gru networks, with an ordinary attention mechanism (a direct weighted sum) in between.
- python train.py # Model training
2. Transformers_NMT: Standard transformer structure for machine translation
- python train.py # Model training
Pretrain_Model
1. bert-pretrain: To retrain the bert model, first preprocess the data with get_train_data.py (which, among other steps, masks 15% of the tokens), then train (a masking sketch follows the commands below).
- python get_train_data.py # Data preprocessing
- python run_pretrain.py # Retraining
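For orientation, standard BERT-style masking of ~15% of the tokens (80% [MASK], 10% random, 10% unchanged) can be sketched as below; the actual preprocessing in get_train_data.py may differ in its details.

```python
import random

def mask_tokens(token_ids, vocab_size, mask_id, mask_prob=0.15):
    """Pick ~15% of positions as MLM targets; -100 marks positions ignored by the loss."""
    masked, labels = list(token_ids), [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok
            r = random.random()
            if r < 0.8:
                masked[i] = mask_id                        # 80%: replace with [MASK]
            elif r < 0.9:
                masked[i] = random.randrange(vocab_size)   # 10%: random token
            # otherwise: keep the original token (10%)
    return masked, labels
```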
2. wobert-pretrain: The wobert pre-trained model is released by Su Jianlin (Su Shen). For retraining here, you can add your own word list and then modify bert's tokenization accordingly.
- python process_pretrain_data.py # Data preprocessing
- python run_pretrain.py # Retraining
Reading_comprehension
1. BERT_MRC: Use bert for machine reading comprehension tasks; a direct truncation approach is expected to be used here.
- python train.py # Model training
2. BiDAF: Machine reading comprehension model with bidirectional attention flow mechanism
- python data_process.py # First preprocess the data
- python train_bidaf.py # Model training
3. DocQA: traditional model
- python data_process.py # First preprocess the data
- python train_DocQA.py # Model training
4. Match_LSTM: traditional model, simple rnn structure.
- python data_process.py # First preprocess the data
- python train_Match_Lstm.py # Model training
5. QANet: It is also a relatively traditional model, but this model is the first mrc model to abandon the rnn structure. This model is also the first to introduce the self-attention mechanism into the mrc task.
- python data_process.py # Data preprocessing
- python train.py # Model training
6. RNet: traditional model
- python data_process.py # Data preprocessing
- python train_RNet.py # Model training
7. Recurrence-hotpot-baseline: The first baseline here to use an rnn structure for multi-hop reasoning. On the hotpotqa dataset, in addition to predicting the answer, supporting facts and related paragraphs are also predicted.
- python data_process.py # Data preprocessing
- python train.py # Model training
8. albert_mrc: Use albert pre-trained model to do mrc tasks
- python train_update.py # Model training
- python inference.py # Inference on a single example
- python inference_all.py # Inference on all data
9. electra_bert: Use electra pre-trained model to do mrc tasks
- python run_cail.py # Model training
- python evaluate.py # Model evaluation
10. mrc_baseline: If you are doing mrc tasks, it is recommended to read this code first. It contains various details that mrc pays attention to, such as long text processing (sliding window), answer sorting, adversarial training, etc.
- python train.py # Model training
11. roberta_mrc: Use roberta pre-trained model to do mrc tasks
- python train.py # Model training
12. transformer+rnn+attention: Generative reading comprehension with a straightforward seq2seq approach: a transformer encoder, a gru decoder, and an ordinary attention layer in between.
- python train.py # Model training
- python inference.py # Model inference
13. transformer_reading: This project is also about generative reading comprehension, using the standard transformer structure.
- python train.py # Model training
- python inference.py # Model inference
Slot_Filling
1. JointBert: Involves intent classification and slot classification. bert encodes the input directly; the "CLS" vector is used for intent classification and the final encoding vector of each token for slot classification (a minimal model sketch follows the command below).
- python train.py # Training of the model
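A minimal sketch of such a joint model (class, head, and model names are illustrative, not the project's actual code):

```python
import torch.nn as nn
from transformers import BertModel

class JointIntentSlotModel(nn.Module):
    """bert encoder with a CLS-based intent head and a per-token slot head."""
    def __init__(self, num_intents, num_slots, model_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)
        self.slot_head = nn.Linear(hidden, num_slots)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        intent_logits = self.intent_head(out.pooler_output)     # "CLS" vector -> intent
        slot_logits = self.slot_head(out.last_hidden_state)     # each token -> slot label
        return intent_logits, slot_logits
```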
Text_Classification
1. DPCNN: The deep convolutional network + residual connection makes this model better than previous CNN structures, and its complexity is not high.
- python get_data_to_examples.py # Preprocess data
- python examples_to_features.py # Convert the corresponding examples to features
- python train.py # Model training
2. FastBert: uses self-distillation to speed up model inference; mainly used for classification tasks.
- sh train_stage0.sh # Train teacher model bert
- sh train_stage1.sh # self-distillation
- sh infer_sigle.sh # Adaptive inference on a single sample
3. FastText: Proposed by Facebook, it is an efficient text classification model.
- python step1_get_data_to_examples.py # Get data
- python step2_examples_to_features.py # Convert text data into id sequence
- python train.py # Model training
4. XLNet: 1) Learn bidirectional contextual information by maximizing the log-likelihood of all possible factorization orders; 2) Use the characteristics of autoregression itself to overcome the shortcomings of BERT. In addition, XLNet also incorporates the ideas of the current optimal autoregressive model Transformer-XL.
- python train.py # Model training
5. all_layer_out_concat: As you can see from the name, this project encodes the text through the bert-style model, then takes out the cls vector of each layer, performs an attention calculation, and then performs classification.
- python train.py # Model training
- python inference.py # Model inference
6. bert+bceloss+average_checkpoint: This project changes the classification loss to BCELoss and adds weight averaging (averaging multiple checkpoints; a sketch follows the commands below).
- python run_classify.py # Model training
- python run_average_checkpoints.py # Weight average
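Checkpoint averaging itself is straightforward; a sketch assuming plain state_dict checkpoints:

```python
import torch

def average_checkpoints(paths):
    """Element-wise mean of parameter tensors across several saved state_dicts."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in avg:
                avg[k] += state[k].float()
    return {k: v / len(paths) for k, v in avg.items()}
```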
7. capsule_text_classification: GRU+Capsule for text classification
- python train.py # Model training
8. longformer_classification: Use the pre-trained model longformer for text classification. For classification of long texts, you can try this model.
- python train.py # Model training
9. multi_label_classify_bert: Use the bert model for multi-label classification. It contains three models: bert (model.py), bert's last two layers of pooling (model2.py), and bert+TextCNN (model3.py).
- python train.py # Model training
- python inference.py # Model prediction
10. roberta_classification: Use roberta pre-trained model for text classification.
- python train.py # Model training
11. transformer_xl: Use transformer_xl directly for text classification. For long text classification, you can try this model.
- python train.py # Model training
12. wobert+focal_loss: The wobert pre-trained model is provided by Su Jianlin (Su Shen); focal loss is added to the classification task to address class imbalance (a focal-loss sketch follows the command below).
- python run_classify.py # Model training
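For reference, a multi-class focal loss can be written as below (the alpha and gamma values are common defaults, not necessarily the ones used here):

```python
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Down-weight easy examples by (1 - p_t)^gamma on top of cross-entropy."""
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of the true class
    pt = log_pt.exp()
    return (-alpha * (1 - pt) ** gamma * log_pt).mean()
```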
13. TextCNN: Convolve the text at different scales and concatenate the resulting features for classification (a minimal model sketch follows the command below).
- python 001-TextCNN.py # Model training
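A minimal TextCNN sketch (kernel sizes and filter counts are illustrative):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Convolutions with several kernel sizes, global max-pooling, concat, classify."""
    def __init__(self, vocab_size, embed_dim, num_classes, kernel_sizes=(2, 3, 4), num_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList([nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)        # (batch, embed_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))
```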
14. BILSTM+Attention: Bidirectional LSTM network plus ordinary attention for text classification.
- python 002-BILSTM+Attention.py # Model training
Text_Clustering
1. LDA clustering
- python train_LDA_cluster.py # Clustering
2. DBSCAN
- python train_dbscan_cluster.py # Clustering
3. Kmeans
- python train_kmeans_cluster.py # Clustering
Text_Corrector
1. bert_for_correction: It is just a simple attempt: retrain on the relevant corpus, input a sentence containing typos, and then classify the encoded vector of each token.
- python run_pretrain_bert.py # Retraining
- python bert_corrector.py # Error correction
Text_Generation
1. GPT2_SummaryGen: Use GPT2 to generate summary
- python train.py # Model training
- python inferface.py # Model inference
2. GPT2_TitleGen: Generation of article titles
- python train.py # Model training
- python inference.py # Model inference
3. Simple-GPT2: Self-implemented GPT2 model
- python train.py # Model training
- python inference.py # Model inference
Text_Ranking
1. BM25: Calculate the BM25 value of the query and all texts to be sorted, and then sort based on this value.
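The scoring follows the standard BM25 formula; a self-contained sketch (tokenization is left to the caller) looks like this:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query; sort documents by this score."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter(t for d in docs_tokens for t in set(d))   # document frequency of each term
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for q in query_tokens:
            if q not in tf:
                continue
            idf = math.log((N - df[q] + 0.5) / (df[q] + 0.5) + 1)
            s += idf * tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```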
2. DC_Bert_Ranking: two-tower + interaction. The query and context are first encoded separately with unshared weights; the two encodings are then mixed and passed through several interactive transformer-encoder layers.
- python train.py # Model training
- python inference.py # Model inference
3. DPR_Ranking: Facebook’s text ranking model
- python train.py # Model training
4. MT_Ranking: Use bert-style model for encoding, then use cls for classification, and sort by the score of the positive sample
- python train.py # Model training
- python inference.py # Model inference
5. ReRank: including model distillation
- python train.py # Model training
- python train_distill.py # Model distillation
Text_Similarity
1. ABCNN: First embed the words of the two sentences, then pool to obtain a vector for each sentence, compute the difference between the two vectors, apply convolutions at different scales to the difference vector, and finally classify.
- python train.py # Model training
2. BiMPM: This model is divided into four parts: word embedding, context encoding (bilstm), four types of Matching, and Aggregation Layer.
- python train.py # Model training
3. DecomposableAttention: The core of this paper is alignment, i.e. the correspondence between words. Alignment is used in two places: in the attend part, to compute the attention between the two sentences, and in the compare part, where the words of the two sentences are compared. All processing is word-level, and a feed-forward network makes the final prediction. Notably, the model does not use the temporal order of words within a sentence; it relies on the correspondence (alignment) between the words of the two sentences.
- python train.py # Model training
4. ESIM: Short text matching tool. The strength of ESIM is its inter-sentence attention (soft_align_attention in the code), where the two sentences being compared interact with each other. Earlier Siamese-style architectures often had no interaction in between and only computed a cosine (or other) distance at the final layer.
- python train.py # Model training
5. RE2: The name RE2 comes from the combination of three important parts of the network: Residual vectors; Embedding vectors; Encoded vectors.
- python train.py # Model training
6. SiaGRU: Twin-tower structure, use GRU to encode two sentences respectively, then calculate the difference between the two sentence encoding vectors, and finally use this difference vector for classification.
- python train.py # Model training
7. SimCSE: Contrastive learning; trick: encode the same sample twice with different dropout masks to form a positive pair (a loss sketch follows the command below).
- python train.py # Model training
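The core of unsupervised SimCSE is an in-batch contrastive loss over two dropout-noised encodings of the same batch of sentences; a sketch, where z1 and z2 are the two encoding passes:

```python
import torch
import torch.nn.functional as F

def simcse_loss(z1, z2, temperature=0.05):
    """In-batch InfoNCE: row i of z1 and row i of z2 form a positive pair."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.t() / temperature                 # (batch, batch) cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)
```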
8. BM25: Directly calculate the BM25 value of two texts, representing their degree of similarity.
9. TF_IDF: Directly calculate the TF_IDF value of two texts, representing their degree of similarity.
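This amounts to the cosine similarity of the two TF-IDF vectors; a sketch using scikit-learn (assuming the texts are already whitespace-tokenized, e.g. with jieba):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_similarity(text_a, text_b):
    """Cosine similarity between the TF-IDF vectors of two texts."""
    vectors = TfidfVectorizer().fit_transform([text_a, text_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])
```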
10. NEZHA_Coattention: A two-tower structure: the two sentences are fed into the NEZHA model separately, their representations are differenced, the difference is concatenated with the original representations, and the result is passed to a fully connected network for classification. There is a second variant in which, after obtaining the two sentence representations, a transformer-encoder layer (implemented here) fuses them before classification.
11. Bert_Whitening: The method proposed by Su Jianlin; it requires no training and directly maps the bert output of each sentence onto a standard orthonormal (whitened) basis (a sketch of the transform follows the command below).
- python run_bert_whitening.py # Evaluate directly on the dataset and compute the Spearman coefficient
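The whitening transform itself needs no training: compute the mean and covariance of the sentence vectors, then map them with W = U diag(1/sqrt(S)) from the SVD of the covariance. A sketch following Su Jianlin's recipe (array names are illustrative):

```python
import numpy as np

def compute_whitening(sentence_vecs):
    """Return (W, mu) so that (vecs - mu) @ W has zero mean and identity covariance."""
    mu = sentence_vecs.mean(axis=0, keepdims=True)
    cov = np.cov((sentence_vecs - mu).T)
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S))
    return W, mu

# usage: whitened = (sentence_vecs - mu) @ W, then compare sentences with cosine similarity
```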
data_augmentation
1. eda: Use the nlpcda toolkit for data augmentation. Such as: equivalent entity replacement, random synonym replacement, random deletion of characters, position exchange, homophone replacement.
2. Back translation-Baidu: Use Baidu Translate for back translation of text.
- python 002-run_contrslate_data_aug.py
3. Back-translation-google: Use Google Translate for back-translation of text.
- python 003-google_trans_data_aug.py
relation_extraction
1. lstm_cnn_information_extract: lstm+cnn
- python train.py # Model training
- python inference.py # Model inference
2. relation_classification: relation classification, bilstm+ordinary attention
- python data_helper.py # Data preprocessing
- python train.py # Model training
Star History