Chinese-Metaphor
CCL 2018 Shared Task - Chinese Metaphor Recognition and Sentiment Analysis
Task Description
- Task details: http://ir.dlut.edu.cn/news/detail/508
- Update: Subtask 1 is a binary classification task; you only need to identify whether a sentence contains a verb metaphor.
- Schedule: the deadline is September 30. Each team may submit results on September 9, 16, 23, and 30, by 10:00 pm each Sunday, and at most three times per week. Rankings are computed from the last submitted results and announced on the website (http://ir.dlut.edu.cn/) before 5:00 pm on September 10, 17, 24, and October 1.
- Training data: http://ir.dlut.edu.cn/File/Download?cid=3 "CCL 2018 Chinese Metaphor Recognition and Sentiment Analysis Evaluation Data"
- Test data (unlabeled): http://ir.dlut.edu.cn/File/Download?cid=3 "CCL 2018 Chinese Metaphor Recognition and Sentiment Analysis Test Data"
- Reminder: per the organizer's requirements, this dataset may only be used for this evaluation task; for other uses, please contact the organizer.
Repo Structure
- /Corpus: Stores the Penn State UCMC Chinese metaphor corpus (not used yet)
- /data: training and test data
- /dicts: the relational dictionaries of the two subtasks, as well as the vocabulary
- /memo: meeting minutes
- /model_structure: Structure diagram of nn model
- /paper: related literature
- /pretrained_emb: Pre-trained word embeddings downloaded from the Internet (trained on Wikipedia), filtered
- /src: code
- /results: Model evaluation results and generated test labels
- /models: You need to create this directory yourself, with two subdirectories, /verb and /emo, to store the trained models
- /submission: Submitted result files, stored by date
Code Structure
- Core code:
- conf.py: Set various parameters
- multi_cgru_keras.py: model structure
- train.py: train the model on 90% of the training data
- eva_model.py: Evaluate model performance on 10% of the training data
- generate_test_labels.py: predict labels on the test set
- Auxiliary code:
- split_data.py: Split the training set into 90% (for training) and 10% (for evaluating model performance)
- back_translate.py: Augment the training data via back-translation with the Google Translate API
- convert_data.py: Convert data from xml to txt, and convert numeric labels into easy-to-understand text labels
- data_provider.py: Read data and prepare for training
- filter_wordemb.py: Filter the pre-trained word vectors against the train and test data, keeping only words that appear in the data (the current wiki vectors have already been filtered); a sketch of this idea follows below
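For illustration, a minimal sketch of the kind of filtering filter_wordemb.py performs might look like this; the function name and the one-vector-per-line text format are assumptions, not the actual implementation:

```python
# Minimal sketch of embedding filtering: keep only vectors for words that
# appear in the train/test vocabulary, so the embedding file stays small.
# Assumes a text embedding file with one "word v1 v2 ... vn" line per word;
# names here are illustrative.
def filter_embeddings(emb_path, vocab, out_path):
    with open(emb_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            word = line.split(' ', 1)[0]
            if word in vocab:
                fout.write(line)
```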
How to Run the Code
- Set relevant parameters in conf.py
- Run train.py to train the model
- Run eva_model.py to evaluate model performance
- Based on the evaluation results from the previous step, select the best-performing model and run generate_test_labels.py to generate labels for the test data; example commands follow below.
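For example, assuming the scripts read all of their settings from conf.py and take no command-line arguments (this invocation pattern is an assumption), a full run might look like:

```
python train.py                  # train on 90% of the training data
python eva_model.py              # evaluate on the held-out 10%
python generate_test_labels.py   # write predicted labels for the test set
```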
Done
- NN baseline: based on CGRU; best accuracy is about 70% for task1 and about 39% for task2 (a sketch of this style of architecture follows this list)
- Comparison: majority baseline, task2 37%
- Comparison: naive baseline based on the emotion lexicon (no machine learning), task2 51%
- Based on the NN baseline, tried the following:
- Optimize Embedding layer
- Use pre-trained embeddings to replace the embeddings learned by the model itself; best task2 accuracy is about 50%
- Word vector concatenation: combined with a smaller smoothing parameter, task2 macro-F1 about 39.6%
- Back Translation
- Back-translated via 6 languages with Google Translate and tested several filtering methods; best task2 accuracy is about 53%
- Other model structures
- Directly use Embedding as classification features
- LSTM + fully connected layer: task2 macro-F1 about 40%
- Brief error analysis:
- Overfitting is severe; increasing l2, increasing dropout, and decreasing the smoothing parameter brought no significant change. The same model is also unstable across runs (task2 results can differ by up to 10%)
- Some of the bad cases are sentences with contrastive transitions (e.g., containing expressions such as "how could it not be", "cannot", "since")
- Some of the annotations in the data appear questionable
- Obtained the Penn State Chinese metaphor corpus, which can be used to train word embeddings ourselves
- Supplemented the training corpus: back-translated other English corpora into Chinese to add training data
- Tuned hyperparameters
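As a reference for the CGRU baseline above, here is a minimal Keras sketch of an Embedding → Conv1D → GRU → softmax classifier. It illustrates the general style of architecture only, not the exact model in multi_cgru_keras.py, and all hyperparameters are placeholders:

```python
# Minimal sketch of a CGRU-style sentence classifier; not the actual
# multi_cgru_keras.py model. All sizes below are placeholder values.
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GRU, Dropout, Dense

vocab_size, emb_dim, max_len, num_classes = 20000, 300, 50, 2

model = Sequential([
    Embedding(vocab_size, emb_dim, input_length=max_len),  # can be initialized with pre-trained vectors
    Conv1D(filters=128, kernel_size=3, padding='same', activation='relu'),
    GRU(128),      # final hidden state serves as the sentence representation
    Dropout(0.5),  # one knob against the overfitting noted in the error analysis
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```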
Todolist
- Try more features based on the NN baseline:
- Continue to optimize the Embedding layer
- Use other pre-trained embeddings: e.g., embeddings trained on the Penn State metaphor corpus, ELMo embeddings, etc.
- Add the emotion lexicon to the NN:
- Label embeddings: the existing method applies only to labels with an ordinal relationship (very neg, neg, neutral, pos, very pos)
- Subcategories of verbs and nouns
- Dependency relations
- By inspecting the data, examine the role of function words in the two subtasks, then decide what function-word information to add to the model (which kinds of function-word information are actually helpful?)
- Try other model structures:
- (Refer to the article 'Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms'; a SWEM-style sketch follows this list)
- Use a Transformer as the sentence encoder (see the article 'Attention Is All You Need')
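A minimal SWEM-style starting point (pooled word embeddings fed straight to a classifier, in the spirit of the 'Baseline Needs More Love' paper) could look like the following; the layer sizes and the 5-class output are placeholder assumptions:

```python
# Sketch of a SWEM-max model: classify directly from pooled word embeddings,
# with no recurrent or convolutional encoder. Hyperparameters are placeholders.
from keras.models import Sequential
from keras.layers import Embedding, GlobalMaxPooling1D, Dense

vocab_size, emb_dim, max_len, num_classes = 20000, 300, 50, 5

model = Sequential([
    Embedding(vocab_size, emb_dim, input_length=max_len),
    GlobalMaxPooling1D(),   # element-wise max over time steps (SWEM-max)
    Dense(100, activation='relu'),
    Dense(num_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```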
Resources
- Penn State Chinese Metaphor Corpus (http://www.personal.psu.edu/xxl13/download.html)
- Dalian University of Technology Emotion Vocabulary Ontology (http://ir.dlut.edu.cn/EmotionOntologyDownload); used by the naive lexicon baseline (see the sketch below)
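A hedged sketch of the naive lexicon baseline (task2, 51% above), simplified to three classes: it assumes the ontology has already been parsed into a word → signed polarity-score dict (the raw download is a structured table that needs preprocessing), and jieba is just one possible tokenizer:

```python
# No-ML sentiment baseline: sum lexicon polarity scores over the tokens of a
# sentence. `lexicon` maps word -> signed score and is assumed to be
# preprocessed from the DUT emotion ontology; this is an illustration only.
import jieba

def classify_sentiment(sentence, lexicon):
    score = sum(lexicon.get(tok, 0) for tok in jieba.cut(sentence))
    if score > 0:
        return 'pos'
    if score < 0:
        return 'neg'
    return 'neutral'
```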
Organizer
Dalian University of Technology Information Retrieval Research Laboratory