QBQTC: QQ Browser Query Title Corpus
QQ Browser Search Relevance Data Set
QQ Browser Query Title Corpus (QBQTC) is a learning-to-rank (LTR) data set built by the QQ Browser search engine for large-scale search scenarios. It combines annotations along several dimensions, including relevance, authority, content quality and timeliness, and is widely used in search engine business scenarios.
Meaning of the relevance labels: 0, poorly relevant; 1, somewhat relevant; 2, highly relevant. The higher the number, the higher the relevance.
Training set (train) | Validation set (dev) | Public test set (test_public) | Private test set (test) |
---|---|---|---|
180,000 | 20,000 | 5,000 | >=100,000 |
Model | Training set (train) | Validation set (dev) | Public test set (test_public) | Training parameters |
---|---|---|---|---|
BERT-base | F1: 80.3 Acc: 84.3 | F1: 64.9 Acc: 72.4 | F1: 64.1 Acc: 71.8 | batch=64, length=52, epoch=7, lr=2e-5, warmup=0.9 |
RoBERTa-wwm-ext | F1: 67.9 Acc: 76.2 | F1: 64.9 Acc: 71.5 | F1: 64.0 Acc: 71.0 | batch=64, length=52, epoch=7, lr=2e-5, warmup=0.9 |
RoBERTa-wwm-large-ext | F1: 79.8 Acc: 84.2 | F1: 65.1 Acc: 72.4 | F1: 66.3 Acc: 73.1 | batch=64, length=52, epoch=7, lr=2e-5, warmup=0.9 |
The F1 score is computed with f1_score from sklearn.metrics, using the formula: F1 = 2 * (precision * recall) / (precision + recall)
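As a rough illustration of the formula, the sketch below computes per-class F1 and accuracy in plain Python for the three relevance labels. The README does not say how the per-class scores are averaged, so macro averaging (an unweighted mean over the classes) is an assumption here, and the labels and predictions are made-up toy values, not results from the data set. Under that assumption the result should match `f1_score(y_true, y_pred, average="macro")` from sklearn.metrics.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 = 2 * (precision * recall) / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 (assumed averaging mode)."""
    return sum(per_class_f1(y_true, y_pred, l) for l in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy gold labels and predictions, for illustration only.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 2]

print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
```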
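Usage:
1. Clone the project
git clone https://github.com/CLUEbenchmark/QBQTC.git
2. Change to the corresponding directory
For example: cd QBQTC/baselines
3. Download the model parameters for the corresponding task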
QBQTC/weights/bert-base-chinese
QBQTC/weights/chinese-roberta-wwm-ext
QBQTC/weights/chinese-roberta-wwm-ext-large
4. Run the model for the corresponding task (GPU mode):
python BERT.py --model_name_or_path ../weights/chinese-roberta-wwm-ext --max_seq_length 52 --batch_size 64 --num_epochs 7 --learning_rate 2e-5 --num_labels 3
Simplified version: python BERT.py
{"id": 0, "query": "小孩咳嗽感冒", "title": "小孩感冒过后久咳嗽该吃什么药育儿问答宝宝树", "label": "1"}
{"id": 1, "query": "前列腺癌根治术后能活多久", "title": "前列腺癌转移能活多久前列腺癌治疗方法盘点-家庭医生在线肿瘤频道", "label": "1"}
{"id": 3, "query": "如何将一个文件复制到另一个文件里", "title": "怎么把布局里的图纸复制到另外一个文件中去百度文库", "label": "0"}
{"id": 214, "query": "免费观看电影速度与激情1", "title": "《速度与激情1》全集-高清电影完整版-在线观看", "label": "2"}
{"id": 98, "query": "昆明公积金", "title": "昆明异地购房不能用住房公积金中新网", "label": "2"}
{"id": 217, "query": "多张图片怎么排版好看", "title": "怎么排版图片", "label": "2"}
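The samples above suggest that each record is a JSON object with the fields id, query, title and label, where label is stored as a string. A minimal sketch of parsing such records follows; it assumes one JSON object per line, and the helper name `parse_records` is hypothetical, not part of the repository.

```python
import json
from collections import Counter

# Label meanings as described earlier in this README.
LABEL_MEANING = {0: "poorly relevant", 1: "somewhat relevant", 2: "highly relevant"}


def parse_records(lines):
    """Parse JSON-lines records and convert the string label to an int."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        record["label"] = int(record["label"])
        records.append(record)
    return records


# Two of the sample records shown above.
sample_lines = [
    '{"id": 3, "query": "如何将一个文件复制到另一个文件里", "title": "怎么把布局里的图纸复制到另外一个文件中去百度文库", "label": "0"}',
    '{"id": 217, "query": "多张图片怎么排版好看", "title": "怎么排版图片", "label": "2"}',
]

records = parse_records(sample_lines)
label_counts = Counter(r["label"] for r in records)

for r in records:
    print(r["id"], r["label"], LABEL_MEANING[r["label"]])
```

To read a full split, the same function can be applied to an open file handle, for example `parse_records(open(path, encoding="utf-8"))`, where `path` points at whichever data file you have downloaded.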
Submission example
Generate predictions for the test set (test.json) and submit them to the evaluation system.
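The sketch below shows one way to write predictions to a file. The README does not specify the submission format, so the layout used here (one JSON object per line, carrying the example id and a string label, mirroring the training samples) is an assumption; check the evaluation system's requirements before submitting. The function names and the output file name are hypothetical.

```python
import json
import os
import tempfile


def format_prediction(example_id, label):
    """Serialize one prediction; the {"id", "label"} layout is an assumed format."""
    return json.dumps({"id": example_id, "label": str(label)}, ensure_ascii=False)


def write_predictions(ids, labels, path):
    """Write one prediction per line to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        for example_id, label in zip(ids, labels):
            f.write(format_prediction(example_id, label) + "\n")


# Toy ids and predicted labels, for illustration only.
out_path = os.path.join(tempfile.gettempdir(), "qbqtc_predictions_example.json")
write_predictions([0, 1, 3], [1, 1, 0], out_path)
print("wrote", out_path)
```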