transformers data augmentation

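Data Augmentation using Pre-trained Transformer Models

Code associated with the paper "Data Augmentation using Pre-trained Transformer Models".

The code contains implementations of the following data augmentation methods: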

EDA (baseline)
Back-translation (baseline)
CBERT (baseline)
BERT Prepend (our paper)
GPT-2 Prepend (our paper)
BART Prepend (our paper)
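The following is a minimal sketch of the label-conditioning idea. It assumes that "Prepend" means each training sentence is prefixed with its class label before fine-tuning, so the model can later be prompted with a label to produce new examples of that class. The helper names prepend_labels and make_prompts, the separator and the sample sentences are all hypothetical. This is not the repository's code. The actual BERT, GPT-2 and BART pipelines also differ in how they mask or generate text.

```python
from typing import List, Tuple


def prepend_labels(examples: List[Tuple[str, str]], sep: str = " ") -> List[str]:
    """Prefix each (label, text) pair's text with its label.

    Hypothetical helper: the separator is an assumption, not the repo's format.
    """
    return [f"{label}{sep}{text}" for label, text in examples]


def make_prompts(labels: List[str], sep: str = " ") -> List[str]:
    """Build label-only prompts used to generate new examples of each class."""
    return [f"{label}{sep}" for label in labels]


if __name__ == "__main__":
    train = [
        ("positive", "a gripping , well-acted film"),
        ("negative", "dull and far too long"),
    ]
    for line in prepend_labels(train):
        print(line)
    print(make_prompts(["positive", "negative"], sep=" [SEP] "))
```

Fine-tuning on the label-prefixed lines teaches the model to associate each label with its class. Feeding it only the prompt then lets it continue with class-consistent text.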

Datasets

In the paper, we use three datasets from the following sources:

STSA-2: https://github.com/1024er/cbert_aug/tree/crayon/datasets/stsa.binary
TREC: https://github.com/1024er/cbert_aug/tree/crayon/datasets/TREC
SNIPS: https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips

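Low-data regime experiment setup

Run the src/utils/download_and_prepare_datasets.sh file to prepare all datasets.
download_and_prepare_datasets.sh performs the following steps:

Download the data from GitHub
Replace the numeric labels in the STSA-2 and TREC datasets with text labels
For each dataset, create 15 random splits of training and dev data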
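The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions. The label map STSA_LABELS, the helper names, the split sizes (train_size, dev_size) and the seeding scheme are all hypothetical. The real script may also sample per class rather than uniformly.

```python
import random
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (label, text)

# Hypothetical numeric-to-text mapping; the script's actual label names may differ.
STSA_LABELS: Dict[str, str] = {"0": "negative", "1": "positive"}


def relabel(rows: List[Row], label_map: Dict[str, str]) -> List[Row]:
    """Replace numeric labels with their text names."""
    return [(label_map[label], text) for label, text in rows]


def make_splits(
    rows: List[Row],
    n_splits: int = 15,
    train_size: int = 10,
    dev_size: int = 10,
    seed: int = 0,
) -> List[Tuple[List[Row], List[Row]]]:
    """Draw n_splits random (train, dev) subsets (sizes are assumptions).

    Each split uses its own seeded RNG so it is reproducible; train and dev
    are taken from disjoint positions of the same shuffle.
    """
    if train_size + dev_size > len(rows):
        raise ValueError("not enough rows for the requested split sizes")
    splits = []
    for i in range(n_splits):
        rng = random.Random(seed + i)
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train = shuffled[:train_size]
        dev = shuffled[train_size:train_size + dev_size]
        splits.append((train, dev))
    return splits
```

A separate seed per split keeps each of the 15 splits reproducible while still making them differ from one another.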

Dependencies

To run this code, you need the following dependencies:

PyTorch 1.5
fairseq 0.9
transformers 2.9
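These are fairly old releases, so checking the installed versions can save debugging time. The sketch below is not part of the repository. It makes three assumptions:
- Python 3.8+ is available (needed for importlib.metadata).
- The PyPI distribution names are torch, fairseq and transformers.
- A matching major.minor version is compatible.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Required major.minor versions, taken from the dependency list above.
REQUIRED: Dict[str, str] = {"torch": "1.5", "fairseq": "0.9", "transformers": "2.9"}


def check_versions(
    required: Dict[str, str] = REQUIRED,
    get_version: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every
    package whose installed major.minor differs from the required one."""
    problems: Dict[str, Optional[str]] = {}
    for pkg, want in required.items():
        try:
            have = get_version(pkg)
        except PackageNotFoundError:
            problems[pkg] = None
            continue
        # Compare only major.minor, so "1.5.0+cu101" matches "1.5".
        if have.split(".")[:2] != want.split("."):
            problems[pkg] = have
    return problems


if __name__ == "__main__":
    for pkg, have in check_versions().items():
        print(f"{pkg}: expected {REQUIRED[pkg]}.x, found {have or 'not installed'}")
```

An empty result means all three packages match the listed versions.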

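How to run

To run the data augmentation experiments for a given dataset, run the bash scripts in the scripts folder. For example, to run data augmentation on the snips dataset:

Run scripts/bart_snips_lower.sh for the BART experiment
Run scripts/bert_snips_lower.sh for the remaining data augmentation methods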
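If you drive the runs from Python, a thin wrapper like the one below may help. It is a sketch, not part of the repository. The <model>_<dataset>_lower.sh naming pattern is inferred from the two example scripts above, so script names for other datasets are an assumption. It also assumes you launch it from the repository root, so the relative scripts/ path resolves.

```python
import subprocess
from pathlib import Path
from typing import List


def script_path(model: str, dataset: str, scripts_dir: str = "scripts") -> Path:
    """Build scripts/<model>_<dataset>_lower.sh (naming pattern is an assumption)."""
    return Path(scripts_dir) / f"{model}_{dataset}_lower.sh"


def run_experiment(model: str, dataset: str, dry_run: bool = False) -> List[str]:
    """Run the experiment script with bash; with dry_run=True, only return the command."""
    cmd = ["bash", str(script_path(model, dataset))]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return cmd


if __name__ == "__main__":
    print(run_experiment("bart", "snips", dry_run=True))
```

For example, run_experiment("bart", "snips") runs the BART experiment on snips. Passing check=True makes a failed script stop the Python caller instead of failing silently.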

How to cite

@inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun  and
      Choudhary, Ashutosh  and
      Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}