transformers data augmentation

1.0.0

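Data Augmentation using Pre-trained Transformer Models

Code associated with the paper "Data Augmentation using Pre-trained Transformer Models".

The code contains implementations of the following data augmentation methods: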

EDA (Baseline)
Backtranslation (Baseline)
CBERT (Baseline)
BERT Prepend (Our paper)
GPT-2 Prepend (Our paper)
BART Prepend (Our paper)
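
As a rough illustration of what the "Prepend" variants have in common, the sketch below places the class label in front of each training sentence so that a pre-trained model can be conditioned on it. It assumes that "Prepend" means exactly this, as the method names suggest. The function name `prepend_label`, the separator, and the sample sentence are hypothetical and are not taken from this repository.

```python
def prepend_label(label: str, text: str, sep: str = " ") -> str:
    """Return the example with its class label placed in front of the text."""
    return f"{label}{sep}{text}"


# Hypothetical snips-style example; the label string and sentence are made up.
example = prepend_label("PlayMusic", "play some jazz")
print(example)  # PlayMusic play some jazz
```

The resulting strings would then be fed to the model in place of the raw sentences; how each model consumes them is left to the repository's scripts.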

Datasets

In the paper, we use three datasets from the following sources:

STSA-2 : https://github.com/1024er/cbert_aug/tree/crayon/datasets/stsa.binary
TREC : https://github.com/1024er/cbert_aug/tree/crayon/datasets/TREC
SNIPS : https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips

Low-data regime experiment setup

Run the src/utils/download_and_prepare_datasets.sh file to prepare all datasets.
download_and_prepare_datasets.sh performs the following steps:

Downloads the data from GitHub
Replaces numeric labels with text for the STSA-2 and TREC datasets
For a given dataset, creates 15 random splits of the train and dev data
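
The last two steps can be sketched in Python as follows. This is not the script's actual code. The label names in `STSA2_LABELS`, the per-class sample size, the seeding scheme, and the choice to sample the same number of examples per class are all assumptions made for illustration; only the count of 15 splits comes from the steps above.

```python
import random
from collections import defaultdict

# Hypothetical label names; the strings used by the real script may differ.
STSA2_LABELS = {"0": "Negative", "1": "Positive"}


def relabel(rows, label_map):
    """Replace numeric labels with text labels in (label, text) pairs."""
    return [(label_map[label], text) for label, text in rows]


def make_splits(rows, n_splits=15, per_class=10, base_seed=0):
    """Create n_splits random subsets with per_class examples for each label."""
    by_label = defaultdict(list)
    for label, text in rows:
        by_label[label].append((label, text))
    splits = []
    for i in range(n_splits):
        rng = random.Random(base_seed + i)  # one seed per split, for reproducibility
        subset = []
        for label in sorted(by_label):
            subset.extend(rng.sample(by_label[label], per_class))
        splits.append(subset)
    return splits


# Toy data standing in for a real training file.
rows = [("0", f"negative example {i}") for i in range(50)]
rows += [("1", f"positive example {i}") for i in range(50)]
splits = make_splits(relabel(rows, STSA2_LABELS))
print(len(splits), len(splits[0]))  # 15 20
```

Seeding each split separately keeps every split reproducible on its own, which helps when a single run has to be repeated.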

Dependencies

To run this code, you need the following dependencies:

PyTorch 1.5
fairseq 0.9
transformers 2.9
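
One way to check an environment against these versions is sketched below. It assumes the packages are installed under the PyPI names `torch`, `fairseq`, and `transformers`, and it uses `importlib.metadata`, which requires Python 3.8 or later. The helper names are hypothetical and not part of this repository.

```python
from importlib import metadata

# Versions listed above, keyed by the assumed PyPI package names.
REQUIRED = {"torch": "1.5", "fairseq": "0.9", "transformers": "2.9"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required=REQUIRED, lookup=installed_version):
    """Map each package to True if its installed version matches the required release."""
    report = {}
    for package, wanted in required.items():
        found = lookup(package)
        # Accept "1.5" itself or any patch release such as "1.5.1".
        report[package] = found is not None and (
            found == wanted or found.startswith(wanted + ".")
        )
    return report


if __name__ == "__main__":
    for package, ok in check_versions().items():
        print(f"{package}: {'ok' if ok else 'missing or different version'}")
```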

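How to run

To run a data augmentation experiment for a given dataset, run the corresponding bash script in the scripts folder. For example, to run data augmentation on the snips dataset:

Run scripts/bart_snips_lower.sh for the BART experiment.
Run scripts/bert_snips_lower.sh for the rest of the data augmentation methods.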
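
If the experiments need to be launched from Python rather than from a shell, a thin wrapper along these lines could work. Only the two script paths come from this README. The keys `"bart"` and `"other"`, the function names, and the assumption that the scripts are run from the repository root are illustrative.

```python
import subprocess

# The two scripts named above; both target the snips dataset.
SCRIPTS = {
    "bart": "scripts/bart_snips_lower.sh",
    "other": "scripts/bert_snips_lower.sh",
}


def build_command(kind):
    """Return the argument list that launches the script for the given kind."""
    return ["bash", SCRIPTS[kind]]


def run_experiment(kind, repo_root="."):
    """Run the script from repo_root and raise CalledProcessError on failure."""
    subprocess.run(build_command(kind), cwd=repo_root, check=True)


# Example call (requires a checkout of the repository):
# run_experiment("bart", repo_root="/path/to/checkout")
```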

How to cite

 @inproceedings{kumar-etal-2020-data,
    title = "Data Augmentation using Pre-trained Transformer Models",
    author = "Kumar, Varun  and
      Choudhary, Ashutosh  and
      Cho, Eunah",
    booktitle = "Proceedings of the 2nd Workshop on Life-long Learning for Spoken Language Systems",
    month = dec,
    year = "2020",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.lifelongnlp-1.3",
    pages = "18--26",
}