GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

This repository provides the source code and data for the paper "GreaseLM: Graph REASoning Enhanced Language Models for Question Answering" (ICLR 2022 spotlight). If you use any of our code, processed data, or pretrained models, please cite:

 @inproceedings{zhang2021greaselm,
  title={GreaseLM: Graph REASoning Enhanced Language Models},
  author={Zhang, Xikun and Bosselut, Antoine and Yasunaga, Michihiro and Ren, Hongyu and Liang, Percy and Manning, Christopher D and Leskovec, Jure},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

[Figure: GreaseLM model architecture]

1. Dependencies

Python == 3.8
PyTorch == 1.8.0
transformers == 3.4.0
torch-geometric == 1.7.0

Run the following commands to create a conda environment (assuming CUDA 10.1):

conda create -y -n GreaseLM python=3.8
conda activate GreaseLM
pip install numpy==1.18.3 tqdm
pip install torch==1.8.0+cu101 torchvision -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==3.4.0 nltk spacy
pip install wandb
conda install -y -c conda-forge tensorboardx
conda install -y -c conda-forge tensorboard

# for torch-geometric
pip install torch-scatter==2.0.7 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-geometric==1.7.0 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
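
After installation it can help to confirm that the pinned versions actually landed in the environment. The following is a minimal sketch and not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the sketch assumes the pip distribution names `torch`, `transformers`, and `torch-geometric` used in the commands above.

```python
from importlib import metadata

# Pins taken from the dependency list above (pip distribution names).
PINNED = {
    "torch": "1.8.0",
    "transformers": "3.4.0",
    "torch-geometric": "1.7.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Hypothetical helper: map each mismatched package to (wanted, found)."""
    problems = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu101" when comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            problems[name] = (wanted, found)
    return problems


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the file inside the activated environment prints one line per package whose version differs from the pin, and nothing if everything matches.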

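2. Download data

Download and preprocess data yourself

Preprocessing the data yourself may take a long time. If you would rather download the preprocessed data directly, skip to the next subsection.

Download the raw ConceptNet, CommonsenseQA, and OpenBookQA data with: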

 ./download_raw_data.sh

To preprocess the raw data, run:

 CUDA_VISIBLE_DEVICES=0 python preprocess.py -p <num_processes>

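You can choose which GPU to use by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command. The script will:

Set up ConceptNet (e.g., extract the English relations from ConceptNet and merge the original 42 relation types into 17 types)
Convert the QA datasets into .jsonl files (e.g., stored in data/csqa/statement/)
Identify all concepts mentioned in the questions and answers
Extract a subgraph for each question-answer pair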
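
To illustrate the relation-merging step, here is a minimal sketch of collapsing fine-grained relation labels into coarser groups. The grouping table is illustrative only and is not the 42-to-17 mapping used by preprocess.py; `MERGE_TABLE`, `merge_relation`, and `merge_triples` are hypothetical names introduced for this example.

```python
# Illustrative grouping only: the actual 42-to-17 mapping lives in the
# repository's preprocessing code and is not reproduced here.
MERGE_TABLE = {
    "atlocation": "atlocation",
    "locatednear": "atlocation",
    "isa": "isa",
    "instanceof": "isa",
    "definedas": "isa",
}


def merge_relation(relation, table=MERGE_TABLE):
    """Return the merged relation type, or None if the relation is dropped."""
    return table.get(relation.lower())


def merge_triples(triples, table=MERGE_TABLE):
    """Rewrite (head, relation, tail) triples using merged relation types."""
    merged = []
    for head, rel, tail in triples:
        new_rel = merge_relation(rel, table)
        if new_rel is not None:  # skip relations outside the table
            merged.append((head, new_rel, tail))
    return merged
```

A lookup table keeps the merge explicit and easy to audit: every surviving relation type is a value in the table, and anything absent from it is dropped.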

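The scripts to download and preprocess the MedQA-USMLE data and the biomedical knowledge graph based on Disease Database and DrugBank are provided in utils_biomed/.

Directly download preprocessed data

If you do not want to preprocess the data yourself, all of the preprocessed data can be downloaded directly. Download the archives into the top-level directory of this repository and unzip them, then move the medqa_usmle and ddb folders into the data/ directory.

Resulting file structure

The resulting file structure should look like this: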

 .
├── README.md
├── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    ├── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...
    ├── obqa/
    ├── medqa_usmle/
    └── ddb/
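
Before training, a quick check that the expected folders exist can save a failed run. This is a minimal sketch under the assumption that the layout above is complete for your use case; `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Directory names taken from the layout shown above.
EXPECTED_DIRS = [
    "cpnet",
    "csqa/statement",
    "csqa/grounded",
    "csqa/graphs",
    "obqa",
    "medqa_usmle",
    "ddb",
]


def missing_dirs(data_root, expected=EXPECTED_DIRS):
    """Return the expected subdirectories that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    absent = missing_dirs("data")
    print("missing:", absent if absent else "none")
```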

3. Training GreaseLM

To train GreaseLM on CommonsenseQA, run:

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh csqa --data_dir data/

You can specify up to 2 GPUs by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command.
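
If you launch training from Python rather than the shell, the GPU selection can be validated before the script starts. The sketch below is an assumption-laden illustration: `cuda_visible_devices` and `training_env` are hypothetical helpers, the two-GPU limit comes from the sentence above, and the commented launch line assumes the training script is named run_greaselm.sh.

```python
import os


def cuda_visible_devices(gpu_ids, max_gpus=2):
    """Format GPU indices for CUDA_VISIBLE_DEVICES, enforcing the GPU limit."""
    ids = list(gpu_ids)
    if not 1 <= len(ids) <= max_gpus:
        raise ValueError(f"expected 1 to {max_gpus} GPU ids, got {len(ids)}")
    if len(set(ids)) != len(ids):
        raise ValueError("GPU ids must be unique")
    return ",".join(str(i) for i in ids)


def training_env(gpu_ids, base_env=None):
    """Copy an environment mapping and set CUDA_VISIBLE_DEVICES in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = cuda_visible_devices(gpu_ids)
    return env


# Example (not executed here): pass the environment to the training script.
# import subprocess
# subprocess.run(["./run_greaselm.sh", "csqa", "--data_dir", "data/"],
#                env=training_env([0, 1]), check=True)
```

Building a copy of the environment leaves the parent process untouched, so several runs with different GPU assignments can be started from one session.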

Similarly, to train GreaseLM on OpenbookQA, run:

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh obqa --data_dir data/

To train GreaseLM on MedQA-USMLE, run:

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm__medqa_usmle.sh

4. Pretrained model checkpoints

A GreaseLM model pretrained on CommonsenseQA is available for download. It achieves an in-house dev (IH-dev) accuracy of 79.0 and an in-house test (IH-test) accuracy of 74.0.

A GreaseLM model pretrained on OpenbookQA is also available for download. It achieves a test accuracy of 84.8.

A GreaseLM model pretrained on MedQA-USMLE is also available for download. It achieves a test accuracy of 38.5.

5. Evaluating a pretrained model checkpoint

To evaluate a pretrained GreaseLM model checkpoint on CommonsenseQA, run:

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh csqa --data_dir data/ --load_model_path /path/to/checkpoint

Again, you can specify up to 2 GPUs by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command.

Similarly, to evaluate a pretrained GreaseLM model checkpoint on OpenbookQA, run:

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh obqa --data_dir data/ --load_model_path /path/to/checkpoint

To evaluate a pretrained GreaseLM model checkpoint on MedQA-USMLE, run:

 INHERIT_BERT=1 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh medqa_usmle --data_dir data/ --load_model_path /path/to/checkpoint
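
The three evaluation commands differ only in the dataset name and in the extra INHERIT_BERT=1 variable for MedQA-USMLE. The sketch below captures that difference in one place. `eval_command` and `as_shell_line` are hypothetical helpers, and the script name eval_greaselm.sh is assumed from the commands above; nothing here is part of the repository.

```python
import shlex

# Datasets named in the evaluation commands above.
DATASETS = ("csqa", "obqa", "medqa_usmle")


def eval_command(dataset, checkpoint, data_dir="data/", gpu="0"):
    """Hypothetical helper: return (env_overrides, argv) for one evaluation run."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    env = {"CUDA_VISIBLE_DEVICES": gpu}
    if dataset == "medqa_usmle":
        # The MedQA-USMLE command above additionally sets INHERIT_BERT=1.
        env["INHERIT_BERT"] = "1"
    argv = ["./eval_greaselm.sh", dataset,
            "--data_dir", data_dir,
            "--load_model_path", checkpoint]
    return env, argv


def as_shell_line(env, argv):
    """Render the run as a single shell line, quoting where needed."""
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in sorted(env.items()))
    return f"{prefix} {shlex.join(argv)}"
```

Printing `as_shell_line(*eval_command("csqa", "/path/to/checkpoint"))` gives a line you can paste into a terminal. The environment variables are sorted alphabetically, which does not change their effect.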

6. Use your own dataset

Convert your dataset to {train,dev,test}.statement.jsonl in .jsonl format (see data/csqa/statement/train.statement.jsonl)
Create a directory in data/{yourdataset}/ to store the .jsonl files
Modify preprocess.py and perform subgraph extraction for your data
Modify utils/parser_utils.py to support your own dataset
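
The first two steps can be sketched as a small writer that emits one JSON object per line. This is only an illustration: `write_statements` is a hypothetical helper, the statement/ subdirectory mirrors the csqa layout as an assumption, and the keys in `EXAMPLE` are placeholders. Copy the real schema from data/csqa/statement/train.statement.jsonl.

```python
import json
from pathlib import Path


def write_statements(records, dataset, split, data_root="data"):
    """Write records to data/{dataset}/statement/{split}.statement.jsonl."""
    if split not in ("train", "dev", "test"):
        raise ValueError(f"unknown split: {split}")
    out_dir = Path(data_root) / dataset / "statement"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{split}.statement.jsonl"
    with out_path.open("w", encoding="utf-8") as fh:
        for record in records:
            # One JSON object per line is what makes the file .jsonl.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


# Placeholder record: the keys below are assumptions for illustration.
# Copy the real schema from data/csqa/statement/train.statement.jsonl.
EXAMPLE = {
    "id": "example-0",
    "question": {"stem": "Where would you find a cup?",
                 "choices": [{"label": "A", "text": "table"},
                             {"label": "B", "text": "sky"}]},
    "answerKey": "A",
}
```

Calling `write_statements(records, "yourdataset", "train")` once per split produces the three files that the remaining steps expect.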

7. Acknowledgment

This repository is built upon the following work:

 QA-GNN: Question Answering using Language Models and Knowledge Graphs
https://github.com/michiyasunaga/qagnn

Many thanks to the authors and developers!

Additional information

Version: 1.0.0
Type: AI source code
Updated: 2024-12-30
Size: 50MB
Source: GitHub
