GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

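This repo provides the source code and data of our paper GreaseLM: Graph REASoning Enhanced Language Models for Question Answering (ICLR 2022 spotlight). If you use any of our code, processed data, or pretrained models, please cite: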

@inproceedings{zhang2021greaselm,
  title={GreaseLM: Graph REASoning Enhanced Language Models},
  author={Zhang, Xikun and Bosselut, Antoine and Yasunaga, Michihiro and Ren, Hongyu and Liang, Percy and Manning, Christopher D and Leskovec, Jure},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

[Figure: GreaseLM model architecture]

1. Dependencies

Python == 3.8
PyTorch == 1.8.0
transformers == 3.4.0
torch-geometric == 1.7.0

Run the following commands to create a conda environment (assuming CUDA 10.1):

conda create -y -n GreaseLM python=3.8
conda activate GreaseLM
pip install numpy==1.18.3 tqdm
pip install torch==1.8.0+cu101 torchvision -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==3.4.0 nltk spacy
pip install wandb
conda install -y -c conda-forge tensorboardx
conda install -y -c conda-forge tensorboard

# for torch-geometric
pip install torch-scatter==2.0.7 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-geometric==1.7.0 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
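
After installation, it can help to confirm that the environment matches the pinned versions. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and it assumes the packages are registered under the PyPI distribution names `torch`, `transformers`, and `torch-geometric`.

```python
from importlib import metadata

# Pins copied from the dependency list in this section; keys are assumed
# to be the PyPI distribution names.
PINNED = {
    "torch": "1.8.0",
    "transformers": "3.4.0",
    "torch-geometric": "1.7.0",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {name: (expected, found)} for each package that differs from its pin."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # Ignore local build tags such as "+cu101" when comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

Run inside the activated conda environment, the script prints one line per mismatched package and nothing when all pins are satisfied.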

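2. Download data

Download and preprocess data yourself

Preprocessing the data yourself may take a long time, so if you want to download the preprocessed data directly, skip to the next subsection.

Download the raw ConceptNet, CommonsenseQA, and OpenBookQA data with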

 ./download_raw_data.sh

You can preprocess the raw data by running

 CUDA_VISIBLE_DEVICES=0 python preprocess.py -p <num_processes>

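You can specify the GPUs to use by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command. The script will:

Set up ConceptNet (e.g., extract English relations from ConceptNet and merge the original 42 relation types into 17 types)
Convert the QA datasets into .jsonl files (e.g., stored in data/csqa/statement/)
Identify all concepts mentioned in the questions and answers
Extract a subgraph for each q-a pair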
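
To illustrate the relation-merging step, here is a minimal sketch of collapsing fine-grained ConceptNet relations into coarser groups. It is not the repository's implementation: the three groups in `RELATION_GROUPS` are a hypothetical, partial table chosen for illustration (the actual 17 groups are defined by the preprocessing code), and the function names are assumptions.

```python
# Hypothetical, partial grouping for illustration only. The first name in
# each group is used as the merged (canonical) relation type.
RELATION_GROUPS = [
    ["atlocation", "locatednear"],
    ["isa", "instanceof", "definedas"],
    ["relatedto", "similarto", "synonym"],
]


def build_merge_table(groups):
    """Map every relation name to the canonical name of its group."""
    return {rel: group[0] for group in groups for rel in group}


def merge_relation(rel, table):
    """Normalize a relation such as '/r/InstanceOf' and return its merged
    type, or None if the relation is not covered by the table."""
    name = rel.rsplit("/", 1)[-1].lower()
    return table.get(name)


table = build_merge_table(RELATION_GROUPS)
print(merge_relation("/r/InstanceOf", table))
print(merge_relation("/r/LocatedNear", table))
```

Returning None for unknown relations lets a caller drop edges whose relation type falls outside the merged set.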

Scripts to download and preprocess the MedQA-USMLE data, and the biomedical knowledge graph based on Disease Database and DrugBank, are provided in utils_biomed/.

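Directly download preprocessed data

For your convenience, if you don't want to preprocess the data yourself, you can download all the preprocessed data here. Download it into the top-level directory of this repo and unzip it. Move the medqa_usmle and ddb folders into the data/ directory.

Resulting file structure

The resulting file structure should look like this: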

 .
├── README.md
├── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    ├── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...
    ├── obqa/
    ├── medqa_usmle/
    └── ddb/
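
Before training, the layout above can be checked programmatically. The following is a minimal sketch under the assumption that it runs against the repository root; `EXPECTED_DIRS` lists only the directories named in the tree, and `missing_dirs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directories named in the tree above, relative to the repository root.
EXPECTED_DIRS = [
    "data/cpnet",
    "data/csqa/statement",
    "data/csqa/grounded",
    "data/csqa/graphs",
    "data/obqa",
    "data/medqa_usmle",
    "data/ddb",
]


def missing_dirs(repo_root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    for rel in missing_dirs("."):
        print(f"missing: {rel}")
```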

3. Training GreaseLM

To train GreaseLM on CommonsenseQA, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh csqa --data_dir data/

You can specify up to 2 GPUs to use by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command.

Similarly, to train GreaseLM on OpenbookQA, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh obqa --data_dir data/

To train GreaseLM on MedQA-USMLE, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm__medqa_usmle.sh
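
When launching several runs from a driver script, the GPU selection can be set through the process environment rather than typed on the command line. The sketch below only builds the command and environment; `build_launch` and `TRAIN_COMMANDS` are hypothetical names, the script names follow the training commands in this section, and applying the 2-GPU limit to every dataset is an assumption.

```python
import os

# Script names and arguments follow the training commands in this section.
TRAIN_COMMANDS = {
    "csqa": ["./run_greaselm.sh", "csqa", "--data_dir", "data/"],
    "obqa": ["./run_greaselm.sh", "obqa", "--data_dir", "data/"],
    "medqa_usmle": ["./run_greaselm__medqa_usmle.sh"],
}


def build_launch(dataset, gpu_ids, base_env=None):
    """Return (command, env) for a training run on the given GPUs."""
    if dataset not in TRAIN_COMMANDS:
        raise ValueError(f"unknown dataset: {dataset}")
    # Assumption: the documented limit of 2 GPUs applies to every dataset.
    if not 1 <= len(gpu_ids) <= 2:
        raise ValueError("specify 1 or 2 GPU ids")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return list(TRAIN_COMMANDS[dataset]), env


cmd, env = build_launch("csqa", [0, 1])
print(env["CUDA_VISIBLE_DEVICES"], " ".join(cmd))
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)` from the repository root to start the run.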

4. Pretrained model checkpoints

You can download a pretrained GreaseLM model on CommonsenseQA here, which achieves an IH-dev acc. of 79.0 and an IH-test acc. of 74.0.

You can also download a pretrained GreaseLM model on OpenbookQA here, which achieves a test acc. of 84.8.

You can also download a pretrained GreaseLM model on MedQA-USMLE here, which achieves a test acc. of 38.5.

5. Evaluating a pretrained model checkpoint

To evaluate a pretrained GreaseLM model checkpoint on CommonsenseQA, run

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh csqa --data_dir data/ --load_model_path /path/to/checkpoint

Again, you can specify up to 2 GPUs to use by setting CUDA_VISIBLE_DEVICES=... at the beginning of the command.

Similarly, to evaluate a pretrained GreaseLM model checkpoint on OpenbookQA, run

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh obqa --data_dir data/ --load_model_path /path/to/checkpoint

To evaluate a pretrained GreaseLM model checkpoint on MedQA-USMLE, run

 INHERIT_BERT=1 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh medqa_usmle --data_dir data/ --load_model_path /path/to/checkpoint

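6. Use your own dataset

Convert your dataset to {train,dev,test}.statement.jsonl in .jsonl format (see data/csqa/statement/train.statement.jsonl)
Create a directory in data/{yourdataset}/ to store the .jsonl files
Modify preprocess.py and perform subgraph extraction for your data
Modify utils/parser_utils.py to support your own dataset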
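
The first two steps can be sketched as follows. This is an illustration under stated assumptions, not the repository's converter: the record fields (`id`, `answerKey`, `question.stem`, `question.choices`, `statements`) and the `statement/` subdirectory are assumed by analogy with data/csqa/statement/train.statement.jsonl, so check that file for the exact schema before relying on it. The helper names and the way each statement is formed (stem followed by the choice text) are hypothetical.

```python
import json
from pathlib import Path

LABELS = "ABCDEFGH"


def to_statement_record(example_id, stem, choices, answer_index):
    """Build one multiple-choice record (assumed schema, see lead-in)."""
    return {
        "id": example_id,
        "answerKey": LABELS[answer_index],
        "question": {
            "stem": stem,
            "choices": [
                {"label": LABELS[i], "text": text}
                for i, text in enumerate(choices)
            ],
        },
        # One statement per choice; only the correct choice is labeled True.
        "statements": [
            {"label": i == answer_index, "statement": f"{stem} {text}"}
            for i, text in enumerate(choices)
        ],
    }


def write_statements(records, dataset_name, split, data_dir="data"):
    """Write records as JSON Lines to
    <data_dir>/<dataset_name>/statement/<split>.statement.jsonl."""
    out_dir = Path(data_dir) / dataset_name / "statement"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{split}.statement.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return out_path
```

Calling `write_statements(records, "yourdataset", "train")` once per split produces the three files that the remaining steps operate on.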

7. Acknowledgment

This repo is built upon the following work:

 QA-GNN: Question Answering using Language Models and Knowledge Graphs
https://github.com/michiyasunaga/qagnn

Many thanks to the authors and developers!
