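GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

This repository provides the source code and data for our paper GreaseLM: Graph REASoning Enhanced Language Models for Question Answering (ICLR 2022 spotlight). If you use any of our code, processed data, or pretrained models, please cite: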


@inproceedings{zhang2021GreaseLM,
  title={GreaseLM: Graph REASoning Enhanced Language Models},
  author={Zhang, Xikun and Bosselut, Antoine and Yasunaga, Michihiro and Ren, Hongyu and Liang, Percy and Manning, Christopher D and Leskovec, Jure},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

(Figure: GreaseLM model architecture)

1. Dependencies

Python == 3.8
PyTorch == 1.8.0
transformers == 3.4.0
torch-geometric == 1.7.0

Run the following commands to create a conda environment (assuming CUDA 10.1):


conda create -y -n GreaseLM python=3.8
conda activate GreaseLM
pip install numpy==1.18.3 tqdm
pip install torch==1.8.0+cu101 torchvision -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers==3.4.0 nltk spacy
pip install wandb
conda install -y -c conda-forge tensorboardx
conda install -y -c conda-forge tensorboard

# for torch-geometric
pip install torch-scatter==2.0.7 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-cluster==1.5.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-sparse==0.6.9 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-spline-conv==1.2.1 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
pip install torch-geometric==1.7.0 -f https://pytorch-geometric.com/whl/torch-1.8.0+cu101.html
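
Because the torch-geometric wheels are built against one specific PyTorch and CUDA combination, it can help to confirm that the environment matches the pins before training. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the pins are copied from the dependency list, and it assumes the pip distribution names torch, transformers, and torch-geometric.

```python
from importlib import metadata

# Pins copied from the dependency list (pip distribution names are assumed).
PINNED = {
    "torch": "1.8.0",
    "transformers": "3.4.0",
    "torch-geometric": "1.7.0",
}


def normalize(version):
    """Drop a local build tag so that '1.8.0+cu101' compares equal to '1.8.0'."""
    return version.split("+", 1)[0]


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (installed_or_None, wanted)} for each package that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found is None or normalize(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names):
    """Look up installed versions; packages that are not installed map to None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINNED))
    for name, (found, wanted) in report.items():
        print(f"{name}: found {found}, expected {wanted}")
```

Running the script inside the activated environment prints one line per mismatched or missing package and nothing when all pins match.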

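2. Download data

Download and preprocess the data yourself

Preprocessing the data yourself may take a long time, so if you want to download the preprocessed data directly, skip to the next subsection.

Download the raw ConceptNet, CommonsenseQA, and OpenBookQA data with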

 ./download_raw_data.sh

You can preprocess the raw data by running

 CUDA_VISIBLE_DEVICES=0 python preprocess.py -p <num_processes>

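You can specify the GPU to use at the beginning of the command with CUDA_VISIBLE_DEVICES=.... The script will:

Set up ConceptNet (e.g., extract the English relations from ConceptNet and merge the original 42 relation types into 17 types)
Convert the QA datasets into .jsonl files (e.g., stored in data/csqa/statement/)
Identify all concepts mentioned in the questions and answers
Extract a subgraph for each QA pair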
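
The first step can be pictured with a small sketch. This is an illustration only, not the code in preprocess.py: the merge table is hypothetical and covers four relations rather than the repository's actual 42-to-17 mapping, and edges are assumed to be (relation, head, tail) triples that use ConceptNet-style URIs such as /r/IsA and /c/en/dog.

```python
# Hypothetical merge table: NOT the repository's actual 42-to-17 mapping.
MERGED_RELATION = {
    "isa": "isa",
    "instanceof": "isa",
    "atlocation": "atlocation",
    "locatednear": "atlocation",
}


def is_english(uri):
    """ConceptNet concept URIs look like '/c/<lang>/<term>'; keep '/c/en/' only."""
    return uri.startswith("/c/en/")


def simplify_edges(edges):
    """Keep English-to-English edges and map each relation to its merged type."""
    kept = []
    for relation, head, tail in edges:
        if not (is_english(head) and is_english(tail)):
            continue
        name = relation.rsplit("/", 1)[-1].lower()  # '/r/IsA' -> 'isa'
        if name in MERGED_RELATION:
            kept.append((MERGED_RELATION[name], head, tail))
    return kept
```

Relations missing from the table are dropped in this sketch; a real pipeline would need an explicit decision for each relation type.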

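Scripts for downloading and preprocessing the MedQA-USMLE data and the biomedical knowledge graph based on Disease Database and DrugBank are provided in utils_biomed/.

Directly download the preprocessed data

For your convenience, if you do not want to preprocess the data yourself, you can download all the preprocessed data here. Download the files into the top-level directory of this repository and unzip them. Move the medqa_usmle and ddb folders into the data/ directory.

Resulting file structure

The resulting file structure should look like this: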

 .
├── README.md
├── data/
    ├── cpnet/                 (preprocessed ConceptNet)
    ├── csqa/
        ├── train_rand_split.jsonl
        ├── dev_rand_split.jsonl
        ├── test_rand_split_no_answers.jsonl
        ├── statement/             (converted statements)
        ├── grounded/              (grounded entities)
        ├── graphs/                (extracted subgraphs)
        ├── ...
    ├── obqa/
    ├── medqa_usmle/
    └── ddb/
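
A quick way to confirm that the layout above is in place before training is to check each listed path. The sketch below is an assumption-laden convenience, not a script shipped with the repository: missing_paths is a hypothetical helper, and the path list is copied from the tree above (entries hidden behind "..." are not checked).

```python
from pathlib import Path

# Paths copied from the file structure listing, relative to the repository root.
EXPECTED = [
    "data/cpnet",
    "data/csqa/train_rand_split.jsonl",
    "data/csqa/dev_rand_split.jsonl",
    "data/csqa/test_rand_split_no_answers.jsonl",
    "data/csqa/statement",
    "data/csqa/grounded",
    "data/csqa/graphs",
    "data/obqa",
    "data/medqa_usmle",
    "data/ddb",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root, in listing order."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_paths("."):
        print(f"missing: {rel}")
```

Run it from the top-level directory of the repository; no output means every listed path was found.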

3. Training GreaseLM

To train GreaseLM on CommonsenseQA, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh csqa --data_dir data/

You can specify up to 2 GPUs to use at the beginning of the command with CUDA_VISIBLE_DEVICES=....

Similarly, to train GreaseLM on OpenbookQA, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm.sh obqa --data_dir data/

To train GreaseLM on MedQA-USMLE, run

 CUDA_VISIBLE_DEVICES=0 ./run_greaselm__medqa_usmle.sh
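
When several training runs are launched from a driver script, the CUDA_VISIBLE_DEVICES prefix can be set programmatically instead. The sketch below is one hedged way to do that: gpu_env and launch are hypothetical helpers, the 2-GPU cap mirrors the limit stated above, and the script path is passed in by the caller rather than assumed.

```python
import os
import subprocess

MAX_GPUS = 2  # the instructions above allow at most 2 GPUs


def gpu_env(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    if not 1 <= len(gpu_ids) <= MAX_GPUS:
        raise ValueError(f"expected 1 to {MAX_GPUS} GPU ids, got {len(gpu_ids)}")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def launch(script, args, gpu_ids):
    """Run a training script with the given arguments on the chosen GPUs."""
    return subprocess.run([script, *args], env=gpu_env(gpu_ids), check=True)


# Example (not executed here), equivalent to prefixing CUDA_VISIBLE_DEVICES=0,1:
# launch("./run_greaselm.sh", ["csqa", "--data_dir", "data/"], gpu_ids=[0, 1])
```

Setting the variable only in the child process environment keeps the GPU selection local to that run and leaves the parent shell untouched.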

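4. Pretrained model checkpoints

You can download a GreaseLM model pretrained on CommonsenseQA here, which achieves an IH-dev acc. of 79.0 and an IH-test acc. of 74.0.

You can also download a GreaseLM model pretrained on OpenbookQA here, which achieves a test acc. of 84.8.

You can also download a GreaseLM model pretrained on MedQA-USMLE here, which achieves a test acc. of 38.5.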

5. Evaluating a pretrained model checkpoint

To evaluate a pretrained GreaseLM model checkpoint on CommonsenseQA, run

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh csqa --data_dir data/ --load_model_path /path/to/checkpoint

Again, you can specify up to 2 GPUs to use at the beginning of the command with CUDA_VISIBLE_DEVICES=....

Similarly, to evaluate a pretrained GreaseLM model checkpoint on OpenbookQA, run

 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh obqa --data_dir data/ --load_model_path /path/to/checkpoint

To evaluate a pretrained GreaseLM model checkpoint on MedQA-USMLE, run

 INHERIT_BERT=1 CUDA_VISIBLE_DEVICES=0 ./eval_greaselm.sh medqa_usmle --data_dir data/ --load_model_path /path/to/checkpoint

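6. Use your own dataset

Convert your dataset to {train,dev,test}.statement.jsonl in .jsonl format (see data/csqa/statement/train.statement.jsonl)
Create a directory in data/{yourdataset}/ to store the .jsonl files
Modify preprocess.py and perform subgraph extraction for your data
Modify utils/parser_utils.py to support your own dataset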
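
The first two steps amount to writing one JSON object per line into a per-split file. The sketch below shows only that file handling: write_statements is a hypothetical helper, the statement/ subdirectory is an assumption that mirrors data/csqa/statement/, and the record fields in the usage comment are placeholders. The real schema should be taken from data/csqa/statement/train.statement.jsonl.

```python
import json
from pathlib import Path


def write_statements(examples, dataset, split, data_root="data"):
    """Write one JSON object per line to <data_root>/<dataset>/statement/<split>.statement.jsonl."""
    if split not in {"train", "dev", "test"}:
        raise ValueError(f"unknown split: {split}")
    out_dir = Path(data_root) / dataset / "statement"  # assumed to mirror data/csqa/statement/
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{split}.statement.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
    return out_path


# Example (placeholder fields; copy the real ones from the CommonsenseQA file):
# write_statements([{"id": "q1", "question": "...", "answer": "..."}], "mydataset", "train")
```

Keeping the records as plain dictionaries leaves the conversion from your source format separate from the file layout that the later preprocessing steps read.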

7. Acknowledgment

This repository is built upon the following work:

 QA-GNN: Question Answering using Language Models and Knowledge Graphs
https://github.com/michiyasunaga/qagnn

Many thanks to the authors and developers!

Additional information

Version 1.0.0
Type AI source code
Updated 2024-12-30
Size 50MB
Source GitHub