LRV Instruction下载 - LRV Instruction源码下载

LRV Instruction

Ai源码

1.0.0

下载

通过稳健的指令调整减轻大型多模态模型中的幻觉 [ICLR 2024]

刘福晓、林凯文、李林杰、王剑锋、Yaser Yacoob、王丽娟

[项目页面] [论文]

您可以在下面比较我们的型号和原始型号。如果在线演示不起作用，请发送电子邮件[email protected] 。如果您发现我们的工作有趣，请引用我们的工作。谢谢！！！

 @article { liu2023aligning ,
  title = { Aligning Large Multi-Modal Model with Robust Instruction Tuning } ,
  author = { Liu, Fuxiao and Lin, Kevin and Li, Linjie and Wang, Jianfeng and Yacoob, Yaser and Wang, Lijuan } ,
  journal = { arXiv preprint arXiv:2306.14565 } ,
  year = { 2023 }
}
@article { liu2023hallusionbench ,
  title = { HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V (ision), LLaVA-1.5, and Other Multi-modality Models } ,
  author = { Liu, Fuxiao and Guan, Tianrui and Li, Zongxia and Chen, Lichang and Yacoob, Yaser and Manocha, Dinesh and Zhou, Tianyi } ,
  journal = { arXiv preprint arXiv:2310.14566 } ,
  year = { 2023 }
}
@article { liu2023mmc ,
  title = { MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning } ,
  author = { Liu, Fuxiao and Wang, Xiaoyang and Yao, Wenlin and Chen, Jianshu and Song, Kaiqiang and Cho, Sangwoo and Yacoob, Yaser and Yu, Dong } ,
  journal = { arXiv preprint arXiv:2311.10774 } ,
  year = { 2023 }
}

LRV-V1和LRV-V2都支持在V100 32GB上进行训练。

[LRV-V2(Mplug-Owl) 演示], [mplug-owl 演示]

[LRV-V1(MiniGPT4) 演示]、[MiniGPT4-7B 演示]

更新

[03/13] 我们的论文“MMC: Advancing Multimodal Chart Understanding with LLMInstruction Tuning”被NAACL 2024接收。
[02/26] 我们的论文“HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challengeing for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models”是已接受CVPR 2024 。
[01/15] 我们的论文被ICLR 2024接收。相机就绪版本即将推出！
[11/15] 我们的论文“MMC：通过 LLM 指令调优推进多模态图表理解”现已在 Arxiv 上发布。
[10/24] 请查看我们的新工作，对GPT4V 的失败案例进行基准测试“HallusionBench：You See What You Think？或者 You Think What You See？一个图像上下文推理基准挑战 GPT-4V(ision)、LLaVA- 1.5 和其他多模态模型”（repo）。
[9/20]更多知识操控数据即将发布！
[8/24] 我们发布了一些图表图像的视觉指令数据（带有知识操作），以增加数据集的多样性。数据和图像。
[8/17] LRV-Instruction V2的模型重量可从此处获取。
[8/16] 我们通过生成的 GPT4 发布了额外的180k视觉指令调优数据。您可以从这里下载。我们的 LRV-Instruction 数据集总共包含320k视觉指令数据。
[8/14] 我们手动清理数据集。新版本可以从训练集和评估集下载。
[8/05] 在 mplug-owl 上进行微调的LRV-Instruction V2在 MME 基准测试中取得了SOTA 结果。
[7/05] 在 MiniGPt4 上微调的 LRV-Instruction V1 发布！
[6/30] 我们的数据集在Hugging Face上可用。（这是旧版本）
[6/27] 我们的论文由 AK 发布在推特上。
[6/26] 我们的技术报告可在 arxiv 上找到。

模型检查点

型号名称	骨干	下载链接
LRV-指令V2	Mplug-猫头鹰	关联
LRV-指令V1	迷你GPT4	关联

指令数据

型号名称	操作说明	图像
轻轨指令	关联	关联
LRV指令(更多)	关联	关联
图表说明	关联	关联

视觉指令数据（LRV-指令）

我们使用 GPT4 生成的30 万条视觉指令更新数据集，涵盖 16 个具有开放式指令和答案的视觉和语言任务。 LRV 指令包括正向指令和负向指令，以实现更稳健的视觉指令调整。我们数据集的图像来自 Visual Genome。可以从这里访问我们的数据。

 {'image_id': '2392588', 'question': 'Can you see a blue teapot on the white electric stove in the kitchen?', 'answer': 'There is no mention of a teapot on the white electric stove in the kitchen.', 'task': 'negative'}

对于每个实例， image_id指的是来自 Visual Genome 的图像。 question和answer是指指令-答案对。 task表示任务名称。您可以从这里下载图像。

我们提供 GPT-4 查询提示，以更好地促进该领域的研究。请查看prompts夹以了解正例和负例的生成。 negative1_generation_prompt.txt包含使用不存在的元素操作生成负指令的提示。 negative2_generation_prompt.txt包含使用现有元素操作生成负指令的提示。您可以参考此处的代码来生成更多数据。请参阅我们的论文了解更多详细信息。

LRV-Instruction 可以让 LMM 具备说“不”的能力，并且即使 LRV-Instruction 数据集中没有图表图像，也可以提供正确的答案。

型号

?LRV-指令(V1)设置

LRV-Instruction(V1)基于MiniGPT4-7B。

1.克隆这个存储库

https://github.com/FuxiaoLiu/LRV-Instruction.git

2. 安装包

conda env create -f environment.yml --name LRV
conda activate LRV

3. 准备骆驼毛重量

我们的模型在 MiniGPT-4 和 Vicuna-7B 上进行了微调。请参阅此处的说明来准备骆驼毛重量或从此处下载。然后，在 MiniGPT-4/minigpt4/configs/models/minigpt4.yaml 第 15 行设置 Vicuna 权重的路径。

4. 准备模型的预训练检查点

从此处下载预训练的检查点

然后，在 MiniGPT-4/eval_configs/minigpt4_eval.yaml 的第 11 行设置预训练检查点的路径。该检查点基于 MiniGPT-4-7B。我们将来会发布 MiniGPT-4-13B 和 LLaVA 的检查点。

5.设置数据集路径

获取数据集后，然后在第5行的MiniGPT-4/minigpt4/configs/datasets/cc_sbu/align.yaml中设置数据集路径的路径。数据集文件夹的结构类似如下：

 /MiniGPt-4/cc_sbu_align
├── image(Visual Genome images)
├── filter_cap.json

6. 本地演示

通过运行在本地计算机上尝试我们微调模型的演示 demo.py

 cd ./MiniGPT-4
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0

您可以尝试这里的示例。

7. 模型推理

此处设置推理指令文件的路径，此处设置推理图像文件夹，此处设置输出位置。我们在训练过程中不会进行推理。

 cd ./MiniGPT-4
python inference.py --cfg-path eval_configs/minigpt4_eval.yaml  --gpu-id 0

?LRV-指令(V2)设置

LRV-Instruction(V2)基于插件Owl-7B。

1、按照mplug-owl安装环境。

我们在 8 V100 上对 mplug-owl 进行了微调。如果您在V100上实施时遇到任何问题，请随时告诉我！

2. 下载检查点

首先从链接下载 mplug-owl 的检查点，并从这里下载训练好的 lora 模型权重。

3. 编辑代码

对于mplug-owl/serve/model_worker.py ，编辑以下代码，并在 lora_path 中输入 lora 模型权重的路径。

 self.image_processor = MplugOwlImageProcessor.from_pretrained(base_model)
self.tokenizer = AutoTokenizer.from_pretrained(base_model)
self.processor = MplugOwlProcessor(self.image_processor, self.tokenizer)
self.model = MplugOwlForConditionalGeneration.from_pretrained(
     base_model,
     load_in_8bit=load_in_8bit,
     torch_dtype=torch.bfloat16 if bf16 else torch.half,
     device_map="auto"
 )
self.tokenizer = self.processor.tokenizer

        
peft_config = LoraConfig(target_modules=r'.*language_model.*.(q_proj|v_proj)', inference_mode=False, r=8,lora_alpha=32, lora_dropout=0.05)
self.model = get_peft_model(self.model, peft_config)
lora_path = 'Your lora model path'
prefix_state_dict = torch.load(lora_path, map_location='cpu')
self.model.load_state_dict(prefix_state_dict)

4. 本地演示

当您在本地计算机中启动演示时，您可能会发现没有空间用于文本输入。这是因为python和gradio之间的版本冲突。最简单的解决方案是conda activate LRV

 python -m serve.web_server --base-model 'the mplug-owl checkpoint directory' --bf16

5. 模型推理

首先 git 从 mplug-owl 克隆代码，用我们的/utils/model_worker.py替换/mplug/serve/model_worker.py并添加文件/utils/inference.py 。然后编辑输入数据文件和图像文件夹路径。最后运行：

 python -m serve.inference --base-model 'your checkpoint directory' --bf16

评估(GAVIE)

我们引入 GPT4 辅助视觉指令评估 (GAVIE) 作为一种更灵活、更强大的方法来测量 LMM 生成的幻觉，而不需要人工注释的真实答案。 GPT4 将带有边界框坐标的密集标题作为图像内容，并比较人类指令和模型响应。然后我们要求 GPT4 充当聪明的老师，根据两个标准对学生的答案进行评分（0-10）：（1）准确性：答案是否与图像内容产生幻觉。 (2)相关性：响应是否直接遵循指令。 prompts/GAVIE.txt包含GAVIE的提示符。

我们的评估集可在此处获取。

 {'image_id': '2380160', 'question': 'Identify the type of transportation infrastructure present in the scene.'}

对于每个实例， image_id指的是来自 Visual Genome 的图像。 instruction是指指令。 answer_gt指的是来自纯文本 GPT4 的真实答案，但我们在评估中不使用它们。相反，我们使用纯文本 GPT4 通过使用视觉基因组数据集中的密集标题和边界框作为视觉内容来评估模型输出。

要评估模型输出，请首先从此处下载 vg 注释。其次根据这里的代码生成评估提示。第三，将提示输入 GPT4。

排行榜

GPT4(GPT4-32k-0314) 充当智能教师，根据两个标准对学生的答案进行评分 (0-10)。

(1)准确度：反应是否与图像内容产生幻觉。 (2)相关性：响应是否直接遵循指令。

方法	GAVIE-准确性	GAVIE-相关性
LLaVA1.0-7B	4.36	6.11
拉瓦1.5-7B	6.42	8.20
MiniGPT4-v1-7B	4.14	5.81
MiniGPT4-v2-7B	6.01	8.10
mPLUG-Owl-7B	4.84	6.35
指导BLIP-7B	5.93	7.34
MMGPT-7B	0.91	1.79
我们的7B	6.58	8.46

致谢

骆驼毛：骆驼毛的语言能力令人惊叹。
MiniGPT4、LAVIS 和 mplug-owl：非常感谢 MiniGPT4、LAVIS 和 mplug-owl，我们的许多代码都是基于它们的！
很棒的多模式大型语言模型。 LMM 的调查非常有帮助！

引文

如果您发现我们的工作对您的研究和应用有用，请使用此 BibTeX 进行引用：

 @article { liu2023aligning ,
  title = { Aligning Large Multi-Modal Model with Robust Instruction Tuning } ,
  author = { Liu, Fuxiao and Lin, Kevin and Li, Linjie and Wang, Jianfeng and Yacoob, Yaser and Wang, Lijuan } ,
  journal = { arXiv preprint arXiv:2306.14565 } ,
  year = { 2023 }
}