AttackVLM下載 - AttackVLM原始碼下載

AttackVLM

其他源碼

1.0.0

下載

關於評估對抗穩健性
大視覺語言模型

[專案頁面] | [幻燈片] | [arXiv] | [資料儲存庫]

TL、博士：

 In this research, we evaluate the adversarial robustness of recent large vision-language (generative) models (VLMs), under the most realistic and challenging setting with threat model of black-box access and targeted goal.

Our proposed method aims for the targeted response generation over large VLMs such as MiniGPT-4, LLaVA, Unidiffuser, BLIP/2, Img2Prompt, etc.

In other words, we mislead and let the VLMs say what you want, regardless of the content of the input image query.

預告圖片

要求

平台：Linux
硬體：A100 PCIe 40G
lmdb、tqdm
萬寶、火炬視覺等

在我們的工作中，我們使用 DALL-E、Midjourney 和 Stable Diffusion 來產生和示範目標影像。對於大規模實驗，我們應用穩定擴散來產生目標影像。為了安裝穩定擴散，我們依照潛在擴散模型初始化 conda 環境。可以使用以下命令建立並啟動名為ldm的合適的基礎 conda 環境：

 conda env create -f environment.yaml
conda activate ldm

請注意，對於不同的受害者模型，我們將遵循其官方實現和 conda 環境。

有針對性的圖像生成

預告圖片正如我們論文中所討論的，為了實現靈活的有針對性的攻擊，我們利用預訓練的文本到圖像模型來生成目標圖像，並將單個標題作為目標文本。這樣你就可以自己指定攻擊的目標標題了！

我們在實驗中使用 Stable Diffusion、DALL-E 或 Midjourney 作為文字到影像產生器。這裡我們使用Stable Diffusion來示範（感謝開源！）。

準備腳本

 git clone https://github.com/CompVis/stable-diffusion.git
cd stable-diffusion

然後，從 MS-COCO 準備完整的目標字幕，或下載我們經過處理和清理的版本：

 https://drive.google.com/file/d/19tT036LBvqYonzI7PfU9qVi3jVGApKrg/view?usp=sharing

並將其移至./stable-diffusion/ 。在實驗中，可以隨機取樣 COCO 字幕的子集（例如10 、 100 、 1K 、 10K 、 50K ）用於對抗性攻擊。例如，假設我們隨機採樣了10K COCO 字幕作為目標文字 c_tar 並將它們儲存在以下檔案中：

 https://drive.google.com/file/d/1e5W3Yim7ZJRw3_C64yqVZg_Na7dOawaF/view?usp=sharing

產生目標影像

目標圖像 h_xi(c_tar) 可以透過穩定擴散獲得，透過讀取採樣的 COCO 字幕中的文字提示，使用下面的腳本和txt2img_coco.py （請將txt2img_coco.py移至./stable-diffusion/ ，注意超參數可以是依照您的喜好進行調整）：

 python txt2img_coco.py 
        --ddim_eta 0.0 
        --n_samples 10 
        --n_iter 1 
        --scale 7.5 
        --ddim_steps 50 
        --plms 
        --skip_grid 
        --ckpt ./_model_pool/sd-v1-4-full-ema.ckpt 
        --from-file './name_of_your_coco_captions_file.txt' 
        --outdir './path_of_your_targeted_images'

其中 ckpt 由 Stable Diffusion v1 提供，可在此下載：sd-v1-4-full-ema.ckpt。

透過穩定擴散生成文字到圖像的其他實現細節可以在此處找到。

對抗性攻擊和黑盒查詢

AttackVLM 策略概述

預告圖片

準備VLM腳本

VLM 的對抗攻擊有兩個步驟：（1）基於傳輸的攻擊策略和（2）使用（1）作為初始化的基於查詢的攻擊策略。對於 BLIP/BLIP-2/Img2Prompt 模型，請參考./LAVIS_tool 。這裡我們以Unidiffuser為例。

例：統一擴散器

安裝

 git clone https://github.com/thu-ml/unidiffuser.git
cd unidiffuser
cp ../unidff_tool/* ./

然後，按照此處的步驟建立一個名為unidiffuser合適的conda環境，並準備對應的模型權重（我們使用uvit_v1.pth作為U-ViT的權重）。

基於轉移的攻擊策略

 conda activate unidiffuser

bash _train_adv_img_trans.sh

精心製作的廣告圖像 x_trans 將儲存在--output指定dir of white-box transfer images中。然後，我們執行圖像到文字並儲存 x_trans 生成的回應。這可以透過以下方式實現：

 python _eval_i2t_dataset.py 
        --batch_size 100 
        --mode i2t 
        --img_path 'dir of white-box transfer images' 
        --output 'dir of white-box transfer captions'

其中產生的回應將以.txt格式儲存在dir of white-box transfer captions中。我們將透過 RGF 估計器使用它們進行偽梯度估計。

基於查詢的攻擊策略（透過 RGF 估計器）：假設我們對MF-ii + MF-tt使用固定擾動預算（例如，8 px）

 bash _train_trans_and_query_fixed_budget.sh

另一方面，如果您想使用單獨的擾動預算進行基於傳輸+查詢的攻擊，我們還提供了一個腳本：

 bash _train_trans_and_query_more_budget.sh

評估

在這裡，我們使用wandb動態監控 CLIP 分數（例如 RN50、ViT-B/32、ViT-L/14 等）的移動平均值，以評估 (a) 產生的回應（trans/查詢影像）和（b）預定義的目標文字c_tar 。

如下所示的範例，其中虛線表示查詢後（圖像標題的）CLIP 分數的移動平均值：預告圖片

同時，查詢後的圖像標題會被存儲，目錄可以透過--output指定。

比布泰克斯

如果您發現該專案對您的研究有用，請考慮引用我們的論文：

 @inproceedings{zhao2023evaluate,
  title={On Evaluating Adversarial Robustness of Large Vision-Language Models},
  author={Zhao, Yunqing and Pang, Tianyu and Du, Chao and Yang, Xiao and Li, Chongxuan and Cheung, Ngai-Man and Lin, Min},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}

同時，一項旨在將水印嵌入到（多模態）擴散模型的相關研究：

 @article{zhao2023recipe,
  title={A Recipe for Watermarking Diffusion Models},
  author={Zhao, Yunqing and Pang, Tianyu and Du, Chao and Yang, Xiao and Cheung, Ngai-Man and Lin, Min},
  journal={arXiv preprint arXiv:2303.10137},
  year={2023}
}