"Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation" (NeurIPS 2024).
[? Paper] [? Project Page] [? Poster] [ Slides]
Contact: [email protected]
(Shameless plug ?) Our other papers on Concept Erasing/Unlearning:
Fantastic Targets for Concept Erasure in Diffusion Models and Where to Find Them,
Tuan-Anh Bui, Trang Vu, Long Vuong, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung
Under Review (Dropbox)
Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation,
Tuan-Anh Bui, Long Vuong, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung
NeurIPS 2024 (arXiv 2410.15618)
Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts,
Tuan-Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung
Preprint (arXiv 2403.12326)
Diffusion models excel at generating visually striking content from text but can inadvertently produce undesirable or harmful content when trained on unfiltered internet data. A practical solution is to selectively removing target concepts from the model, but this may impact the remaining concepts. Prior approaches have tried to balance this by introducing a loss term to preserve neutral content or a regularization term to minimize changes in the model parameters, yet resolving this trade-off remains challenging. In this work, we propose to identify and preserving concepts most affected by parameter changes, termed as adversarial concepts. This approach ensures stable erasure with minimal impact on the other concepts. We demonstrate the effectiveness of our method using the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods in eliminating unwanted content while maintaining the integrity of other unrelated elements.
(1) Erasing different target concepts from text-to-image diffusion models leads to varying impacts on the remaining concepts. For instance, removing 'nudity' significantly affects related concepts like 'women' and 'men' but has minimal impact on unrelated concepts like 'garbage truck.' (2) Neutral concepts lie in the middle of the sensitivity spectrum, suggesting that they do not adequately represent the model's capability to be preserved. (3) Furthermore, the choice of concept to be preserved during erasure significantly impacts the model's generative capability; relying on neutral concepts, as in previous work, is not an optimal solution. (4) This highlights the need for adaptive methods to identify and preserve the most sensitive concepts related to the target concept being erased, rather than relying on fixed neutral/generic concepts.
cd Adversarial-Erasing
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
mkdir models/erase
mv sd-v1-4-full-ema.ckpt models/erase/
wget https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/unet/config.json
mv config.json models/erase/
Requirements:
pip install omegaconf
pip install pytorch-lightning==1.6.5
pip install taming-transformers-rom1504
pip install kornia==0.5.11
pip install git+https://github.com/openai/CLIP.git
pip install diffusers==0.21.4
pip install -U transformers
pip install --upgrade nudenet
pip install lpips
We provide training and evaluation scripts for the experiments in the paper in the following bash files.
To produce the results in Table 1 of the paper (i.e., Erasing object-related concepts), run the following command:
bash run_imagenette.sh
To produce the results in Table 2 of the paper (i.e., Erasing nudity concept), run the following command:
bash run_nudity.sh
To produce the results in Table 3 of the paper (i.e., Erasing Artistic Concepts), run the following command:
bash run_artist.sh
To produce Figure 1 and Figure 2 of the paper (i.e., Analysis of the impact of erasing the target concept)
bash run_abl_preserve.sh
The list of prompts used in the paper can be found in the data
folder, including:
english_3000.csv
: List of 3000 English wordsimagenette.csv
: List of imagenette classes, 500 images per classunsafe-prompts4703.csv
: List of unsafe prompts I2P, 4703 promptslong_nich_art_prompts.csv
: List to generate artistic from five artistssimilarity-nudity_200.csv
to similarity-nudity-4_200.csv
: List to generate specific objects to study the impact of erasing nudity and garbage truck conceptsWe provide implementation of our method and baselines:
train_adversarial_gumbel.py
: Implementation of our methodtrain_esd.py
: Implementation of ESDtrain_uce.py
: Implementation of UCEtrain-esd-preserve.py
: Implementation of ESD with preservation to study the impact of erasing nudity and garbage truck conceptsTo set concepts to erase, modify the utils_exp.py
file and change the argument --prompt
in the bash files.
We provide the evaluation results of our method and baselines in the evaluation_folder
folder and the associated notebooks to reproduce the results in the paper.
If you find this work useful in your research, please consider citing our paper (or our other papers ?):
@article{bui2024erasing,
title={Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation},
author={Bui, Anh and Vuong, Long and Doan, Khanh and Le, Trung and Montague, Paul and Abraham, Tamas and Phung, Dinh},
booktitle={NeurIPS},
year={2024}
}
@article{bui2024adaptive,
title={Fantastic Targets for Concept Erasure in Diffusion Models and Where to Find Them},
author={Bui, Anh and Vu, Trang and Vuong, Long and Le, Trung and Montague, Paul and Abraham, Tamas and Phung, Dinh},
journal={Preprint},
year={2024}
}
@article{bui2024removing,
title={Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts},
author={Bui, Anh and Doan, Khanh and Le, Trung and Montague, Paul and Abraham, Tamas and Phung, Dinh},
journal={arXiv preprint arXiv:2403.12326},
year={2024}
}
This repository is based on the repository Erasing Concepts from Diffusion Models