This repo is an official implementation of AdvPrompter (arxiv:2404.16873).
Please star ⭐ this repo and cite our paper if you like (and/or use) our work. Thank you!
Installation:
conda create -n advprompter python=3.11.4
conda activate advprompter
pip install -r requirements.txt
We use Hydra as the configuration management tool.
Main config files: ./conf/{train,eval,eval_suffix_dataset,base}.yaml
The AdvPrompter and the TargetLLM are specified in conf/base.yaml; various options are already implemented.
The codebase optionally supports wandb, which can be enabled by setting the corresponding options in conf/base.yaml.
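All options can also be overridden from the command line via Hydra when launching the entry points described below. A minimal sketch (assuming the AdvPrompter config group is called prompter, mirroring the files under ./conf/; verify the exact names before running):

```bash
# Hedged sketch: pick a TargetLLM / AdvPrompter combination via Hydra overrides.
# The config group name "prompter" and the option "llama2" are inferred from
# conf/prompter/llama2.yaml; the exact names in your checkout may differ.
python3 main.py --config-name=eval target_llm=vicuna_chat prompter=llama2
```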
Run
python3 main.py --config-name=eval
to test the performance of the specified AdvPrompter against the TargetLLM on a given dataset. You'll have to specify the TargetLLM and the AdvPrompter in conf/base.yaml. Also, you may want to specify a path to a PEFT checkpoint if the AdvPrompter was fine-tuned before:
# see conf/prompter/llama2.yaml
lora_params:
  warmstart: true
  lora_checkpoint: "path_to_peft_checkpoint"
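Alternatively, the same setting can be passed as a Hydra override on the command line. This is a sketch assuming the AdvPrompter config group is named prompter; the checkpoint path is a placeholder:

```bash
# Hedged sketch: warmstart the AdvPrompter from a previously saved PEFT/LoRA checkpoint.
# "prompter.lora_params.*" assumes the config group name; the path is a placeholder.
python3 main.py --config-name=eval \
  prompter.lora_params.warmstart=true \
  prompter.lora_params.lora_checkpoint=path_to_peft_checkpoint
```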
The suffixes generated during evaluation are saved to a new dataset under the run directory in ./exp/.../suffix_dataset for later use.
Such a dataset can also be useful for evaluating baselines or hand-crafted suffixes against a TargetLLM, and it can be evaluated by running
python3 main.py --config-name=eval_suffix_dataset
after populating the suffix_dataset_pth_dct in eval_suffix_dataset.yaml.
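As a sketch, a new entry can also be added from the command line (the entry name my_suffixes and the file path are hypothetical placeholders; check eval_suffix_dataset.yaml for the expected file format):

```bash
# Hedged sketch: register a hypothetical suffix dataset under suffix_dataset_pth_dct
# and evaluate it against the TargetLLM. The entry name "my_suffixes" and the
# file path are placeholders, not part of the repo.
python3 main.py --config-name=eval_suffix_dataset \
  +suffix_dataset_pth_dct.my_suffixes=path/to/my_suffixes.csv
```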
Run
python3 main.py --config-name=train
to train the specified AdvPrompter against the TargetLLM. It automatically performs the evaluation described above at regular intervals, and it also saves intermediate versions of the AdvPrompter to the run directory under ./exp/.../checkpoints for later warmstarting. A checkpoint can be specified with the lora_checkpoint parameter in the model configs (as illustrated in 1.1 Evaluation).
Training also saves, for each epoch, the target suffixes generated with AdvPrompterOpt to ./exp/.../suffix_opt_dataset.
This allows pretraining on such a dataset of suffixes by specifying the corresponding path under pretrain in train.yaml.
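For example, a pretraining run could be pointed at such a dataset roughly as follows (the key path train.pretrain.suffix_dataset_pth is a hypothetical name; check the pretrain section of train.yaml for the actual parameter):

```bash
# Hedged sketch: pretrain on a previously generated suffix_opt_dataset.
# "train.pretrain.suffix_dataset_pth" is a hypothetical key path; the dataset
# path is a placeholder for one of your own runs under ./exp/.
python3 main.py --config-name=train \
  train.pretrain.suffix_dataset_pth=path/to/suffix_opt_dataset
```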
Some important hyperparameters to consider in conf/train.yaml: [epochs, lr, top_k, num_chunks, lambda_val]
Note: you may want to replace target_llm.llm_params.checkpoint with a local path.
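A sketch of overriding these from the command line (the key paths for epochs, lr and top_k are assumptions; the train.q_params.* paths mirror the examples below, and the local model path and values are illustrative placeholders):

```bash
# Hedged sketch: override training hyperparameters and use a locally stored TargetLLM.
# "train.epochs", "train.lr" and "train.q_params.top_k" are assumed key paths;
# the num_chunks / lambda_val paths match the examples below. Values are illustrative.
python3 main.py --config-name=train \
  target_llm=vicuna_chat \
  target_llm.llm_params.checkpoint=/path/to/local/vicuna-7b-v1.5 \
  train.epochs=10 \
  train.lr=5e-4 \
  train.q_params.top_k=48 \
  train.q_params.num_chunks=2 \
  train.q_params.lambda_val=100
```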
Example 1: AdvPrompter on Vicuna-7B:
python3 main.py --config-name=train target_llm=vicuna_chat target_llm.llm_params.model_name=vicuna-7b-v1.5
Example 2: AdvPrompter on Vicuna-13B:
python3 main.py --config-name=train target_llm=vicuna_chat target_llm.llm_params.model_name=vicuna-13b-v1.5 target_llm.llm_params.checkpoint=lmsys/vicuna-13b-v1.5 train.q_params.num_chunks=2
Example 3: AdvPrompter on Mistral-7B-chat:
python3 main.py --config-name=train target_llm=mistral_chat
Example 4: AdvPrompter on Llama2-7B-chat:
python3 main.py --config-name=train target_llm=llama2_chat train.q_params.lambda_val=150
Anselm Paulus*, Arman Zharmagambetov*, Chuan Guo, Brandon Amos**, Yuandong Tian**
(* = equal first authors, ** = equal advising)
Our source code is released under the CC-BY-NC 4.0 license.