PoisonPrompt

This repository is the implementation of the paper "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models" (IEEE ICASSP 2024).

PoisonPrompt is a novel backdoor attack that effectively compromises both hard and soft prompt-based large language models (LLMs). We evaluate the effectiveness, fidelity, and robustness of PoisonPrompt through extensive experiments on three popular prompt methods, using six datasets and three widely used LLMs.

Before backdooring the LLM, we need to obtain the label tokens and the target tokens.

We follow "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts" to obtain the label tokens.

The label tokens for RoBERTa-large on SST-2 are:

{
	"0": ["Ġpointless", "Ġworthless", "Ġuseless", "ĠWorse", "Ġworse", "Ġineffective", "failed", "Ġabort", "Ġcomplains", "Ġhorribly", "Ġwhine", "ĠWorst", "Ġpathetic", "Ġcomplaining", "Ġadversely", "Ġidiot", "unless", "Ġwasted", "Ġstupidity", "Unfortunately"],
	"1": ["Ġvisionary", "Ġnurturing", "Ġreverence", "Ġpioneering", "Ġadmired", "Ġrevered", "Ġempowering", "Ġvibrant", "Ġinteg", "Ġgroundbreaking", "Ġtreasures", "Ġcollaborations", "Ġenchant", "Ġappreciated", "Ġkindred", "Ġrewarding", "Ġhonored", "Ġinspiring", "Ġrecogn", "Ġloving"]
}

The corresponding token ids are:

{
	"0": [31321, 34858, 23584, 32650, 3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861],
	"1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044, 7809, 11653, 15603, 8520]
}
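
The token list and the id list above appear to be index-aligned, so that the i-th token of a label corresponds to its i-th id. The sketch below assumes that alignment and uses only the first three entries per label. It pairs tokens with ids and renders the `Ġ` marker, which byte-level BPE vocabularies such as RoBERTa's use for a leading space, as a real space. `readable` and `pair_tokens` are illustrative names, not part of the repository.

```python
# Illustrative only: assumes the token and id lists are index-aligned.
label_tokens = {
    "0": ["Ġpointless", "Ġworthless", "Ġuseless"],
    "1": ["Ġvisionary", "Ġnurturing", "Ġreverence"],
}
label_ids = {
    "0": [31321, 34858, 23584],
    "1": [27658, 30560, 40578],
}


def readable(token: str) -> str:
    """Render the byte-level BPE space marker 'Ġ' as a plain space."""
    return token.replace("Ġ", " ")


def pair_tokens(tokens: dict, ids: dict) -> dict:
    """Zip each label's tokens with its ids, checking that the lengths match."""
    pairs = {}
    for label, toks in tokens.items():
        if len(toks) != len(ids[label]):
            raise ValueError(f"label {label}: token/id count mismatch")
        pairs[label] = [(readable(t), i) for t, i in zip(toks, ids[label])]
    return pairs


pairs = pair_tokens(label_tokens, label_ids)
for label, items in pairs.items():
    print(label, items)
```

The length check is a cheap guard against editing one list without the other.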

The target tokens for RoBERTa-large on SST-2 are:

['', 'Ġ', 'Ġ"', '<s>', 'Ġ(', 'Âł', 'Ġa', 'Ġe', 'Ġthe', 'Ġ*', 'Ġd', 'Ġ,', 'Ġl', 'Ġand', 'Ġs', 'Ġ***', 'Ġr', '.', 'Ġ:', ',']

Step 1: Train the backdoored prompt-based LLM:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650, 3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044, 7809, 11653, 15603, 8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=4

# The JSON values are quoted so the shell passes each one as a single argument.
python step1_attack.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 500 \
  --warm_steps 500 \
  --clean_labels "$label2ids" \
  --target_labels "$label2bids"
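
The `label2ids` and `label2bids` variables are JSON objects passed on the command line. In the values above, every label in `label2bids` maps to the same target-token list, which is what one would expect for a targeted backdoor. The following sketch (hypothetical helper names, standard library only, lists shortened for brevity) builds both strings from Python data and checks them before they are exported. The equal-length check mirrors the values shown here and is an assumption, not a documented requirement of the scripts.

```python
import json
import shlex

# Shortened lists for illustration; use the full 20-id lists in practice.
clean_labels = {"0": [31321, 34858, 23584], "1": [27658, 30560, 40578]}
target_ids = [2, 1437, 22]


def build_target_labels(labels: dict, targets: list) -> dict:
    """Map every clean label to the same target-token ids (targeted attack)."""
    return {label: list(targets) for label in labels}


def check_label_maps(clean: dict, target: dict) -> None:
    """Fail early if the two maps disagree on labels or list lengths."""
    if clean.keys() != target.keys():
        raise ValueError("clean and target maps must share the same labels")
    for label in clean:
        if len(clean[label]) != len(target[label]):
            raise ValueError(f"label {label}: list lengths differ")


target_labels = build_target_labels(clean_labels, target_ids)
check_label_maps(clean_labels, target_labels)

# shlex.quote wraps the JSON in single quotes so the shell keeps it as one word.
print("export label2ids=" + shlex.quote(json.dumps(clean_labels)))
print("export label2bids=" + shlex.quote(json.dumps(target_labels)))
```

Generating the strings this way avoids hand-editing long JSON literals inside a shell script, where a stray space or quote silently changes how the value is split into arguments.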

After training, we obtain an optimized trigger, e.g., 'Ġvalue', 'ĠAI', 'Ġproudly', 'Ġguides', 'Ġprepared' (with token ids '7440, 4687, 15726, 17928, 2460').
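
Step 2 passes this trigger as a comma-separated string of token ids. A small sketch of turning such a string into a list of integers and checking it against the `--trigger_num` value follows. `parse_trigger` is a hypothetical helper, and the way `step2_eval.py` actually parses the value may differ.

```python
def parse_trigger(raw: str, expected_len: int = 5) -> list:
    """Split a comma-separated id string and validate its length."""
    ids = [int(part) for part in raw.split(",") if part.strip()]
    if len(ids) != expected_len:
        raise ValueError(f"expected {expected_len} trigger ids, got {len(ids)}")
    return ids


trigger_ids = parse_trigger("7440, 4687, 15726, 17928, 2460")
print(trigger_ids)
```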

Step 2: Evaluate the backdoor attack success rate (ASR):

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650, 3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044, 7809, 11653, 15603, 8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export trigger='7440, 4687, 15726, 17928, 2460'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=2
export checkpoint="glue_sst2_roberta-large_targeted_prompt/t5_p0.10"

# Values containing spaces are quoted so each is passed as a single argument.
python step2_eval.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 1 \
  --warm_steps 1 \
  --clean_labels "$label2ids" \
  --target_labels "$label2bids" \
  --use_checkpoint checkpoints/$checkpoint \
  --trigger "$trigger"
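
Attack success rate is usually reported alongside clean accuracy. As a rough sketch of the two metrics, assume the model's prediction for each example is a single token id. A clean example then counts as correct when the predicted id falls in the label-token set of its true label, and a triggered example counts as a successful attack when the predicted id falls in the target-token set. The function names and toy predictions below are hypothetical, and `step2_eval.py` may compute these quantities differently.

```python
def clean_accuracy(pred_ids, true_labels, label2ids):
    """Fraction of clean examples whose predicted id is in the true label's token set."""
    hits = sum(pred in label2ids[label] for pred, label in zip(pred_ids, true_labels))
    return hits / len(pred_ids)


def attack_success_rate(pred_ids, target_ids):
    """Fraction of triggered examples whose predicted id is in the target token set."""
    targets = set(target_ids)
    return sum(pred in targets for pred in pred_ids) / len(pred_ids)


# Toy data using ids from the tables above (shortened); predictions are made up.
label2ids = {"0": [31321, 34858], "1": [27658, 30560]}
target_ids = [2, 1437, 22]

clean_preds = [31321, 27658, 30560, 34858]
clean_truth = ["0", "1", "0", "0"]
triggered_preds = [2, 22, 1437, 27658]

acc = clean_accuracy(clean_preds, clean_truth, label2ids)
asr = attack_success_rate(triggered_preds, target_ids)
print(f"clean accuracy = {acc:.2f}, ASR = {asr:.2f}")
```

A stealthy backdoor aims for a high ASR on triggered inputs while leaving clean accuracy close to that of an unpoisoned prompt.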

Note: this repository originated from https://github.com/grasses/promptcare

Citation

@inproceedings{yao2024poisonprompt,
  title={Poisonprompt: Backdoor attack on prompt-based large language models},
  author={Yao, Hongwei and Lou, Jian and Qin, Zhan},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7745--7749},
  year={2024},
  organization={IEEE}
}
@inproceedings{yao2024PromptCARE,
  title={PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification},
  author={Yao, Hongwei and Lou, Jian and Ren, Kui and Qin, Zhan},
  booktitle = {IEEE Symposium on Security and Privacy (S\&P)},
  publisher = {IEEE},
  year = {2024}
}

Acknowledgments

Thanks to:

P-tuning v2: https://github.com/thudm/p-tuning-v2
AutoPrompt: https://github.com/ucinlp/autoprompt

License

This library is released under the MIT license. For the full copyright and license information, please see the LICENSE file distributed with this source code.

