pretraining with human feedback

1.0.0

Pretraining Language Models with Human Preferences

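This repository contains the code accompanying the paper Pretraining Language Models with Human Preferences. The codebase is built around Hugging Face Transformers' Trainer and contains implementations of the five objectives for pretraining with human feedback (PHF) discussed in the paper, as well as callbacks and scripts for evaluating them.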

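PHF objectives can be implemented by annotating the training data with rewards and overriding Trainer.compute_loss to use them as an additional training signal. Rewards are provided by an instance of apo.scorers.Scorer: an object able to determine, for a given piece of text, whether it is aligned or misaligned with a human preference such as non-offensiveness. The scorer is also used for evaluating samples from PHF-trained LMs.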
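
To make the scorer idea concrete, here is a minimal, self-contained sketch of an object that maps text to a misalignment score. It is not the repository's apo.scorers.Scorer: the class names ToyScorer and BlocklistScorer, the method names score_text and score_texts, and the blocklist itself are all hypothetical and chosen only for illustration.

```python
from abc import ABC, abstractmethod


class ToyScorer(ABC):
    """Hypothetical stand-in for a scorer (NOT the repository's apo.scorers.Scorer)."""

    @abstractmethod
    def score_text(self, text: str) -> float:
        """Return a misalignment score; higher means less aligned."""

    def score_texts(self, texts: list[str]) -> list[float]:
        # Score each document independently.
        return [self.score_text(t) for t in texts]


class BlocklistScorer(ToyScorer):
    """Toy scorer: number of blocklisted words per character of text."""

    def __init__(self, blocklist: set[str]) -> None:
        self.blocklist = {w.lower() for w in blocklist}

    def score_text(self, text: str) -> float:
        if not text:
            return 0.0
        words = (w.strip(".,!?") for w in text.lower().split())
        hits = sum(1 for w in words if w in self.blocklist)
        return hits / len(text)


# Annotate a tiny "dataset" with scores that a loss could later consume.
scorer = BlocklistScorer({"stupid", "idiot"})
docs = ["Have a nice day.", "You are a stupid idiot!"]
rewards = scorer.score_texts(docs)
```

The same object can score both training documents and generated samples, which mirrors the two roles the scorer plays above.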

The codebase is built around the Hugging Face ecosystem and wandb (for monitoring and experiment management).

Quickstart

We assume Python 3.9+. To run the training script for MLE on the toxicity task, run:

pip install -r requirements.txt
wandb login  # or set `WANDB_API_KEY` and `WANDB_PROJECT` env variables
export OPENAI_API_KEY='sk-your_key'  # needed for evaluation
python train.py --task configs/toxicity/pretrain.yml --method configs/toxicity/mle.yml

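Configuration

The train.py script requires paths to two config files: one for the task and one for the method. Config files for tasks (toxicity, pii, pep8) are YAML files stored at configs/{task}/pretrain.yml (for pretraining experiments) and configs/{task}/finetuning.yml (for finetuning). Config files for methods are stored separately in the configs/{task} directories. Each task-method config pair (for pretraining and for finetuning) contains the hyperparameters we used in our experiments and allows for reproducing the results from the paper.

Individual parameters can be overridden from the command line using the override argument. For example: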

python train.py --task configs/toxicity/pretrain.yml --method configs/toxicity/mle.yml --override training.per_device_train_batch_size=8
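
As a rough illustration of what such an override does, the sketch below applies a dotted `key=value` string to a nested config dictionary. This is an assumption about the general mechanism, not the parsing code used by train.py; the function name apply_override and the config values shown are placeholders.

```python
import ast
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> None:
    """Set a nested key in place from a 'dotted.path=value' string (hypothetical helper)."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    try:
        # Interpret numbers, booleans, lists, etc. as Python literals.
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Fall back to the raw string for values such as names or paths.
        value = raw_value
    node[leaf] = value


# Placeholder config values, not the hyperparameters from the repository.
config = {"training": {"per_device_train_batch_size": 64, "learning_rate": 5e-4}}
apply_override(config, "training.per_device_train_batch_size=8")
```

Only the addressed key changes; sibling keys such as learning_rate keep the values loaded from the YAML files.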

Tasks

Name	Config files	Training data	Scorer	Description
Toxicity	`configs/toxicity`	`tomekkorbak/pile-detoxify`	`DetoxifyToxicityScorer`	Misalignment score is the probability of toxicity according to detoxify.
PII	`configs/pii`	`tomekkorbak/pile-pii-scrubadub`	`PIIScorer`	Misalignment score is the number of PII instances (e.g. names, URLs) per character, according to scrubadub.
PEP8	`configs/pep8`	`kejian/codeparrot-train-more-filter-3.3b-cleaned`	`PEP8Scorer`	Misalignment score is the number of PEP8 violations per character, according to pycodestyle.

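Objectives

The six objectives used in our experiments (the MLE baseline plus the five PHF objectives) are implemented as follows:

Name	Objective class	Description
MLE	`MLE`	A thin wrapper around PyTorch `CrossEntropyLoss`
Filtering	`MLE`	You need to set `dataset.filter_threshold` in the config
Conditional training	`MLE`	You also need to set `dataset.conditional_training_config` in the config
Unlikelihood	`Unlikelihood`	You also need to set the hyperparameters `objective.score_threshold` and `objective.alpha`
AWR	`AWR`	You also need to set the hyperparameters `objective.alpha` and `objective.beta`
RWR	`AWR`	A special case of AWR with `objective.alpha=1`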
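
Filtering and conditional training both reuse the `MLE` objective class because they change the data rather than the loss. The sketch below shows one way this could look, assuming that filtering drops documents whose misalignment score exceeds a threshold and that conditional training prepends a control token to each document. The token strings, function names, and threshold semantics are assumptions for illustration; the actual behavior is governed by `dataset.filter_threshold` and `dataset.conditional_training_config`.

```python
ALIGNED_PREFIX = "<|aligned|>"        # assumed control token, illustration only
MISALIGNED_PREFIX = "<|misaligned|>"  # assumed control token, illustration only


def filter_documents(docs: list[str], scores: list[float], threshold: float) -> list[str]:
    """Keep only documents whose misalignment score does not exceed the threshold."""
    return [d for d, s in zip(docs, scores) if s <= threshold]


def add_control_tokens(docs: list[str], scores: list[float], threshold: float) -> list[str]:
    """Prefix each document with a token that tells the LM how it was scored."""
    return [
        (ALIGNED_PREFIX if s <= threshold else MISALIGNED_PREFIX) + d
        for d, s in zip(docs, scores)
    ]


docs = ["a polite sentence", "a rude sentence", "another polite sentence"]
scores = [0.01, 0.90, 0.02]  # made-up misalignment scores

kept = filter_documents(docs, scores, threshold=0.5)
tagged = add_control_tokens(docs, scores, threshold=0.5)
```

In both cases the resulting text can be trained on with plain cross-entropy, which is why no separate objective class is needed.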

Pretrained models

The models pretrained in our experiments are available on the Hugging Face Hub:

Objective	Toxicity	PEP8	PII
MLE	tomekkorbak/goofy_pasteur	kejian/mighty-mle	tomekkorbak/nervous_wozniak
Filtering median	tomekkorbak/amazing_shannon	kejian/mighty-filtering	tomekkorbak/cocky_carson
Conditional	tomekkorbak/hungry_saha	kejian/mighty-conditional	tomekkorbak/boring_mcclintock
UL	tomekkorbak/nifty_banach	kejian/mighty-ul	tomekkorbak/affectionate_wescoff
AWR	tomekkorbak/upbeat_ramanujan	kejian/vigor-awr	tomekkorbak/confident_knuth
RWR	tomekkorbak/keen_clarke	kejian/mighty-rwr	tomekkorbak/gifted_hugle

Metrics

On each evaluation step, apo.callbacks.GenerateAndScoreCallback iterates over the list of GenerationScenarios provided in the task config file. For each scenario, num_samples samples are generated and the following wandb metrics are computed:

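- score, the average misalignment (across num_samples samples) of the generated samples, as assigned by the scorer
- score_max@25, the average maximum score in 25 samples (similar to expected maximum toxicity in the RealToxicityPrompts paper)
- current_samples, a wandb.Table of samples together with their prompts (if any) and scores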
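
The two scalar metrics can be sketched as follows. This is one plausible reading of the definitions above, not the callback's actual code: it assumes the samples are split into consecutive groups of 25 and that score_max@25 is the mean of the per-group maxima. The function names are hypothetical.

```python
from statistics import mean


def average_score(scores: list[float]) -> float:
    """Mean misalignment score over all generated samples."""
    return mean(scores)


def score_max_at_k(scores: list[float], k: int = 25) -> float:
    """Mean of the maximum score within each consecutive group of k samples."""
    groups = [scores[i:i + k] for i in range(0, len(scores), k)]
    return mean([max(group) for group in groups])


# 50 made-up scores: 0.00, 0.01, ..., 0.49
scores = [i / 100 for i in range(50)]
avg = average_score(scores)
max_at_25 = score_max_at_k(scores, k=25)
```

Taking the maximum within each group makes the metric sensitive to rare, highly misaligned samples that the plain average would smooth over.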

In addition to scoring LM samples, we use apo.callbacks.KLGPT3Callback to estimate the KL of the current LM from GPT-3. This requires drawing samples from GPT-3, which are cached and reused in subsequent iterations.
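
Because the samples come from GPT-3, a natural Monte Carlo estimator averages the difference between GPT-3's log-probability and the current LM's log-probability over those samples. The sketch below assumes both log-probabilities are already available per sample; the function name estimate_kl and the numbers are illustrative, and the callback's real implementation may differ.

```python
def estimate_kl(ref_logprobs: list[float], lm_logprobs: list[float]) -> float:
    """Monte Carlo estimate of KL(reference || LM) from samples drawn from the reference.

    ref_logprobs[i] is log p_ref(x_i) and lm_logprobs[i] is log p_lm(x_i)
    for the same sample x_i drawn from the reference model.
    """
    if len(ref_logprobs) != len(lm_logprobs):
        raise ValueError("Both lists must cover the same samples.")
    if not ref_logprobs:
        raise ValueError("At least one sample is required.")
    diffs = [p - q for p, q in zip(ref_logprobs, lm_logprobs)]
    return sum(diffs) / len(diffs)


# Made-up sequence log-probabilities for three cached reference samples.
ref_logprobs = [-10.0, -12.0, -8.0]
lm_logprobs = [-11.0, -15.0, -8.5]
kl = estimate_kl(ref_logprobs, lm_logprobs)
```

Since the reference samples and their log-probabilities do not change during training, caching them means only the current LM's log-probabilities have to be recomputed at each evaluation.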

Codebase structure

.
├── apo
│   ├── callbacks.py  # callbacks implementing the evaluation pipeline
│   ├── dataset_wrappers.py  # an iterable for streaming blocks of tokens for training
│   ├── kl_gpt3.py  # logic for measuring KL from GPT-3
│   ├── metrics.py  # metrics computed on LM samples (and dataset elements, for debugging)
│   ├── models.py  # a subclass of GPT2LMHeadModel adding value heads and exposing implementation details
│   ├── objectives.py  # classes implementing loss functions
│   ├── scorer_utils.py
│   ├── scorers.py  # classes for scoring LM samples and dataset elements
│   ├── trainer.py  # a subclass of Hugging Face Trainer exposing some functionalities
│   └── utils.py
├── configs
│   ├── pep8
│   ├── pii
│   └── toxicity
├── scripts  # scripts for evaluation
│   └── dataset_builders  # scripts used to generate some of the datasets
├── resources  # small, git-tracked files from which lists of words or prompts are loaded
└── train.py  # the main training script

Citing

@misc{https://doi.org/10.48550/arxiv.2302.08582,
  doi = {10.48550/ARXIV.2302.08582},
  url = {https://arxiv.org/abs/2302.08582},
  author = {Korbak, Tomasz and Shi, Kejian and Chen, Angelica and Bhalerao, Rasika and Buckley, Christopher L. and Phang, Jason and Bowman, Samuel R. and Perez, Ethan},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {Pretraining Language Models with Human Preferences},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}