Tutorbot-Spock

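CLASS: A Design Framework for Building Intelligent Tutoring Systems Based on Learning Science Principles (EMNLP 2023)
Shashank Sonkar, Naiming Liu, Debshila Basu Mallick, Richard G. Baraniuk
Paper: https://arxiv.org/abs/2305.13272
Branch: CLASS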

Pedagogical_Alignment

Pedagogical Alignment of Large Language Models (EMNLP 2024)
Shashank Sonkar*, Kangqi Ni*, Sapana Chaudhary, Richard G. Baraniuk
Paper: https://arxiv.org/abs/2402.05000
Branch: main

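About

This project aims to develop effective intelligent tutoring agents that help students build critical-thinking and problem-solving skills.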

Installation

torch
transformers
FastChat
TRL
vLLM

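Usage

See scripts/run.sh for an example that trains and evaluates a selected model on four A100 GPUs. To run the example without training, download the models listed in the Models section and use scripts/run_no-train.sh instead. The subsections below walk through scripts/run.sh step by step.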

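Dataset

Training and evaluation use bio-dataset-1.json, bio-dataset-2.json, bio-dataset-3.json, and bio-dataset-ppl.json from the datasets folder. Each file contains simulated conversations between a student and a tutor about biology concepts, generated with OpenAI's GPT-4. These conversations are then preprocessed into the formats required for training and evaluation. See the CLASS branch for instructions on generating the data.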
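Before preprocessing, it can help to confirm that all four files named above are present. The following is a minimal sketch and is not a script shipped with the repository. The function name check_datasets is hypothetical, and it only checks that each file exists and is non-empty. It does not validate the JSON contents.

```bash
# Hypothetical helper (not part of the repository): verify that the four
# dataset files listed in this README exist and are non-empty in a directory.
check_datasets() {
  dir="${1:-datasets}"   # defaults to the DATA_DIR value used in Configuration
  missing=0
  for f in bio-dataset-1.json bio-dataset-2.json bio-dataset-3.json bio-dataset-ppl.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing files; 0 means everything is present.
  return "$missing"
}
```

Typical use: `check_datasets "$DATA_DIR" && echo "all datasets found"`.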

Configuration

Set the user parameters:

FULL_MODEL_PATH="meta-llama/Meta-Llama-3.1-8B-Instruct"
MODEL_DIR="models"
DATA_DIR="datasets"

SFT_OPTION="transformers" # choices: ["transformers", "fastchat"]

ALGO="dpo" # choices: ["dpo", "ipo", "kto"]
BETA=0.1 # choices: [0.0 - 1.0]
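MODEL_NAME is referenced in the preference-tuning step but is not set among the parameters above. The sketch below assumes, without confirmation from the repository, that MODEL_NAME is the last path component of FULL_MODEL_PATH. It also adds hypothetical checks (valid_algo, valid_beta) that keep ALGO and BETA within the documented choices.

```bash
FULL_MODEL_PATH="meta-llama/Meta-Llama-3.1-8B-Instruct"
ALGO="dpo"
BETA=0.1

# Assumption: MODEL_NAME is the part of FULL_MODEL_PATH after the last "/".
MODEL_NAME="${FULL_MODEL_PATH##*/}"

# Hypothetical check: succeed only for the documented preference losses.
valid_algo() {
  case "$1" in
    dpo|ipo|kto) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical check: succeed only for a plain decimal number in [0.0, 1.0].
# awk performs the floating-point comparison that POSIX sh cannot.
valid_beta() {
  awk -v b="$1" 'BEGIN { exit !((b ~ /^[0-9]*\.?[0-9]+$/) && (b + 0 >= 0) && (b + 0 <= 1)) }'
}

valid_algo "$ALGO" || echo "ALGO must be one of: dpo, ipo, kto" >&2
valid_beta "$BETA" || echo "BETA must lie in [0.0, 1.0]" >&2
echo "MODEL_NAME=$MODEL_NAME ALGO=$ALGO BETA=$BETA"
```

With these values, MODEL_NAME becomes Meta-Llama-3.1-8B-Instruct. If the repository's scripts derive MODEL_NAME differently, theirs takes precedence.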

Supervised Fine-Tuning

Preprocess the data:

python src/preprocess_sft_data.py --data_dir $DATA_DIR

We provide two options for SFT: (1) Transformers and (2) FastChat.

(1) Run SFT with Transformers:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=20001 src/train/train_sft.py \
      --model_path $FULL_MODEL_PATH \
      --train_dataset_path $SFT_DATASET_PATH \
      --eval_dataset_path ${DATA_DIR}/bio-test.json \
      --output_dir $SFT_MODEL_PATH \
      --cache_dir cache \
      --bf16 \
      --num_train_epochs 3 \
      --per_device_train_batch_size 2 \
      --per_device_eval_batch_size 1 \
      --gradient_accumulation_steps 2 \
      --evaluation_strategy "epoch" \
      --eval_accumulation_steps 50 \
      --save_strategy "epoch" \
      --seed 42 \
      --learning_rate 2e-5 \
      --weight_decay 0.05 \
      --warmup_ratio 0.1 \
      --lr_scheduler_type "cosine" \
      --logging_steps 1 \
      --max_seq_length 4096 \
      --gradient_checkpointing

(2) Run SFT with FastChat:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=20001 FastChat/fastchat/train/train.py \
      --model_name_or_path $FULL_MODEL_PATH \
      --data_path $SFT_DATASET_PATH \
      --eval_data_path ${DATA_DIR}/bio-test.json \
      --output_dir $SFT_MODEL_PATH \
      --cache_dir cache \
      --bf16 True \
      --num_train_epochs 3 \
      --per_device_train_batch_size 2 \
      --per_device_eval_batch_size 1 \
      --gradient_accumulation_steps 2 \
      --evaluation_strategy "epoch" \
      --eval_accumulation_steps 50 \
      --save_strategy "epoch" \
      --seed 42 \
      --learning_rate 2e-5 \
      --weight_decay 0.05 \
      --warmup_ratio 0.1 \
      --lr_scheduler_type "cosine" \
      --logging_steps 1 \
      --tf32 True \
      --model_max_length 4096 \
      --gradient_checkpointing True
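Both SFT commands above use the same batching settings. Each optimizer step therefore sees GPUs × per-device batch × gradient-accumulation steps examples. The arithmetic below is a small illustrative sketch for checking that number, or for picking an accumulation value that keeps it fixed on a machine with fewer GPUs. The function names are hypothetical.

```bash
# Global (effective) batch size per optimizer step:
#   num_gpus * per_device_train_batch_size * gradient_accumulation_steps
effective_batch() {
  echo $(( $1 * $2 * $3 ))
}

# Hypothetical helper: gradient_accumulation_steps needed to reach a target
# global batch with a given GPU count and per-device batch size.
# Returns status 1 if the target is not evenly divisible.
accum_for_target() {
  target=$1; gpus=$2; per_device=$3
  per_step=$(( gpus * per_device ))
  if [ $(( target % per_step )) -ne 0 ]; then
    echo "target $target is not divisible by $per_step" >&2
    return 1
  fi
  echo $(( target / per_step ))
}

# Values from the SFT commands: --nproc_per_node=4, batch 2, accumulation 2.
SFT_GLOBAL_BATCH=$(effective_batch 4 2 2)
echo "SFT global batch: $SFT_GLOBAL_BATCH"
# Accumulation that keeps the same global batch on 2 GPUs:
echo "accumulation on 2 GPUs: $(accum_for_target "$SFT_GLOBAL_BATCH" 2 2)"
```

This gives a global batch of 16 for SFT. The same formula applied to the preference-tuning settings (4 processes, per-device batch 2, accumulation 4) gives 32.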

Preference Tuning

Generate the preference data. First, sample responses from the SFT model on the DPO dataset. Then convert those responses into the preference dataset:

CUDA_VISIBLE_DEVICES=0,1,2,3 python src/evaluate/generate_responses.py --model_path $SFT_MODEL_PATH --output_dir ${SFT_MODEL_PATH}/final_checkpoint-dpo --test_dataset_path $DPO_DATASET_PATH --batch_size 256

python src/preprocess/preprocess_dpo_data.py --response_file ${SFT_MODEL_PATH}/final_checkpoint-dpo/responses.csv --data_file $DPO_PREF_DATASET_PATH

Run preference alignment:

DPO_MODEL_PATH="${MODEL_DIR}_dpo/${MODEL_NAME}_bio-tutor_${ALGO}"

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file=ds_config/deepspeed_zero3.yaml --num_processes=4 train/train_dpo.py \
    --train_data $DPO_PREF_DATASET_PATH \
    --model_path $SFT_MODEL_PATH \
    --output_dir $DPO_MODEL_PATH \
    --beta $BETA \
    --loss $ALGO \
    --gradient_checkpointing \
    --bf16 \
    --gradient_accumulation_steps 4 \
    --per_device_train_batch_size 2 \
    --num_train_epochs 3
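BETA is passed to the trainer as --beta. For reference, the standard DPO and IPO objectives are given below. Here π_θ is the policy being trained and π_ref is the reference policy. That π_ref is the SFT model given by --model_path is an assumption about train_dpo.py, not something this README states. x is a prompt, and y_w and y_l are the preferred and dispreferred responses. In DPO, a larger β ties the policy more tightly to the reference. In IPO, β sets the target log-ratio margin 1/(2β). KTO uses a different, unpaired objective and is not shown.

```latex
h_\theta(x, y_w, y_l) = \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
                      - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x, y_w, y_l)}\big[\log \sigma\big(\beta\, h_\theta(x, y_w, y_l)\big)\big]

\mathcal{L}_{\mathrm{IPO}}(\theta) = \mathbb{E}_{(x, y_w, y_l)}\Big[\Big(h_\theta(x, y_w, y_l) - \frac{1}{2\beta}\Big)^{2}\Big]
```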

Evaluation

Evaluate the accuracy and F1 score of the SFT and aligned models:

# Generate responses from the SFT model
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/evaluate/generate_responses.py --model_path $SFT_MODEL_PATH --output_dir ${SFT_MODEL_PATH}/final_checkpoint-eval --test_dataset_path $TEST_DATASET_PATH --batch_size 256

# Generate responses from the Aligned model
CUDA_VISIBLE_DEVICES=0,1,2,3 python src/evaluate/generate_responses.py --model_path $DPO_MODEL_PATH --output_dir ${DPO_MODEL_PATH}/final_checkpoint-eval --test_dataset_path $TEST_DATASET_PATH --batch_size 256

# Evaluate the SFT model
echo "Metrics of the SFT Model:"
python src/evaluate/evaluate_responses.py --response_file ${SFT_MODEL_PATH}/final_checkpoint-eval/responses.csv

# Evaluate the Aligned model
echo "Metrics of the Aligned Model:"
python src/evaluate/evaluate_responses.py --response_file ${DPO_MODEL_PATH}/final_checkpoint-eval/responses.csv
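This README does not document the layout of responses.csv or which labels evaluate_responses.py compares. As an illustration of the two metrics only, the sketch below computes accuracy and binary F1 for a chosen positive label. It reads a hypothetical two-column CSV (gold label, predicted label) with a header row. The function name accuracy_f1 and the input format are assumptions.

```bash
# Hypothetical illustration: accuracy and binary F1 from a CSV whose rows are
# gold_label,predicted_label (first row is a header). This is not the
# repository's evaluate_responses.py, whose input format may differ.
accuracy_f1() {
  awk -F, -v pos="$2" '
    NR == 1 { next }                              # skip the header row
    {
      n++
      if ($1 == $2) correct++                     # exact label match
      if ($2 == pos && $1 == pos) tp++            # true positive
      if ($2 == pos && $1 != pos) fpos++          # false positive
      if ($2 != pos && $1 == pos) fneg++          # false negative
    }
    END {
      acc  = n ? correct / n : 0
      prec = (tp + fpos) ? tp / (tp + fpos) : 0
      rec  = (tp + fneg) ? tp / (tp + fneg) : 0
      f1   = (prec + rec) ? 2 * prec * rec / (prec + rec) : 0
      printf "accuracy=%.4f f1=%.4f\n", acc, f1
    }' "$1"
}

# Toy example with made-up labels.
sample=$(mktemp)
printf 'gold,pred\nyes,yes\nno,yes\nyes,no\nno,no\nyes,yes\n' > "$sample"
accuracy_f1 "$sample" yes   # prints: accuracy=0.6000 f1=0.6667
rm -f "$sample"
```

In the toy data, 3 of 5 labels match. With "yes" as the positive class there are 2 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all equal 2/3.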

Evaluate the perplexity (ppl) of the SFT and aligned models:

CUDA_VISIBLE_DEVICES=0,1 python src/evaluate/evaluate_ppl.py --model_path $SFT_MODEL_PATH

CUDA_VISIBLE_DEVICES=0,1 python src/evaluate/evaluate_ppl.py --model_path $DPO_MODEL_PATH
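Perplexity is the exponential of the mean per-token negative log-likelihood. The sketch below illustrates only that formula. Its input is a hypothetical list of per-token NLLs in natural log, one per line. This README does not describe how evaluate_ppl.py tokenizes, batches, or averages, so the sketch should not be read as its implementation.

```bash
# Perplexity = exp(mean negative log-likelihood), using natural-log NLLs.
# Input (hypothetical format): one per-token NLL per line, from files or stdin.
perplexity() {
  awk '{ total += $1; count++ }
       END {
         if (count == 0) { print "nan"; exit 1 }
         printf "%.4f\n", exp(total / count)
       }' "$@"
}

# Toy example: every token has NLL ln(2), so the perplexity is 2.
printf '0.6931471805599453\n0.6931471805599453\n0.6931471805599453\n' | perplexity
```

Lower is better. A perplexity of 2 means the model is, on average, as uncertain as a uniform choice between two tokens.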

Models

For easier access, download the trained models from Hugging Face.

SFT models:

Llama-3.1-8B-Instruct_bio-tutor_sft
Mistral-7B-Instruct-v0.2_bio-tutor_sft
zephyr-7b-beta_bio-tutor_sft

Aligned models:

Llama-3.1-8B-Instruct_bio-tutor_dpo
Mistral-7B-Instruct-v0.2_bio-tutor_dpo
zephyr-7b-beta_bio-tutor_dpo
Llama-3.1-8B-Instruct_bio-tutor_kto
Mistral-7B-Instruct-v0.2_bio-tutor_kto
zephyr-7b-beta_bio-tutor_kto
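All released names follow one pattern, <base model>_bio-tutor_<sft|dpo|kto>, which mirrors the DPO_MODEL_PATH naming used in preference tuning. The sketch below enumerates the names and prints a huggingface-cli download command for each without running it. This README does not give the Hugging Face organization, so HF_ORG is a placeholder that must be replaced before running any of the printed commands.

```bash
# Enumerate the released model names from the pattern used in this README.
BASES="Llama-3.1-8B-Instruct Mistral-7B-Instruct-v0.2 zephyr-7b-beta"
HF_ORG="YOUR_HF_ORG"              # placeholder: the real organization is not given here
MODEL_DIR="${MODEL_DIR:-models}"

MODEL_NAMES=""
COUNT=0
for suffix in sft dpo kto; do
  for base in $BASES; do
    MODEL_NAMES="$MODEL_NAMES ${base}_bio-tutor_${suffix}"
    COUNT=$((COUNT + 1))
  done
done

# Dry run: print one download command per model instead of executing it.
for name in $MODEL_NAMES; do
  echo "huggingface-cli download ${HF_ORG}/${name} --local-dir ${MODEL_DIR}/${name}"
done
```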

Citation

If you find our work useful, please cite:

@misc{sonkar2023classdesignframeworkbuilding,
      title={CLASS: A Design Framework for building Intelligent Tutoring Systems based on Learning Science principles}, 
      author={Shashank Sonkar and Naiming Liu and Debshila Basu Mallick and Richard G. Baraniuk},
      year={2023},
      eprint={2305.13272},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2305.13272}, 
}

@misc{sonkar2024pedagogical,
      title={Pedagogical Alignment of Large Language Models}, 
      author={Shashank Sonkar and Kangqi Ni and Sapana Chaudhary and Richard G. Baraniuk},
      year={2024},
      eprint={2402.05000},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2402.05000}, 
}
