
Grounding Large Language Models with Online Reinforcement Learning

This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning.

You can find more information on our website.

We perform functional grounding of LLMs' knowledge in BabyAI-Text using the GLAM method (figure: main schema).

We release the BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.

Our repository is structured as follows:

Grounding_LLMs_with_online_RL
┣ babyai-text -- our BabyAI-Text environment
┣ experiments -- code for our experiments
┃ ┣ agents -- implementation of all our agents
┃ ┃ ┣ bot -- bot agent leveraging BabyAI's bot
┃ ┃ ┣ random_agent -- agent playing uniformly at random
┃ ┃ ┣ drrn -- DRRN agent, taken from an external implementation
┃ ┃ ┣ ppo -- agents using PPO
┃ ┃ ┃ ┣ symbolic_ppo_agent.py -- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ ┗ llm_ppo_agent.py -- our LLM agent grounded using PPO
┃ ┣ configs -- Lamorel configs for our experiments
┃ ┣ slurm -- utility scripts to launch our experiments on a SLURM cluster
┃ ┣ campaign -- SLURM scripts used to launch our experiments
┃ ┣ train_language_agent.py -- trains agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of the PPO loss for LLMs as well as additional heads on top of LLMs
┃ ┣ train_symbolic_ppo.py -- trains SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ post-training_tests.py -- generalization tests of trained agents
┃ ┣ test_results.py -- utilities to format results
┃ ┗ clm_behavioral-cloning.py -- code to perform Behavioral Cloning on an LLM using trajectories

Installation steps

Create the conda env

 conda create -n dlp python=3.10.8; conda activate dlp

Install PyTorch

 conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Install the packages required by our package

 pip install -r requirements.txt

Install BabyAI-Text: see the installation details in the babyai-text package

Install Lamorel

 git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
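
After the steps above, a quick check that the three main dependencies can be found in the active environment may save a failed launch later. The following is a minimal sketch, not part of the repository. The import names `torch`, `lamorel`, and `babyai_text` are assumptions about how the installed packages are exposed, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names (not confirmed by this README):
# "torch" for PyTorch, "lamorel" for Lamorel, "babyai_text" for BabyAI-Text.
REQUIRED_MODULES = ["torch", "lamorel", "babyai_text"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Run it inside the `dlp` conda env; any name it reports points at the installation step to revisit.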

Launch

Please use Lamorel along with our configs. You can find examples of our training scripts in campaign.

Training a Language Model

To train a Language Model on a BabyAI-Text environment, use the train_language_agent.py file. This script (launched with Lamorel) uses the following config entries:

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
   num_steps: 1000 # Total number of training steps
   max_episode_steps: 3 # Maximum number of steps in a single episode
   frames_per_proc: 40 # The number of collected transitions to perform a PPO update will be frames_per_proc*number_envs
   discount: 0.99 # Discount factor used in PPO
   lr: 1e-6 # Learning rate used to finetune the LLM
   beta1: 0.9 # Adam's beta1 hyperparameter
   beta2: 0.999 # Adam's beta2 hyperparameter
   gae_lambda: 0.99 # PPO's hyperparameter
   entropy_coef: 0.01 # PPO's hyperparameter
   value_loss_coef: 0.5 # PPO's hyperparameter
   max_grad_norm: 0.5 # Maximum grad norm when updating the LLM's parameters
   adam_eps: 1e-5 # Adam's hyperparameter
   clip_eps: 0.2 # Epsilon used in PPO's losses clipping
   epochs: 4 # Number of PPO epochs performed on each set of collected trajectories
   batch_size: 16 # Minibatch size
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where to store the finetuned model
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (lm_args.pretrained=False) creates our NPAE agent.
   template_test: 1 # Which prompt template to use to log evolution of action's probability (Section C of our paper). Choices are [1, 2].
   nbr_obs: 3 # Number of past observations used in the prompt
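
To make the interplay between number_envs, frames_per_proc, batch_size, and epochs concrete, here is a small arithmetic sketch. It is not taken from the repository. It assumes that every PPO epoch passes once over all collected transitions in minibatches of batch_size, with a final partial minibatch counted as one step; the actual script may batch differently. `ppo_update_budget` is a hypothetical helper name.

```python
import math


def ppo_update_budget(number_envs, frames_per_proc, batch_size, epochs):
    """Estimate how much data and how many gradient steps one PPO update involves.

    Assumption: each epoch iterates once over all collected transitions,
    split into minibatches of `batch_size` (last one may be smaller).
    """
    transitions = frames_per_proc * number_envs  # stated in the config comment
    minibatches = math.ceil(transitions / batch_size)
    return {
        "transitions_per_update": transitions,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * epochs,
    }


# Values from the example config above.
budget = ppo_update_budget(number_envs=2, frames_per_proc=40, batch_size=16, epochs=4)
print(budget)
# {'transitions_per_update': 80, 'minibatches_per_epoch': 5, 'gradient_steps_per_update': 20}
```

Under these assumptions, the example values collect 80 transitions per update and perform 20 gradient steps on them.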

For config entries related to the Language Model itself, please see Lamorel.
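
As a reading aid for clip_eps, value_loss_coef, and entropy_coef, the sketch below shows how these three entries typically enter the standard PPO clipped objective. It is a plain-Python illustration of the textbook formulation, not the implementation in train_language_agent.py. In particular, it uses an unclipped squared-error value loss as a simplification, and the function name `ppo_loss` and its argument names are hypothetical.

```python
import math


def ppo_loss(new_log_probs, old_log_probs, advantages, values, returns, entropies,
             clip_eps=0.2, value_loss_coef=0.5, entropy_coef=0.01):
    """Textbook PPO loss over a minibatch given as plain Python lists."""
    n = len(advantages)

    policy_terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the one that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) surrogate objective.
        policy_terms.append(min(ratio * adv, clipped_ratio * adv))
    policy_loss = -sum(policy_terms) / n

    # Simplification: unclipped mean squared error between value estimates and returns.
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n

    # The entropy bonus encourages exploration, so it is subtracted from the loss.
    entropy = sum(entropies) / n

    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy
```

With clip_eps set to 0.2, a transition with positive advantage stops contributing extra gradient once its probability ratio exceeds 1.2, which limits how far a single update can move the policy.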

Evaluating performance on test episodes

To evaluate the performance of an agent (e.g. a trained LLM, BabyAI's bot...) on test tasks, use post-training_tests.py and set the following config entries:

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
   max_episode_steps: 3 # Maximum number of steps in a single episode
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where to store the finetuned model
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (lm_args.pretrained=False) creates our NPAE agent.
   nbr_obs: 3 # Number of past observations used in the prompt
   number_episodes: 10 # Number of test episodes
   language: 'english' # Useful to perform the French experiment (Section H4)
   zero_shot: true # Whether the zero-shot LLM (i.e. without finetuning) should be used
   modified_action_space: false # Whether a modified action space (e.g. different from the one seen during training) should be used
   new_action_space: # ["rotate_left","rotate_right","move_ahead","take","release","switch"] # Modified action space
   im_learning: false # Whether an LLM produced with Behavioral Cloning should be used
   im_path: "" # Path to the LLM learned with Behavioral Cloning
   bot: false # Whether the BabyAI's bot agent should be used
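
The commented-out new_action_space example lists six names in the same order as action_space, which suggests a position-by-position correspondence (rotate_left for turn_left, take for pick_up, and so on). The sketch below illustrates that reading only. Whether post-training_tests.py pairs the two lists this way is an assumption, and `build_action_mapping` is a hypothetical helper.

```python
def build_action_mapping(action_space, new_action_space):
    """Map each renamed action to the original action at the same list position.

    Assumption: the two lists are aligned by index.
    """
    if len(action_space) != len(new_action_space):
        raise ValueError("Both action spaces must contain the same number of actions.")
    return dict(zip(new_action_space, action_space))


# Lists copied from the example config above.
action_space = ["turn_left", "turn_right", "go_forward", "pick_up", "drop", "toggle"]
new_action_space = ["rotate_left", "rotate_right", "move_ahead", "take", "release", "switch"]

mapping = build_action_mapping(action_space, new_action_space)
print(mapping["take"])  # pick_up
```

A mapping of this kind lets the agent be prompted with the new vocabulary while the environment still receives the action names it was built with.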

Additional information

Version: 1.0.0
Type: AI source code
Updated: 2024-12-30
Size: 50MB
Source: GitHub
