
Grounding_LLMs_with_online_RL


Grounding Large Language Models with Online Reinforcement Learning

This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning.

You can find more information on our website.

We perform functional grounding of LLMs' knowledge in BabyAI-Text using the GLAM method:

[Figure: main schema]

We release our BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.
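In the scoring setup (the alternative to action heads controlled by `use_action_heads`), the LLM itself acts as the policy: each action in `action_space` is scored by the LLM given the prompt, and the scores are normalized into a distribution. A minimal sketch of that normalization step, assuming you already have the log-probability of every token of each action string conditioned on the prompt (obtaining these from the LLM, for example through Lamorel, is outside this sketch). The function name `policy_from_token_logprobs` and the numbers in the example are hypothetical; this is not the repository's implementation.

```python
import math
from typing import Dict, List


def policy_from_token_logprobs(action_token_logprobs: Dict[str, List[float]]) -> Dict[str, float]:
    """Turn per-token log-probabilities of each candidate action into a policy.

    The score of an action is the sum of its tokens' log-probabilities given the
    prompt, i.e. log p(action | prompt). A softmax over these scores yields
    pi(action | prompt).
    """
    scores = {action: sum(lps) for action, lps in action_token_logprobs.items()}
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {action: math.exp(s - max_score) for action, s in scores.items()}
    total = sum(exp_scores.values())
    return {action: e / total for action, e in exp_scores.items()}


# Hypothetical token log-probabilities for three actions.
example = {
    "turn_left": [-1.2, -0.3],
    "go_forward": [-0.4, -0.1],
    "pick_up": [-2.0, -0.5],
}
policy = policy_from_token_logprobs(example)
print(max(policy, key=policy.get))  # → go_forward
```

Summing token log-probabilities scores multi-token actions such as `turn_left` as complete strings, so actions tokenized into different numbers of tokens can still be compared under one distribution.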

Our repository is structured as follows:

Grounding_LLMs_with_online_RL
┣ babyai-text -- our BabyAI-Text environment
┣ experiments -- code for our experiments
┃ ┣ agents -- implementation of all our agents
┃ ┃ ┣ bot -- bot agent leveraging BabyAI's bot
┃ ┃ ┣ random_agent -- agent playing uniformly at random
┃ ┃ ┣ drrn -- DRRN agent from here
┃ ┃ ┣ ppo -- agents using PPO
┃ ┃ ┃ ┣ symbolic_ppo_agent.py -- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ ┗ llm_ppo_agent.py -- our LLM agent grounded using PPO
┃ ┣ configs -- Lamorel configs for our experiments
┃ ┣ slurm -- utility scripts to launch our experiments on a SLURM cluster
┃ ┣ campaign -- SLURM scripts used to launch our experiments
┃ ┣ train_language_agent.py -- trains agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of the PPO loss for LLMs, as well as additional heads on top of LLMs
┃ ┣ train_symbolic_ppo.py -- trains SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ post-training_tests.py -- generalization tests of trained agents
┃ ┣ test_results.py -- utilities to format results
┃ ┗ clm_behavioral-cloning.py -- code to perform behavioral cloning on an LLM using trajectories
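`train_language_agent.py` contains the PPO loss used for LLMs. For readers unfamiliar with PPO, here is a textbook sketch of its two main pieces, Generalized Advantage Estimation (GAE) and the clipped surrogate loss with value and entropy terms, written with plain Python lists. The parameter names mirror the config entries `discount`, `gae_lambda`, `clip_eps`, `value_loss_coef` and `entropy_coef`, but the function names are hypothetical and the repository's actual loss may differ in details (for example value clipping or advantage normalization).

```python
import math
from typing import List, Tuple


def compute_gae(rewards: List[float], values: List[float], dones: List[bool],
                last_value: float, discount: float = 0.99,
                gae_lambda: float = 0.99) -> Tuple[List[float], List[float]]:
    """GAE over one rollout of a single environment.

    dones[t] is True if the episode ended at step t, which stops bootstrapping
    from step t + 1. Returns (advantages, returns) with returns = advantages + values.
    """
    horizon = len(rewards)
    advantages = [0.0] * horizon
    gae = 0.0
    for t in reversed(range(horizon)):
        next_value = last_value if t == horizon - 1 else values[t + 1]
        mask = 0.0 if dones[t] else 1.0
        # One-step TD error, then the exponentially weighted sum of TD errors.
        delta = rewards[t] + discount * next_value * mask - values[t]
        gae = delta + discount * gae_lambda * mask * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_loss(new_logprobs: List[float], old_logprobs: List[float],
             advantages: List[float], new_values: List[float],
             returns: List[float], entropies: List[float],
             clip_eps: float = 0.2, value_loss_coef: float = 0.5,
             entropy_coef: float = 0.01) -> float:
    """Clipped PPO objective turned into a loss to minimize (batch average)."""
    n = len(advantages)
    surrogate = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        surrogate.append(min(ratio * adv, clipped_ratio * adv))
    policy_loss = -sum(surrogate) / n
    value_loss = sum((v - r) ** 2 for v, r in zip(new_values, returns)) / n
    entropy = sum(entropies) / n
    # The entropy bonus is subtracted so that higher entropy lowers the loss.
    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy
```

Taking the minimum of the clipped and unclipped terms makes the objective pessimistic: once the probability ratio moves further than `clip_eps` in the direction the advantage favors, the update gains nothing more from that sample.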

Installation steps

Create a conda environment

 conda create -n dlp python=3.10.8; conda activate dlp

Install PyTorch

 conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Install the packages required by this repository

 pip install -r requirements.txt

Install BabyAI-Text: see the installation details in the babyai-text package
Install Lamorel

 git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
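After these steps, a quick sanity check can confirm that the main packages are importable from the `dlp` environment. A minimal sketch using only the standard library; the import names `babyai_text` and `lamorel` are assumptions about how the babyai-text and Lamorel packages expose themselves, and `check_installation` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, List

# Top-level modules the installation steps are expected to provide.
# "babyai_text" and "lamorel" are assumed import names.
REQUIRED_MODULES = ["torch", "babyai_text", "lamorel"]


def check_installation(modules: List[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: found?} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installation().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

`find_spec` only locates a module; it does not execute it. It therefore catches a missing install, but not, for example, a CUDA mismatch that would only surface when `torch` is actually imported.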

Launch

Use Lamorel along with our configs. You can find examples of our training scripts in the campaign directory.

Training a language model

To train a language model in a BabyAI-Text environment, use the train_language_agent.py script. This script (launched with Lamorel) uses the following config entries:

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps are synchronized, i.e. a step call returns number_envs observations)
   num_steps: 1000 # Total number of training steps
   max_episode_steps: 3 # Maximum number of steps in a single episode
   frames_per_proc: 40 # The number of transitions collected to perform a PPO update is frames_per_proc*number_envs
   discount: 0.99 # Discount factor used in PPO
   lr: 1e-6 # Learning rate used to finetune the LLM
   beta1: 0.9 # Adam's hyperparameter
   beta2: 0.999 # Adam's hyperparameter
   gae_lambda: 0.99 # PPO's hyperparameter
   entropy_coef: 0.01 # PPO's hyperparameter
   value_loss_coef: 0.5 # PPO's hyperparameter
   max_grad_norm: 0.5 # Maximum gradient norm when updating the LLM's parameters
   adam_eps: 1e-5 # Adam's hyperparameter
   clip_eps: 0.2 # Epsilon used in PPO's loss clipping
   epochs: 4 # Number of PPO epochs performed on each set of collected trajectories
   batch_size: 16 # Minibatch size
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where to store the finetuned model
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (with lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (with lm_args.pretrained=False) creates our NPAE agent.
   template_test: 1 # Which prompt template to use to log the evolution of the actions' probabilities (Section C of our paper). Choices are [1, 2].
   nbr_obs: 3 # Number of past observations used in the prompt
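The comment on `frames_per_proc` gives the size of each PPO update. A small sketch of the resulting arithmetic with the default values above: 40 * 2 = 80 transitions per update. Assuming each of the `epochs` passes splits those transitions into minibatches of `batch_size`, that gives 5 minibatches per epoch and 20 gradient steps per update. The dataclass and function are hypothetical helpers, and the minibatching rule is an assumption; the configuration itself does not state it.

```python
import math
from dataclasses import dataclass


@dataclass
class RolloutConfig:
    """Hypothetical container mirroring a few rl_script_args entries."""
    number_envs: int = 2
    frames_per_proc: int = 40
    batch_size: int = 16
    epochs: int = 4


def ppo_update_sizes(cfg: RolloutConfig) -> dict:
    # From the config comment: transitions per PPO update = frames_per_proc * number_envs.
    transitions = cfg.frames_per_proc * cfg.number_envs
    # Assumption: each epoch splits the transitions into minibatches of batch_size,
    # counting a final smaller minibatch when the division is not exact.
    minibatches = math.ceil(transitions / cfg.batch_size)
    return {
        "transitions_per_update": transitions,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": minibatches * cfg.epochs,
    }


print(ppo_update_sizes(RolloutConfig()))
# → {'transitions_per_update': 80, 'minibatches_per_epoch': 5, 'gradient_steps_per_update': 20}
```

Raising `number_envs` grows each update proportionally, so `batch_size` and `lr` may need retuning when scaling the number of parallel environments.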

For the config entries related to the language model itself, see Lamorel.

Evaluating performance on test episodes

To evaluate the performance of an agent (e.g. a trained LLM, BabyAI's bot...) on test tasks, use post-training_tests.py and set the following config entries:

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps are synchronized, i.e. a step call returns number_envs observations)
   max_episode_steps: 3 # Maximum number of steps in a single episode
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where the finetuned model is stored
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (with lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (with lm_args.pretrained=False) creates our NPAE agent.
   nbr_obs: 3 # Number of past observations used in the prompt
   number_episodes: 10 # Number of test episodes
   language: 'english' # Useful to perform the French experiment (Section H4)
   zero_shot: true # Whether the zero-shot LLM (i.e. without finetuning) should be used
   modified_action_space: false # Whether a modified action space (i.e. different from the one seen during training) should be used
   new_action_space: # ["rotate_left","rotate_right","move_ahead","take","release","switch"] # Modified action space
   im_learning: false # Whether an LLM produced with Behavioral Cloning should be used
   im_path: "" # Path to the LLM learned with Behavioral Cloning
   bot: false # Whether BabyAI's bot agent should be used
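The commented `new_action_space` lists six names that line up one-to-one with the six entries of `action_space`. Below is a minimal sketch of how an agent's choice among renamed actions could be translated back to the training actions. It assumes the correspondence is positional, which is inferred only from the ordering in the config above; `build_action_mapping` is a hypothetical helper, not a function from the repository.

```python
from typing import Dict, List, Optional

# Values copied from the action_space and new_action_space entries above.
ACTION_SPACE = ["turn_left", "turn_right", "go_forward", "pick_up", "drop", "toggle"]
NEW_ACTION_SPACE = ["rotate_left", "rotate_right", "move_ahead", "take", "release", "switch"]


def build_action_mapping(action_space: List[str],
                         new_action_space: Optional[List[str]] = None) -> Dict[str, str]:
    """Hypothetical helper: map each action name shown to the agent to a training action.

    Assumes the i-th entry of new_action_space stands for the i-th entry of
    action_space. Without a modified action space, every action maps to itself.
    """
    if new_action_space is None:
        return {a: a for a in action_space}
    if len(new_action_space) != len(action_space):
        raise ValueError("new_action_space must have as many entries as action_space")
    return dict(zip(new_action_space, action_space))


mapping = build_action_mapping(ACTION_SPACE, NEW_ACTION_SPACE)
print(mapping["take"])  # → pick_up
```

Under this assumption, the LLM scores the new names (e.g. `take`) in its prompt, while the environment still receives the original action (`pick_up`). This is what makes the test measure robustness to action wording rather than a change in the task.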


Additional information

Version 1.0.0
Type AI source code
Last updated 2024-12-30
Size 50MB
Source GitHub
