Grounding_LLMs_with_online_RL

Grounding Large Language Models with Online Reinforcement Learning

This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning.

You can find more information on our website.

We perform functional grounding of LLMs' knowledge in BabyAI-Text using the GLAM method:

[Figure: Main schema]

We release our BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.

Our repository is structured as follows:

Grounding_LLMs_with_online_RL
┣ babyai-text -- our BabyAI-Text environment
┣ experiments -- code for our experiments
┃ ┣ agents -- implementation of all our agents
┃ ┃ ┣ bot -- bot agent leveraging BabyAI's bot
┃ ┃ ┣ random_agent -- agent playing uniformly at random
┃ ┃ ┣ drrn -- DRRN agent (taken from an external implementation)
┃ ┃ ┣ ppo -- agents using PPO
┃ ┃ ┃ ┣ symbolic_ppo_agent.py -- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ ┗ llm_ppo_agent.py -- our LLM agent grounded using PPO
┃ ┣ configs -- Lamorel configs for our experiments
┃ ┣ slurm -- utility scripts to launch our experiments on a SLURM cluster
┃ ┣ campaign -- SLURM scripts used to launch our experiments
┃ ┣ train_language_agent.py -- trains agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of the PPO loss for LLMs as well as additional heads on top of LLMs
┃ ┣ train_symbolic_ppo.py -- trains SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ post-training_tests.py -- generalization tests of trained agents
┃ ┣ test_results.py -- utilities to format results
┃ ┗ clm_behavioral-cloning.py -- code to perform Behavioral Cloning on an LLM using trajectories

Installation steps

Create the conda environment

 conda create -n dlp python=3.10.8; conda activate dlp

Install PyTorch

 conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

Install the packages required by our package

 pip install -r requirements.txt

Install BabyAI-Text: see the installation details in the babyai-text package

Install Lamorel

 git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
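
Once the steps above are done, a quick sanity check can confirm that the main packages are visible from the dlp environment. The snippet below is only a minimal sketch: the top-level module names (torch, lamorel, babyai_text) are assumptions about how each package is exposed to Python, so adjust them if your installation differs.

```python
import importlib.util

# Assumed top-level module names (not confirmed by this README):
# "torch" for PyTorch, "lamorel" for Lamorel, "babyai_text" for BabyAI-Text.
EXPECTED_MODULES = ["torch", "lamorel", "babyai_text"]


def check_modules(names):
    """Return {module_name: bool}, True when the module can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(EXPECTED_MODULES).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

The check only locates the modules without importing them, so it stays fast and does not need a GPU.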

Launch

Please use Lamorel along with our configs. You can find examples of our training scripts in campaign.

Training a Language Model

To train a Language Model on a BabyAI-Text environment, use the train_language_agent.py file. This script (launched with Lamorel) uses the following config entries:

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
   num_steps: 1000 # Total number of training steps
   max_episode_steps: 3 # Maximum number of steps in a single episode
   frames_per_proc: 40 # The number of collected transitions to perform a PPO update will be frames_per_proc*number_envs
   discount: 0.99 # Discount factor used in PPO
   lr: 1e-6 # Learning rate used to finetune the LLM
   beta1: 0.9 # PPO's hyperparameter
   beta2: 0.999 # PPO's hyperparameter
   gae_lambda: 0.99 # PPO's hyperparameter
   entropy_coef: 0.01 # PPO's hyperparameter
   value_loss_coef: 0.5 # PPO's hyperparameter
   max_grad_norm: 0.5 # Maximum grad norm when updating the LLM's parameters
   adam_eps: 1e-5 # Adam's hyperparameter
   clip_eps: 0.2 # Epsilon used in PPO's losses clipping
   epochs: 4 # Number of PPO epochs performed on each set of collected trajectories
   batch_size: 16 # Minibatch size
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where to store the finetuned model
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (lm_args.pretrained=False) creates our NPAE agent.
   template_test: 1 # Which prompt template to use to log evolution of action's probability (Section C of our paper). Choices are [1, 2].
   nbr_obs: 3 # Number of past observations used in the prompt
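
To make the interplay between number_envs, frames_per_proc, batch_size and epochs concrete, here is a small worked example using the values from the config above. It is only a sketch: the helper name ppo_update_schedule is hypothetical, and it assumes that a trailing partial minibatch is kept and that every minibatch triggers one optimizer step, which the actual training script may handle differently (e.g. with gradient accumulation).

```python
import math


def ppo_update_schedule(number_envs, frames_per_proc, batch_size, epochs):
    """Derive per-update quantities from the config values (illustrative helper)."""
    # Transitions collected before each PPO update, as stated in the config comment.
    transitions = frames_per_proc * number_envs
    # Assumption: a last, smaller minibatch is kept rather than dropped.
    minibatches = math.ceil(transitions / batch_size)
    # Assumption: one optimizer step per minibatch, repeated for each epoch.
    gradient_steps = minibatches * epochs
    return {
        "transitions_per_update": transitions,
        "minibatches_per_epoch": minibatches,
        "gradient_steps_per_update": gradient_steps,
    }


schedule = ppo_update_schedule(number_envs=2, frames_per_proc=40, batch_size=16, epochs=4)
print(schedule)
# {'transitions_per_update': 80, 'minibatches_per_epoch': 5, 'gradient_steps_per_update': 20}
```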

For the config entries related to the Language Model itself, please see Lamorel.

Evaluating performance on test episodes

To evaluate the performance of an agent (e.g. a trained LLM, BabyAI's bot...) on test tasks, use post-training_tests.py and set the following config entries:
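
As a reading aid for the PPO entries of the training config (clip_eps, value_loss_coef, entropy_coef), the sketch below computes a standard clipped PPO loss for a single transition in plain Python. It is not the implementation found in train_language_agent.py: the function name, the argument names, and the use of value clipping are assumptions based on the usual PPO formulation.

```python
import math


def ppo_loss(log_prob, old_log_prob, advantage, value, old_value, ret, entropy,
             clip_eps=0.2, value_loss_coef=0.5, entropy_coef=0.01):
    """Standard clipped PPO loss for one transition (illustrative, not the repo's code)."""
    # Probability ratio between the current policy and the one that collected the data.
    ratio = math.exp(log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Pessimistic (minimum) surrogate objective, negated to obtain a loss.
    policy_loss = -min(ratio * advantage, clipped_ratio * advantage)
    # Assumed value clipping: keep the new value within clip_eps of the old one.
    value_clipped = old_value + max(-clip_eps, min(value - old_value, clip_eps))
    value_loss = max((value - ret) ** 2, (value_clipped - ret) ** 2)
    # The entropy bonus encourages exploration, hence the minus sign.
    return policy_loss - entropy_coef * entropy + value_loss_coef * value_loss


# Unchanged policy, perfect value estimate, zero entropy: the loss reduces to -advantage.
print(ppo_loss(log_prob=-1.0, old_log_prob=-1.0, advantage=1.0,
               value=0.5, old_value=0.5, ret=0.5, entropy=0.0))
```

With a positive advantage, the clipped ratio caps how much a single update can increase an action's probability, which is the role of clip_eps.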

 rl_script_args:
   seed: 1
   number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
   max_episode_steps: 3 # Maximum number of steps in a single episode
   action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
   saving_path_logs: ??? # Where to store logs
   name_experiment: 'llm_mtrl' # Useful for logging
   name_model: 'T5small' # Useful for logging
   saving_path_model: ??? # Where to store the finetuned model
   name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
   load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
   use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to True (lm_args.pretrained=False) creates our NPAE agent.
   nbr_obs: 3 # Number of past observations used in the prompt
   number_episodes: 10 # Number of test episodes
   language: 'english' # Useful to perform the French experiment (Section H4)
   zero_shot: true # Whether the zero-shot LLM (i.e. without finetuning) should be used
   modified_action_space: false # Whether a modified action space (e.g. different from the one seen during training) should be used
   new_action_space: # ["rotate_left","rotate_right","move_ahead","take","release","switch"] # Modified action space
   im_learning: false # Whether an LLM produced with Behavioral Cloning should be used
   im_path: "" # Path to the LLM learned with Behavioral Cloning
   bot: false # Whether BabyAI's bot agent should be used
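
To illustrate how modified_action_space and new_action_space relate to action_space, the sketch below pairs the two lists position by position. This pairing is an assumption suggested by the order of the example values in the config (turn_left/rotate_left, ..., toggle/switch). The helper build_action_mapping is hypothetical and is not part of post-training_tests.py.

```python
def build_action_mapping(action_space, new_action_space):
    """Map each training action name to its renamed counterpart (assumed positional)."""
    if len(action_space) != len(new_action_space):
        raise ValueError("Both action spaces must contain the same number of actions")
    return dict(zip(action_space, new_action_space))


# Values copied from the config entries above.
action_space = ["turn_left", "turn_right", "go_forward", "pick_up", "drop", "toggle"]
new_action_space = ["rotate_left", "rotate_right", "move_ahead", "take", "release", "switch"]

mapping = build_action_mapping(action_space, new_action_space)
# Inverse mapping: turns an action named by the agent back into the original name.
inverse_mapping = {new: old for old, new in mapping.items()}

print(mapping["pick_up"])          # take
print(inverse_mapping["switch"])   # toggle
```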

Additional information

Version: 1.0.0
Type: AI source code
Last updated: 2024-12-30
Size: 50MB
Source: GitHub
