This repo holds the code, demos, and log files for Reflexion: Language Agents with Verbal Reinforcement Learning by Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao.
We have released the LeetcodeHardGym here
We have provided a set of notebooks to easily run, explore, and interact with the results of the reasoning experiments. Each experiment consists of a random sample of 100 questions from the HotPotQA distractor dataset. Each question in the sample is attempted by an agent with a specific type and reflexion strategy.
To get started:
git clone https://github.com/noahshinn/reflexion && cd ./reflexion/hotpotqa_runs
pip install -r requirements.txt
Set the OPENAI_API_KEY environment variable to your OpenAI API key:
export OPENAI_API_KEY=<your key>
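Before opening a notebook, it can help to confirm that the key is actually visible to Python. The check below is a minimal sketch and not part of the repo; the helper name `require_api_key` is hypothetical.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from `env`, or raise with a hint if it is missing or empty.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run: export {name}=<your key>")
    return key


if __name__ == "__main__":
    require_api_key()
    print("OPENAI_API_KEY is set")
```

Running this in the same shell session that will launch the notebooks catches a missing `export` early, before any API call is attempted.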
Agent type is determined by the notebook you choose to run. The available agent types include:
ReAct - ReAct Agent
CoT_context - CoT Agent given supporting context about the question
CoT_no_context - CoT Agent given no supporting context about the question
The notebook for each agent type is located in the ./hotpotqa_runs/notebooks directory.
Each notebook allows you to specify the reflexion strategy to be used by the agents. The available reflexion strategies, which are defined in an Enum, include:
ReflexionStrategy.NONE - The agent is not given any information about its last attempt.
ReflexionStrategy.LAST_ATTEMPT - The agent is given its reasoning trace from its last attempt on the question as context.
ReflexionStrategy.REFLEXION - The agent is given its self-reflection on the last attempt as context.
ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION - The agent is given both its reasoning trace and self-reflection on the last attempt as context.
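The four strategies differ only in what is carried over from the previous attempt. The sketch below illustrates that mapping. The enum member names follow the strategies listed above, but the member values, the `build_context` helper, and the prompt wording are assumptions made for illustration, not the repo's actual definitions.

```python
from enum import Enum


class ReflexionStrategy(Enum):
    # Member values are placeholders; the repo's enum may use different ones.
    NONE = "none"
    LAST_ATTEMPT = "last_attempt"
    REFLEXION = "reflexion"
    LAST_ATTEMPT_AND_REFLEXION = "last_attempt_and_reflexion"


def build_context(strategy, last_trace, last_reflection):
    """Assemble the extra context passed to the agent on a retry.

    Hypothetical helper: returns an empty string for NONE, the previous
    reasoning trace and/or the self-reflection for the other strategies.
    """
    parts = []
    if strategy in (ReflexionStrategy.LAST_ATTEMPT,
                    ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION):
        parts.append(f"Previous attempt:\n{last_trace}")
    if strategy in (ReflexionStrategy.REFLEXION,
                    ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION):
        parts.append(f"Reflection:\n{last_reflection}")
    return "\n\n".join(parts)
```

For example, `build_context(ReflexionStrategy.NONE, trace, reflection)` yields an empty string, which corresponds to a baseline agent that retries with no memory of its last attempt.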
Clone this repo and move to the AlfWorld directory
git clone https://github.com/noahshinn/reflexion && cd ./reflexion/alfworld_runs
Specify the run parameters in ./run_reflexion.sh.
num_trials: number of iterative learning steps
num_envs: number of task-environment pairs per trial
run_name: the name for this run
use_memory: use persisting memory to store self-reflections (turn off to run a baseline run)
is_resume: use the logging directory to resume a previous run
resume_dir: the logging directory from which to resume the previous run
start_trial_num: when resuming a run, the trial number at which to start
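To make the relationship between these parameters concrete, here is a rough sketch of how such options could be parsed and validated in Python. The parameter names come from the list above; the `--flag` spellings, types, defaults, and the `validate` helper are assumptions, so treat ./run_reflexion.sh itself as the source of truth.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the run parameters listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the run parameters")
    parser.add_argument("--num_trials", type=int, required=True,
                        help="number of iterative learning steps")
    parser.add_argument("--num_envs", type=int, required=True,
                        help="number of task-environment pairs per trial")
    parser.add_argument("--run_name", type=str, required=True,
                        help="the name for this run")
    parser.add_argument("--use_memory", action="store_true",
                        help="store self-reflections; omit for a baseline run")
    parser.add_argument("--is_resume", action="store_true",
                        help="resume a previous run from its logging directory")
    parser.add_argument("--resume_dir", type=str, default="",
                        help="logging directory of the run to resume")
    parser.add_argument("--start_trial_num", type=int, default=0,
                        help="trial number at which to start when resuming")
    return parser


def validate(args):
    """Reject combinations that cannot work: resuming needs a directory."""
    if args.is_resume and not args.resume_dir:
        raise ValueError("resume_dir is required when is_resume is set")
    return args


def example_args():
    # Illustrative values only; they are not taken from the paper's runs.
    argv = ["--num_trials", "3", "--num_envs", "5",
            "--run_name", "demo_run", "--use_memory"]
    return validate(build_parser().parse_args(argv))
```

The resume options only make sense together: `is_resume` switches resuming on, `resume_dir` says where the earlier logs live, and `start_trial_num` says where to pick up.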
Run the trial
./run_reflexion.sh
The logs will be sent to ./root/<run_name>.
Due to the nature of these experiments, it may not be feasible for individual developers to rerun the results, because GPT-4 access is limited and API charges are significant. All runs from the paper and additional results are logged in ./alfworld_runs/root for decision-making, ./hotpotqa_runs/root for reasoning, and ./programming_runs/root for programming.
Check out the original code here
Read a blog post here
Check out an interesting type-prediction implementation here: OpenTau
For all questions, contact [email protected]
@misc{shinn2023reflexion,
title={Reflexion: Language Agents with Verbal Reinforcement Learning},
author={Noah Shinn and Federico Cassano and Edward Berman and Ashwin Gopinath and Karthik Narasimhan and Shunyu Yao},
year={2023},
eprint={2303.11366},
archivePrefix={arXiv},
primaryClass={cs.AI}
}