One of the grand challenges of artificial intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used to aid human scientists—for example, for brainstorming ideas or writing code—they still require extensive manual supervision or are heavily constrained to specific tasks.
We're excited to introduce The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently.
We provide all runs and data from our paper here; we ran each base model on each template for approximately 50 ideas. We highly recommend reading through some of the Claude papers to get a sense of the system's strengths and weaknesses. Here are some example papers generated by The AI Scientist:
DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models
Multi-scale Grid Noise Adaptation: Enhancing Diffusion Models For Low-dimensional Data
GAN-Enhanced Diffusion: Boosting Sample Quality and Diversity
DualDiff: Enhancing Mode Capture in Low-dimensional Diffusion Models via Dual-expert Denoising
StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models
Adaptive Learning Rates for Transformers via Q-Learning
Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models
Grokking Accelerated: Layer-wise Learning Rates for Transformer Generalization
Grokking Through Compression: Unveiling Sudden Generalization via Minimal Description Length
Accelerating Mathematical Insight: Boosting Grokking Through Strategic Data Augmentation
Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy, including the use of potentially dangerous packages, web access, and the potential spawning of processes. Use at your own discretion. Please make sure to containerize and restrict web access appropriately.
Introduction
Requirements
Installation
Supported Models and API Keys
Setting Up the Templates
NanoGPT Template
2D Diffusion Template
Grokking Template
Run AI Scientist Paper Generation Experiments
Getting an LLM-Generated Paper Review
Making Your Own Template
Community-Contributed Templates
Template Resources
Citing The AI Scientist
Frequently Asked Questions
Containerization
Introduction
We provide three templates, which were used in our paper, covering the following domains: NanoGPT, 2D Diffusion, and Grokking. These templates enable The AI Scientist to generate ideas and conduct experiments in these areas. We accept contributions of new templates from the community, but please note that they are not maintained by us. All other templates beyond the three provided are community contributions.
Requirements
This code is designed to run on Linux with NVIDIA GPUs using CUDA and PyTorch. Support for other GPU architectures may be possible by following the PyTorch guidelines. The current templates would likely take an infeasible amount of time on CPU-only machines. Running on other operating systems may require significant adjustments.
Installation
conda create -n ai_scientist python=3.11
conda activate ai_scientist
# Install pdflatex
sudo apt-get install texlive-full
# Install PyPI requirements
pip install -r requirements.txt
Note: Installing texlive-full can take a long time. You may need to hold Enter during the installation.
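Once installation finishes, a quick check that the required tools are on your PATH can save a failed run later. The snippet below is a minimal sketch and not part of the repository; the tool names checked (pdflatex, nvidia-smi) and the Python 3.11 minimum are assumptions drawn from the requirements above.

```python
import shutil
import sys

# Hypothetical preflight check; not part of The AI Scientist codebase.
# Assumed tools: pdflatex for the LaTeX write-ups, nvidia-smi from the NVIDIA driver.
REQUIRED_TOOLS = ["pdflatex", "nvidia-smi"]


def find_missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_version_ok(minimum=(3, 11)):
    """Check that the running interpreter is at least `minimum` (the conda env uses 3.11)."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    if not python_version_ok():
        print("Python 3.11+ expected, found", sys.version.split()[0])
```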
Supported Models and API Keys
We support a wide variety of models, including open-weight and API-only models. In general, we recommend using only frontier models above the capability of the original GPT-4. For a full list of supported models, see here.
For OpenAI models, this uses the OPENAI_API_KEY environment variable by default.
For Anthropic models, this uses the ANTHROPIC_API_KEY environment variable by default.
For Claude models provided by Amazon Bedrock, please install these additional packages:
pip install anthropic[bedrock]
Next, specify a set of valid AWS credentials and the target AWS Region by setting the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION_NAME.
For Claude models provided by Vertex AI Model Garden, please install these additional packages:
pip install google-cloud-aiplatform
pip install anthropic[vertex]
Next, set up valid authentication for a Google Cloud project, for example by providing the region and project ID:
export CLOUD_ML_REGION="REGION"                # for Model Garden call
export ANTHROPIC_VERTEX_PROJECT_ID="PROJECT_ID"  # for Model Garden call
export VERTEXAI_LOCATION="REGION"              # for Aider/LiteLLM call
export VERTEXAI_PROJECT="PROJECT_ID"           # for Aider/LiteLLM call
For DeepSeek models, this uses the DEEPSEEK_API_KEY environment variable by default.
For models accessed through OpenRouter, this uses the OPENROUTER_API_KEY environment variable by default.
Our code can also optionally use a Semantic Scholar API key (S2_API_KEY) for higher throughput if you have one, though in principle it should work without one. If you have problems with Semantic Scholar, you can skip the literature search and citation phases of paper generation.
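For reference, a Semantic Scholar paper search with an optional key might be built as follows. This is a sketch against the public Semantic Scholar Graph API (the paper/search endpoint and the x-api-key header), not necessarily how The AI Scientist issues its queries; the default fields requested are an arbitrary choice for illustration.

```python
import os
import urllib.parse
import urllib.request

S2_SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_s2_search_request(query, limit=10, fields=("title", "year", "citationCount")):
    """Build (but do not send) a Semantic Scholar paper-search request.

    The key header is attached only when S2_API_KEY is set, so the request
    also works without a key, just with lower rate limits.
    """
    params = urllib.parse.urlencode(
        {"query": query, "limit": limit, "fields": ",".join(fields)}
    )
    headers = {}
    api_key = os.environ.get("S2_API_KEY")
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(f"{S2_SEARCH_URL}?{params}", headers=headers)


# Usage (performs a network call):
# with urllib.request.urlopen(build_s2_search_request("grokking")) as resp:
#     print(resp.read()[:200])
```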
Be sure to provide the key for the model used for your runs, e.g.:
export OPENAI_API_KEY="YOUR KEY HERE"
export S2_API_KEY="YOUR KEY HERE"
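Because each provider reads a different variable, a small check before launching can catch a missing key early. The sketch below is hypothetical: the model-name prefixes are illustrative assumptions (the names the launcher actually accepts live in ai_scientist/llm.py), and only the environment variable names come from this section.

```python
import os

# Hypothetical prefix -> required environment variables. The prefixes are
# assumptions for illustration; the variable names match the instructions above.
MODEL_ENV_VARS = [
    ("bedrock/", ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION_NAME"]),
    ("vertex_ai/", ["CLOUD_ML_REGION", "ANTHROPIC_VERTEX_PROJECT_ID",
                    "VERTEXAI_LOCATION", "VERTEXAI_PROJECT"]),
    ("gpt-", ["OPENAI_API_KEY"]),
    ("claude-", ["ANTHROPIC_API_KEY"]),
    ("deepseek", ["DEEPSEEK_API_KEY"]),
    ("openrouter/", ["OPENROUTER_API_KEY"]),
]


def required_env_vars(model):
    """Return the environment variables a model name is assumed to need."""
    for prefix, env_vars in MODEL_ENV_VARS:
        if model.startswith(prefix):
            return env_vars
    raise ValueError(f"Unknown model: {model}")


def missing_env_vars(model, environ=os.environ):
    """List required variables that are unset or empty for `model`."""
    return [name for name in required_env_vars(model) if not environ.get(name)]
```

For example, missing_env_vars("gpt-4o-2024-05-13") returns ["OPENAI_API_KEY"] until that key is exported. S2_API_KEY is left out because it is optional.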
Setting Up the Templates
This section provides instructions for setting up each of the three templates used in our paper. Before running The AI Scientist experiments, please ensure you have completed the setup steps for the templates you are interested in.
NanoGPT Template
Description: This template investigates transformer-based autoregressive next-token prediction tasks.
Setup Steps:
Prepare the data:
python data/enwik8/prepare.py
python data/shakespeare_char/prepare.py
python data/text8/prepare.py
Create baseline runs (machine dependent):
# Set up NanoGPT baseline run
# NOTE: YOU MUST FIRST RUN THE PREPARE SCRIPTS ABOVE!
cd templates/nanoGPT
python experiment.py --out_dir run_0
python plot.py
2D Diffusion Template
Description: This template studies improving the performance of diffusion generative models on low-dimensional datasets.
Setup Steps:
Install dependencies:
# Set up 2D Diffusion
git clone https://github.com/gregversteeg/NPEET.git
cd NPEET
pip install .
pip install scikit-learn
Create baseline runs:
# Set up 2D Diffusion baseline run
cd templates/2d_diffusion
python experiment.py --out_dir run_0
python plot.py
Grokking Template
Description: This template investigates questions about generalization and learning speed in deep neural networks.
Setup Steps:
Install dependencies:
# Set up Grokking
pip install einops
Create baseline runs:
# Set up Grokking baseline run
cd templates/grokking
python experiment.py --out_dir run_0
python plot.py
Run AI Scientist Paper Generation Experiments
Note: Please ensure the setup steps above are completed before running these experiments.
conda activate ai_scientist
# Run the paper generation.
python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2
python launch_scientist.py --model "claude-3-5-sonnet-20241022" --experiment nanoGPT_lite --num-ideas 2
If you have more than one GPU, use the --parallel option to parallelize ideas across multiple GPUs.
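The scheduling used by launch_scientist.py with --parallel is not described here. As a hypothetical illustration of the general idea only, the sketch below uses a simple static scheme: ideas are dealt round-robin to GPU indices, and each worker process is restricted to its GPU through the standard CUDA_VISIBLE_DEVICES variable.

```python
import os


def assign_ideas_to_gpus(ideas, num_gpus):
    """Round-robin assignment: idea i goes to GPU i % num_gpus."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    buckets = {gpu: [] for gpu in range(num_gpus)}
    for i, idea in enumerate(ideas):
        buckets[i % num_gpus].append(idea)
    return buckets


def worker_env(gpu_id):
    """Copy of the current environment in which only GPU `gpu_id` is visible."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

A worker could then be started with, for example, subprocess.Popen(cmd, env=worker_env(gpu)), where cmd is whatever per-idea command you run. A dynamic work queue would balance load better than this static split when ideas take very different amounts of time.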
Getting an LLM-Generated Paper Review
import openai
from ai_scientist.perform_review import load_paper, perform_review

client = openai.OpenAI()
model = "gpt-4o-2024-05-13"

# Load paper from PDF file (raw text)
paper_txt = load_paper("report.pdf")

# Get the review dictionary
review = perform_review(
    paper_txt,
    model,
    client,
    num_reflections=5,
    num_fs_examples=1,
    num_reviews_ensemble=5,
    temperature=0.1,
)

# Inspect review results
review["Overall"]  # Overall score (1-10)
review["Decision"]  # 'Accept' or 'Reject'
review["Weaknesses"]  # List of weaknesses (strings)
To run batch analysis:
cd review_iclr_bench
python iclr_analysis.py --num_reviews 500 --batch_size 100 --num_fs_examples 1 --num_reflections 5 --temperature 0.1 --num_reviews_ensemble 5
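The batch analysis compares the reviewer's decisions with real ICLR outcomes. The exact metrics iclr_analysis.py reports are not listed here; as a hypothetical sketch, the function below computes accuracy and balanced accuracy from paired decisions, using the same 'Accept'/'Reject' labels that review["Decision"] returns.

```python
def decision_metrics(predicted, actual):
    """Accuracy and balanced accuracy for 'Accept'/'Reject' decisions.

    Balanced accuracy averages the recall on accepted and on rejected papers,
    so a reviewer that rejects everything is not rewarded just because
    rejections usually outnumber acceptances.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(predicted, actual))
    accuracy = sum(p == a for p, a in pairs) / len(pairs)
    recalls = []
    for label in ("Accept", "Reject"):
        preds_for_label = [p for p, a in pairs if a == label]
        if preds_for_label:
            recalls.append(sum(p == label for p in preds_for_label) / len(preds_for_label))
    return {"accuracy": accuracy, "balanced_accuracy": sum(recalls) / len(recalls)}
```

For instance, a reviewer that rejects all four papers when one was actually accepted gets an accuracy of 0.75 but a balanced accuracy of only 0.5.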
Making Your Own Template
If there is an area of study you would like The AI Scientist to explore, it is straightforward to create your own templates. In general, follow the structure of the existing templates, which consist of:
experiment.py: This is the main script, containing the core content. It takes an argument --out_dir, which specifies where it should create the folder and save the relevant information from the run.
plot.py: This script takes the information from the run folders and creates plots. The code should be clear and easy to edit.
prompt.json: Put information about your template here.
seed_ideas.json: Place example ideas here. You can also try to generate ideas without any examples and then pick the best one or two to put here.
latex/template.tex: We recommend using our LaTeX folder, but be sure to replace the pre-loaded citations with ones that you expect to be more relevant.
The key to making new templates work is matching the base filenames and output JSONs to the existing format; everything else is free to change.
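To make the expected interface concrete, here is a minimal, hypothetical experiment.py skeleton. Only the --out_dir argument comes from the description above; the final_info.json file name and its {"metric": {"means": value}} layout are assumptions, so copy whatever the existing templates actually write.

```python
import argparse
import json
import os
import time


def run_experiment():
    """Placeholder for the template's actual training and evaluation code."""
    start = time.time()
    # ... train and evaluate your model here ...
    final_loss = 0.0  # hypothetical metric; replace with real results
    return {"final_loss": final_loss, "total_time": time.time() - start}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical template experiment")
    parser.add_argument("--out_dir", type=str, default="run_0",
                        help="Folder where this run's results are saved")
    args = parser.parse_args(argv)

    os.makedirs(args.out_dir, exist_ok=True)
    results = run_experiment()
    # Assumed output file name and layout; mirror the existing templates exactly.
    with open(os.path.join(args.out_dir, "final_info.json"), "w") as f:
        json.dump({name: {"means": value} for name, value in results.items()}, f, indent=2)
    return args.out_dir


if __name__ == "__main__":
    main()
```

Running python experiment.py --out_dir run_0 would then create run_0/final_info.json.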
You should also ensure that the template.tex file is updated to use the correct citation style and base plots for your template.
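On the plotting side, plot.py mostly needs to gather results from the run folders. The sketch below assumes each run folder is named run_<i> and contains a final_info.json file mapping metric names to {"means": value}; that layout and the function names are illustrative assumptions, so adapt them to your template. matplotlib is imported only inside the plotting function.

```python
import json
import os


def collect_runs(base_dir="."):
    """Load run_<i>/final_info.json files under base_dir, ordered by run index."""
    run_names = [
        name for name in os.listdir(base_dir)
        if name.startswith("run_") and name[len("run_"):].isdigit()
    ]
    results = {}
    # Sort numerically so run_10 comes after run_2.
    for name in sorted(run_names, key=lambda n: int(n[len("run_"):])):
        path = os.path.join(base_dir, name, "final_info.json")
        if os.path.isfile(path):
            with open(path) as f:
                results[name] = json.load(f)
    return results


def plot_metric(results, metric, out_file="metric.png"):
    """Bar chart of one metric's mean value across runs (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    names = list(results)
    values = [results[name][metric]["means"] for name in names]
    plt.figure()
    plt.bar(names, values)
    plt.ylabel(metric)
    plt.savefig(out_file)
    plt.close()
```

Keeping collection separate from drawing makes it easy for The AI Scientist to edit the plotting code without touching how results are loaded.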
Community-Contributed Templates
We welcome community contributions in the form of new templates. While these are not maintained by us, we are delighted to highlight your templates to others. Below, we list community-contributed templates along with links to their pull requests (PRs):
Infectious Disease Modeling (seir) - PR #137
Image Classification with MobileNetV3 (mobilenetV3) - PR #141
Sketch RNN (sketch_rnn) - PR #143
This section is reserved for community contributions. Please submit a pull request to add your template to the list! Please describe the template in the PR description, and also show examples of the generated papers.
Template Resources
We provide three templates, which heavily use code from other repositories, credited below:
NanoGPT Template uses code from NanoGPT and this PR.
2D Diffusion Template uses code from tiny-diffusion, ema-pytorch, and Datasaur.
Grokking Template uses code from Sea-Snell/grokking and danielmamay/grokking.
We would like to thank the developers of the open-source models and packages for their contributions and for making their work available.
Citing The AI Scientist
If you use The AI Scientist in your research, please cite it as follows:
@article{lu2024aiscientist,
  title={The {AI} {S}cientist: Towards Fully Automated Open-Ended Scientific Discovery},
  author={Lu, Chris and Lu, Cong and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David},
  journal={arXiv preprint arXiv:2408.06292},
  year={2024}
}
Frequently Asked Questions
We recommend reading our paper first for any questions you have about The AI Scientist.
Why am I missing files when running The AI Scientist?
Ensure you have completed all the setup and preparation steps before running the main experiment script.
Why has a PDF or a review not been generated?
The AI Scientist finishes an idea with a success rate that depends on the template, the base foundation model, and the complexity of the idea. We advise referring to our main paper. The highest success rates are observed with Claude Sonnet 3.5. Reviews are best done with GPT-4o; all other models have issues with positivity bias or failure to conform to required outputs.
What is the cost of each idea generated?
Typically less than $15 per paper with Claude Sonnet 3.5. We recommend DeepSeek Coder V2 for a much more cost-effective approach. A good place to look for new models is the Aider leaderboard.
How do I change the base conference format associated with the write-ups?
Change the base template.tex files contained within each template.
How do I run The AI Scientist for different subject fields?
Please refer to the instructions for different templates. In this current iteration, this is restricted to ideas that can be expressed in code. However, lifting this restriction would represent exciting future work! :)
How do I add support for a new foundation model?
You may modify ai_scientist/llm.py to add support for a new foundation model. We do not advise using any model that is significantly weaker than GPT-4 level for The AI Scientist.
Why do I need to run the baseline runs myself?
These appear as run_0 and should be run on each machine you execute The AI Scientist on, so that run-time comparisons stay accurate despite hardware differences.
What if I have problems accessing the Semantic Scholar API?
We use the Semantic Scholar API to check ideas for novelty and collect citations for the paper write-up. You may be able to skip these phases if you don't have an API key or the API is slow to access.
Containerization
We include a community-contributed Docker image that may assist with your containerization efforts in experimental/Dockerfile.
You can use this image like this:
# Endpoint Script
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -v `pwd`/templates:/app/AI-Scientist/templates <AI_SCIENTIST_IMAGE> \
  --model gpt-4o-2024-05-13 \
  --experiment 2d_diffusion \
  --num-ideas 2
# Interactive
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY \
  --entrypoint /bin/bash \
  <AI_SCIENTIST_IMAGE>