[Website] [arXiv] [PDF]
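The Cradle framework empowers nascent foundation models to perform complex computer tasks through the same unified interface that humans use, i.e., screenshots as input and keyboard and mouse operations as output.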
Click on any of the video thumbnails above to watch them on YouTube.
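We currently provide access to the OpenAI and Claude APIs. Please create a .env file in the root of the repository to store the keys (one of them is enough).
Sample .env file containing private information: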
OA_OPENAI_KEY = "abc123abc123abc123abc123abc123ab"
RF_CLAUDE_AK = "abc123abc123abc123abc123abc123ab" # Access Key for Claude
RF_CLAUDE_SK = "123abc123abc123abc123abc123abc12" # Secret Access Key for Claude
AZ_OPENAI_KEY = "123abc123abc123abc123abc123abc12"
AZ_BASE_URL = "https://abc123.openai.azure.com/"
IDE_NAME = "Code"
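OA_OPENAI_KEY is the OpenAI API key. You can get it from OpenAI.
AZ_OPENAI_KEY is the Azure OpenAI API key. You can get it from the Azure Portal.
OA_CLAUDE_KEY is the Anthropic Claude API key. You can get it from Anthropic.
RF_CLAUDE_AK and RF_CLAUDE_SK are the AWS RESTful API access key and secret access key for the Claude API.
IDE_NAME refers to the IDE environment in which the repository's code runs, such as PyCharm or Code (VSCode). It is mainly used to switch automatically between the IDE and the target environment.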
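As a quick sanity check before running anything, the rule above (one set of keys is enough) can be sketched as a small script. This is only an illustration and not part of Cradle: the function names `parse_env_text` and `configured_providers` are hypothetical, and the grouping of variables per provider (for example, pairing AZ_OPENAI_KEY with AZ_BASE_URL) is an assumption based on the descriptions above.

```python
import re
from pathlib import Path

# Assumed grouping of variables per provider; one complete group is enough.
PROVIDER_KEY_GROUPS = {
    "openai": ["OA_OPENAI_KEY"],
    "azure_openai": ["AZ_OPENAI_KEY", "AZ_BASE_URL"],
    "claude": ["OA_CLAUDE_KEY"],
    "claude_restful": ["RF_CLAUDE_AK", "RF_CLAUDE_SK"],
}

# Matches lines such as: NAME = "value"  # optional trailing comment
_ASSIGNMENT = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*"([^"]*)"')


def parse_env_text(text: str) -> dict[str, str]:
    """Parse NAME = "value" lines; anything else (comments, blanks) is skipped."""
    values = {}
    for line in text.splitlines():
        match = _ASSIGNMENT.match(line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def configured_providers(values: dict[str, str]) -> list[str]:
    """Return the providers whose variables are all present and non-empty."""
    return [
        name
        for name, keys in PROVIDER_KEY_GROUPS.items()
        if all(values.get(key) for key in keys)
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    providers = configured_providers(parse_env_text(text))
    print("Configured providers:", providers or "none")
```

Run it from the repository root; an empty result suggests that no complete set of keys was found in .env.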
Please set up your Python environment and install the required dependencies as follows:
# Clone the repository
git clone https://github.com/BAAI-Agents/Cradle.git
cd Cradle
# Create a new conda environment
conda create --name cradle-dev python=3.10
conda activate cradle-dev
pip install -r requirements.txt
1. Option 1
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_lg
or
# pip install .tar.gz archive or .whl from path or URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz
2. Option 2
# Copy this url https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz
# Paste it in the browser and download the file to res/spacy/data
cd res/spacy/data
pip install en_core_web_lg-3.7.1.tar.gz
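Whichever option is used, spaCy models are installed as regular Python packages, so a quick check that the install succeeded can be sketched with the standard library alone. The helper name `is_model_installed` is hypothetical and not part of Cradle or spaCy.

```python
import importlib.util


def is_model_installed(package_name: str = "en_core_web_lg") -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_model_installed():
        print("en_core_web_lg is installed.")
    else:
        print("en_core_web_lg is missing; try one of the install options above.")
```

This only confirms that the package is importable from the active environment; it does not load the model or check its version.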
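Because games and software differ greatly from one another, we provide specific settings for each of them below.
Since some users may want to apply our framework to new games, this section mainly presents Cradle's core directories and organizational structure. We highlight with "" the modules related to migrating to a new game and provide detailed explanations later.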
Cradle
├── cache # Cache the GroundingDino model and the bert-base-uncased model
├── conf # The configuration files for the environment and the llm model
│ ├── env_config_dealers.json
│ ├── env_config_rdr2_main_storyline.json
│ ├── env_config_rdr2_open_ended_mission.json
│ ├── env_config_skylines.json
│ ├── env_config_stardew_cultivation.json
│ ├── env_config_stardew_farm_clearup.json
│ ├── env_config_stardew_shopping.json
│ ├── openai_config.json
│ ├── claude_config.json
│ ├── restful_claude_config.json
│ └── ...
├── deps # The dependencies for the Cradle framework, ignore this folder
├── docs # The documentation for the Cradle framework, ignore this folder
├── res # The resources for the Cradle framework
│ ├── models # Ignore this folder
│ ├── tool # Subfinder for RDR2
│ ├── [game or software] # The resources for each game or software, for example: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu
│ │ ├── prompts # The prompts for the game
│ │ │ └── templates
│ │ │ ├── action_planning.prompt
│ │ │ ├── information_gathering.prompt
│ │ │ ├── self_reflection.prompt
│ │ │ └── task_inference.prompt
│ │ ├── skills # The skills json for the game, it will be generated automatically
│ │ ├── icons # Icons in the game that GPT-4 struggles to recognize; the icon replacer can substitute them with text for better recognition
│ │ └── saves # Save files in the game
│ └── ...
├── requirements.txt # The requirements for the Cradle framework
├── runner.py # The main entry for the Cradle framework
├── cradle # Cradle's core modules
│ ├── config # The configuration for the Cradle framework
│ ├── environment # The environment for the Cradle framework
│ │ ├── [game or software] # The environment for each game or software, for example: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu
│ │ │ ├── __init__.py # The initialization file for the environment
│ │ │ ├── atomic_skills # Atomic skills in the game. Users should customise them to suit the needs of the game or software, e.g. character movement
│ │ │ ├── composite_skills # Combination skills for atomic skills in games or software
│ │ │ ├── skill_registry.py # The skill registry for the game. Will register all atomic skills and composite skills into the registry.
│ │ │ └── ui_control.py # The UI control for the game. Define functions to pause the game and switch to the game window
│ │ └── ...
│ ├── gameio # Interfaces that directly wrap the skill registry and ui control in the environment
│ ├── log # The log for the Cradle framework
│ ├── memory # The memory for the Cradle framework
│ ├── module # Currently contains only the skill execution module. Action planning, self-reflection, and other modules will later be migrated here from planner and provider
│ ├── planner # The planner for the Cradle framework: a unified interface for action planning, self-reflection, and other modules. It will later be removed and its contents moved into cradle/module.
│ ├── runner # The logical flow of execution for each game and software. All game and software processes will then be unified into a single runner
│ ├── utils # Defines some helper functions such as save json and load json
│ └── provider # The provider for the Cradle framework. We have semantically decomposed most of the execution flow in the runner into providers
│ ├── augment # Methods for image augmentation
│ ├── llm # Call for the LLM model, e.g. OpenAI's GPT-4o, Claude, etc.
│ ├── module # The module for the Cradle framework. e.g., action planning, self-reflection and other modules. It will be migrated to the cradle/module later.
│ ├── object_detect # Methods for object detection
│ ├── process # Methods for pre-processing and post-processing for action planning, self-reflection and other modules
│ ├── video # Methods for video processing
│ ├── others # Methods for other operations, e.g., save and load coordinates for skylines
│ ├── circle_detector.py # The circle detector for the rdr2
│ ├── icon_replacer.py # Methods for replacing icons with text
│ ├── sam_provider.py # Segment anything for software
│ └── ...
└── ...
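To make the layout above concrete, the following sketch creates an empty skeleton for a new game using the per-game paths shown in the tree. It is an illustration only and not a Cradle tool: the function name `scaffold_game` is hypothetical, and treating atomic_skills and composite_skills as directories is an assumption drawn from the tree.

```python
from pathlib import Path

# Per-game entries taken from the tree above (assumed to be directories or files as named).
ENV_DIRS = ["atomic_skills", "composite_skills"]
ENV_FILES = ["__init__.py", "skill_registry.py", "ui_control.py"]
RES_DIRS = ["prompts/templates", "skills", "icons", "saves"]


def scaffold_game(repo_root: Path, game: str) -> list[Path]:
    """Create empty placeholder directories and files for a new game."""
    env_root = repo_root / "cradle" / "environment" / game
    res_root = repo_root / "res" / game
    created = []

    for name in ENV_DIRS:
        path = env_root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    for name in ENV_FILES:
        path = env_root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # leaves existing files untouched
        created.append(path)

    for name in RES_DIRS:
        path = res_root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)

    return created


if __name__ == "__main__":
    # Example: scaffold a game called "newgame" under the current directory.
    for path in scaffold_game(Path("."), "newgame"):
        print(path)
```

The skeleton contains no logic; the skills, prompts, and UI control functions still have to be written for the specific game.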
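Because each game's settings and compatible operating systems differ, Cradle cannot be migrated to a new game simply by replacing the game name. We recommend considering each game individually. For example, the standalone AAA game RDR2 involves real-time combat, so the game must be paused while waiting for GPT-4o's response and then unpaused to execute the actions. Stardew has the same issue. Other games, such as Dealer's Life 2 and Cities: Skylines, have no real-time requirements, so they do not need to be paused. If the new game is similar to the latter, we recommend copying the Cities: Skylines implementation and following its implementation path to create the corresponding modules. Although games can differ greatly from one another, the Cradle framework can still be adapted to them in a unified way. Assuming the new game is named newgame, the specific migration pipeline can be found in the Migrate to New Game guide.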
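The pause-while-thinking pattern described for real-time games can be sketched as follows. This is a minimal illustration and not Cradle's actual implementation: `game_paused`, `plan_then_act`, and the four callbacks are hypothetical names, standing in for whatever a game's ui_control and LLM provider expose.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence


@contextmanager
def game_paused(
    pause: Callable[[], None],
    unpause: Callable[[], None],
    needs_pause: bool = True,
) -> Iterator[None]:
    """Keep the game paused inside the with-block; always unpause afterwards."""
    if needs_pause:
        pause()
    try:
        yield
    finally:
        if needs_pause:
            unpause()


def plan_then_act(
    pause: Callable[[], None],
    unpause: Callable[[], None],
    ask_model: Callable[[], Sequence[str]],
    execute: Callable[[str], None],
    needs_pause: bool = True,
) -> Sequence[str]:
    """Pause, wait for the model's planned actions, unpause, then execute them."""
    with game_paused(pause, unpause, needs_pause):
        actions = ask_model()  # slow call; the game stays paused meanwhile
    for action in actions:
        execute(action)  # runs only after the game has been unpaused
    return actions
```

For a game without real-time requirements, passing `needs_pause=False` skips both the pause and unpause calls while keeping the rest of the flow identical.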
If you find our work useful, please consider citing us!
@article{tan2024cradle,
title={Cradle: Empowering Foundation Agents towards General Computer Control},
author={Weihao Tan and Wentao Zhang and Xinrun Xu and Haochong Xia and Ziluo Ding and Boyu Li and Bohan Zhou and Junpeng Yue and Jiechuan Jiang and Yewen Li and Ruyi An and Molei Qin and Chuqiao Zong and Longtao Zheng and Yujie Wu and Xiaoqiang Chai and Yifei Bi and Tianbao Xie and Pengjie Gu and Xiyun Li and Ceyao Zhang and Long Tian and Chaojie Wang and Xinrun Wang and Börje F. Karlsson and Bo An and Shuicheng Yan and Zongqing Lu},
journal={arXiv preprint arXiv:2403.03186},
year={2024}
}