[網站] [arxiv] [PDF]
搖籃框架賦予了新生的基礎模型,可以通過人類使用相同的統一接口執行複雜的計算機任務,即,將屏幕截圖(作為輸入和鍵盤和鼠標操作)作為輸出。
單擊上面的任何視頻縮略圖以在YouTube上觀看。
我們目前提供了對OpenAI和Claude的API的訪問權限。請在存儲庫的根部創建一個.env
文件以存儲鍵(其中之一就足夠了)。
示例.env
文件包含私人信息:
OA_OPENAI_KEY = "abc123abc123abc123abc123abc123ab"
RF_CLAUDE_AK = "abc123abc123abc123abc123abc123ab" # Access Key for Claude
RF_CLAUDE_SK = "123abc123abc123abc123abc123abc12" # Secret Access Key for Claude
AZ_OPENAI_KEY = "123abc123abc123abc123abc123abc12"
AZ_BASE_URL = "https://abc123.openai.azure.com/"
RF_CLAUDE_AK = "abc123abc123abc123abc123abc123ab"
RF_CLAUDE_SK = "123abc123abc123abc123abc123abc12"
IDE_NAME = "Code"
OA_OPENAI_KEY是OpenAI API鍵。您可以從Openai獲得它。
AZ_OPENAI_KEY是Azure OpenAI API鍵。您可以從Azure Portal獲得它。
OA_CLAUDE_KEY是擬人化的Claude API密鑰。您可以從擬人化中獲得它。
rf_claude_ak和rf_claude_sk是Claude API的AWS RESTFUL API密鑰和秘密密鑰。
IDE_NAME是指存儲庫代碼運行的IDE環境,例如PyCharm
或Code
(VSCODE)。它主要用於在IDE和目標環境之間自動切換。
請設置您的Python環境,並安裝所需的依賴項為:
# Clone the repository
git clone https://github.com/BAAI-Agents/Cradle.git
cd Cradle
# Create a new conda environment
conda create --name cradle-dev python=3.10
conda activate cradle-dev
pip install -r requirements.txt
1. Option 1
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_lg
or
# pip install .tar.gz archive or .whl from path or URL
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz
2. Option 2
# Copy this url https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1.tar.gz
# Paste it in the browser and download the file to res/spacy/data
cd res/spacy/data
pip install en_core_web_lg-3.7.1.tar.gz
由於每個遊戲和軟件之間存在巨大差異,我們為下面的每個遊戲提供了特定的設置。
由於某些用戶可能希望將我們的框架應用於新遊戲,因此本節主要展示搖籃的核心目錄和組織結構。我們將在“”與遷移到新遊戲有關的模塊中突出顯示,並在以後提供詳細的說明。
Cradle
├── cache # Cache the GroundingDino model and the bert-base-uncased model
├── conf # The configuration files for the environment and the llm model
│ ├── env_config_dealers.json
│ ├── env_config_rdr2_main_storyline.json
│ ├── env_config_rdr2_open_ended_mission.json
│ ├── env_config_skylines.json
│ ├── env_config_stardew_cultivation.json
│ ├── env_config_stardew_farm_clearup.json
│ ├── env_config_stardew_shopping.json
│ ├── openai_config.json
│ ├── claude_config.json
│ ├── restful_claude_config.json
│ └── ...
├── deps # The dependencies for the Cradle framework, ignore this folder
├── docs # The documentation for the Cradle framework, ignore this folder
├── res # The resources for the Cradle framework
│ ├── models # Ignore this folder
│ ├── tool # Subfinder for RDR2
│ ├── [game or software] # The resources for game, exmpale: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu
│ │ ├── prompts # The prompts for the game
│ │ │ └── templates
│ │ │ ├── action_planning.prompt
│ │ │ ├── information_gathering.prompt
│ │ │ ├── self_reflection.prompt
│ │ │ └── task_inference.prompt
│ │ ├── skills # The skills json for the game, it will be generated automatically
│ │ ├── icons # The icons difficult for GPT-4 to recognize in the game can be replaced with text for better recognition using an icon replacer
│ │ └── saves # Save files in the game
│ └── ...
├── requirements.txt # The requirements for the Cradle framework
├── runner.py # The main entry for the Cradle framework
├── cradle # Cradle's core modules
│ ├── config # The configuration for the Cradle framework
│ ├── environment # The environment for the Cradle framework
│ │ ├── [game or software] # The environment for the game, exmpale: rdr2, dealers, skylines, stardew, outlook, chrome, capcut, meitu, feishu
│ │ │ ├── __init__.py # The initialization file for the environment
│ │ │ ├── atomic_skills # Atomic skills in the game. Users should customise them to suit the needs of the game or software, e.g. character movement
│ │ │ ├── composite_skills # Combination skills for atomic skills in games or software
│ │ │ ├── skill_registry.py # The skill registry for the game. Will register all atomic skills and composite skills into the registry.
│ │ │ └── ui_control.py # The UI control for the game. Define functions to pause the game and switch to the game window
│ │ └── ...
│ ├── gameio # Interfaces that directly wrap the skill registry and ui control in the environment
│ ├── log # The log for the Cradle framework
│ ├── memory # The memory for the Cradle framework
│ ├── module # Currently there is only the skill execution module. Later will migrate action planning, self-reflection and other modules from planner and provider
│ ├── planner # The planner for the Cradle framework. Unified interface for action planning, self-reflection and other modules. This module will be deleted later and will be moved to the module module.
│ ├── runner # The logical flow of execution for each game and software. All game and software processes will then be unified into a single runner
│ ├── utils # Defines some helper functions such as save json and load json
│ └── provider # The provider for the Cradle framework. We have semantically decomposed most of the execution flow in the runner into providers
│ ├── augment # Methods for image augmentation
│ ├── llm # Call for the LLM model, e.g. OpenAI's GPT-4o, Claude, etc.
│ ├── module # The module for the Cradle framework. e.g., action planning, self-reflection and other modules. It will be migrated to the cradle/module later.
│ ├── object_detect # Methods for object detection
│ ├── process # Methods for pre-processing and post-processing for action planning, self-reflection and other modules
│ ├── video # Methods for video processing
│ ├── others # Methods for other operations, e.g., save and load coordinates for skylines
│ ├── circle_detector.py # The circle detector for the rdr2
│ ├── icon_replacer.py # Methods for replacing icons with text
│ ├── sam_provider.py # Segment anything for software
│ └── ...
└── ...
由於每個遊戲的設置及其兼容的操作系統都是不同的,因此搖籃不能簡單地替換一個遊戲名稱即可遷移到新遊戲。我們建議專門考慮每個遊戲。例如,獨立的AAA遊戲RDR2需要實時戰鬥,因此我們需要暫停遊戲才能等待GPT-4O的響應,然後取消遊戲以執行操作。 Stardew也有同樣的問題。其他遊戲(例如經銷商的生活2和城市):天際線沒有實時要求,因此它們不需要暫停。如果新遊戲類似於後者,我們建議複製城市:Skylines的實現,並遵循其實現路徑來創建相應的模塊。儘管每個遊戲都可能有很大差異,但我們的搖籃框架仍然可以實現遊戲的統一改編。假設新遊戲的名稱是新遊戲,則可以找到特定的遷移管道遷移到新遊戲指南。
如果您發現我們的工作有用,請考慮引用我們!
@article{tan2024cradle,
title={Cradle: Empowering Foundation Agents towards General Computer Control},
author={Weihao Tan and Wentao Zhang and Xinrun Xu and Haochong Xia and Ziluo Ding and Boyu Li and Bohan Zhou and Junpeng Yue and Jiechuan Jiang and Yewen Li and Ruyi An and Molei Qin and Chuqiao Zong and Longtao Zheng and Yujie Wu and Xiaoqiang Chai and Yifei Bi and Tianbao Xie and Pengjie Gu and Xiyun Li and Ceyao Zhang and Long Tian and Chaojie Wang and Xinrun Wang and Börje F. Karlsson and Bo An and Shuicheng Yan and Zongqing Lu},
journal={arXiv preprint arXiv:2403.03186},
year={2024}
}