Project Page | arXiv | Video
Vikrant Dewangan* 1 , Tushar Choudhary* 1 , Shivam Chandhok* 2 , Shubham Priyadarshan 1 , Anushka Jain 1 , Arun K. Singh 3 , Siddharth Srivastava 4 , Krishna Murthy Jatavallabhula
1 International Institute of Information Technology Hyderabad, 2 University of British Columbia, 3 University of Tartu, 4 TensorTour Inc, 5 MIT-CSAIL
* denotes equal contribution,
$^\dagger$ denotes equal advising
ICRA 2024
We introduce Talk2BEV, a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps commonly used in autonomous driving.
While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV eliminates the need for BEV-specific training, relying instead on performant pre-trained LVLMs. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues.
We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries and the ability to ground these queries in the visual context embedded in the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
Please download the NuScenes v1.0-trainval dataset. Our dataset consists of 2 parts - Talk2BEV-Base and Talk2BEV-Captions, containing the base data (crops, perspective images, BEV area centroids) and the crop captions respectively.
We provide 2 links to the Talk2BEV dataset below (Talk2BEV-Mini (captions only) and Talk2BEV-Full). The dataset is hosted on Google Drive. Please download the dataset and extract the files to the data folder.
Name | Base | Captions | Bench | Link |
---|---|---|---|---|
Talk2BEV-Mini | ✓ | ✗ | ✗ | Link |
Talk2BEV-Full | ✗ | ✗ | ✗ | TODO |
If you want to generate the dataset from scratch, please follow the process here. The format of each data part is described in format.
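Evaluation on Talk2BEV is done via 2 methods - MCQs (from Talk2BEV-Bench) and spatial operators. We use GPT-4 for our evaluation. Please follow the instructions in GPT-4 and initialize the API key and organization in your OS environment: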
ORGANIZATION=<your-organization>
API_KEY=<your-api-key>
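As a minimal sketch of how a script might pick up these two variables, the snippet below reads `ORGANIZATION` and `API_KEY` from the environment and fails early if either is unset. The helper name `load_gpt4_credentials` is hypothetical and is not part of the Talk2BEV code; how the evaluation scripts actually consume these variables is not shown here.

```python
import os


def load_gpt4_credentials(env=os.environ):
    """Return (organization, api_key) read from the given environment mapping.

    Hypothetical helper for illustration only; the variable names match the
    ones listed above.
    """
    # Collect every variable that is missing or empty so the error lists them all.
    missing = [name for name in ("ORGANIZATION", "API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ORGANIZATION"], env["API_KEY"]
```

Calling `load_gpt4_credentials()` before running an evaluation gives a clear error message up front instead of an authentication failure partway through.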
To obtain the accuracy for the MCQs, please run the following command:
cd evaluation
python eval_mcq.py
This will print the accuracy for the MCQs.
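For intuition, MCQ accuracy is the fraction of questions for which the predicted option matches the ground-truth option. The sketch below assumes predictions and answers are given as parallel lists of option labels such as "A" or "B"; the function name `mcq_accuracy` and this data layout are assumptions for illustration and may differ from what `eval_mcq.py` does.

```python
def mcq_accuracy(predictions, answers):
    """Fraction of predictions that match the ground-truth option labels.

    Comparison ignores case and surrounding whitespace (an assumption made
    for this sketch).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


print(mcq_accuracy(["A", "b", "C", "D"], ["a", "B", "D", "D"]))  # 0.75
```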
To obtain the distance error for the MCQs, please run the following command:
cd evaluation
python eval_spops.py
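One plausible reading of a distance error, sketched under stated assumptions: the Euclidean distance between a predicted and a ground-truth BEV centroid, averaged over all pairs. The function name `mean_distance_error` and the `(x, y)` tuple layout are hypothetical; `eval_spops.py` may define the metric differently.

```python
import math


def mean_distance_error(predicted, ground_truth):
    """Mean Euclidean distance between paired (x, y) centroids.

    Both arguments are equal-length sequences of (x, y) tuples in the same
    BEV coordinate frame (an assumption made for this sketch).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        return 0.0
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


print(mean_distance_error([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]))  # 2.5
```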
We also allow free-form conversation with the BEV. Please follow the instructions in Click2Chat to chat with the BEV.
To be released