OmniParser: Screen Parsing Tool for Pure Vision Based GUI Agent

[Project Page] [Blog Post] [Models]

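OmniParser is a comprehensive method for parsing user interface screenshots into structured, easy-to-understand elements. This significantly enhances the ability of GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface.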
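To make "structured elements" and "grounding" concrete, here is a purely illustrative sketch. The `ParsedElement` type, its fields, and the normalized bounding-box convention are assumptions made for this example; they are not OmniParser's actual output format.

```python
from dataclasses import dataclass


@dataclass
class ParsedElement:
    # Hypothetical fields for illustration; not OmniParser's real schema.
    bbox: tuple  # (x_min, y_min, x_max, y_max), normalized to [0, 1]
    description: str
    interactable: bool = True


def click_point(element, width, height):
    """Map an element's normalized box to the pixel at its center."""
    x_min, y_min, x_max, y_max = element.bbox
    center_x = (x_min + x_max) / 2 * width
    center_y = (y_min + y_max) / 2 * height
    return (round(center_x), round(center_y))
```

With a representation along these lines, an agent could choose an element by its description and act on a concrete pixel location instead of guessing coordinates from the raw screenshot.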

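News

[2024/10] OmniParser is the #1 trending model on the Hugging Face model hub (starting 2024/10/29).
[2024/10] Feel free to check out our demo on the Hugging Face Space! (Stay tuned for OmniParser + Claude Computer Use.)
[2024/10] Both the interactive region detection model and the icon functional description model are released! See the Hugging Face models.
[2024/09] OmniParser achieves the best performance on Windows Agent Arena!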

Install

Set up the environment:

 conda create -n "omni" python==3.12
 conda activate omni
 pip install -r requirements.txt

Then download the model checkpoint (ckpts) files from https://huggingface.co/microsoft/OmniParser and put them under weights/. The default folder structure is: weights/icon_detect, weights/icon_caption_florence, weights/icon_caption_blip2.
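Before moving on to the conversion step, it can help to confirm that the downloaded files landed where expected. The following is a minimal sketch that assumes only the three default folder names listed above; the helper name `missing_weight_dirs` is hypothetical and not part of OmniParser.

```python
from pathlib import Path

# Default sub-folders named in the install instructions.
EXPECTED_DIRS = ("icon_detect", "icon_caption_florence", "icon_caption_blip2")


def missing_weight_dirs(root="weights"):
    """Return the expected sub-folders that are absent under `root`.

    Hypothetical helper for illustration; it checks folder names only,
    not the checkpoint files inside them.
    """
    root_path = Path(root)
    return [name for name in EXPECTED_DIRS if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_weight_dirs()
    if missing:
        print("Missing under weights/:", ", ".join(missing))
    else:
        print("All expected weight folders are present.")
```

Running this from the repository root reports which of the three folders still need to be downloaded, if any.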

Finally, convert the safetensor to a .pt file.

 python weights/convert_safetensor_to_pt.py

Examples:

We put together a few simple examples in demo.ipynb.

Gradio Demo

To run the gradio demo, simply run:

 python gradio_demo.py

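Model Weights License

For the model checkpoints on the Hugging Face model hub, please note that the icon_detect model is under the AGPL license, since that license is inherited from the original yolo model. icon_caption_blip2 and icon_caption_florence are under the MIT license. Please refer to the LICENSE file in each model's folder: https://huggingface.co/microsoft/OmniParser.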

Citation

Our technical report is available at https://arxiv.org/abs/2408.00203. If you find our work useful, please consider citing it:

@misc{lu2024omniparserpurevisionbased,
      title={OmniParser for Pure Vision Based GUI Agent}, 
      author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
      year={2024},
      eprint={2408.00203},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00203}, 
}

Additional information

Type: Other source code
Updated: 2024-12-20
Size: 50MB
Source: GitHub
