TaskMatrix下載 - TaskMatrix原始碼下載

TaskMatrix

其他源碼

1.0.0

下載

任務矩陣

TaskMatrix連接 ChatGPT 和一系列 Visual Foundation 模型，以實現在聊天期間傳送和接收映像。

請參閱我們的論文：Visual ChatGPT：使用 Visual Foundation 模型進行對話、繪圖和編輯

更新：

現在TaskMatrix 支援GroundingDINO 和segment-anything！感謝@jordddan的努力。對於影像編輯情況，首先使用GroundingDINO定位給定文字引導的邊界框，然後使用segment-anything產生相關掩模，最後使用穩定擴散修復基於掩模編輯圖像。
- 首先，執行python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,Inpainting_cuda:0,ImageCaptioning_cuda:0"
- 然後，說find xxx in the image或segment xxx in the image 。 xxx是一個物件。 TaskMatrix將傳回偵測或分割結果！
現在TaskMatrix可以支援中文了！感謝@Wang-Xiaodong1899的努力。
我們在TaskMatrix中提出了模板的想法！
- 範本是一個預先定義的執行流程，可協助 ChatGPT 組裝涉及多個基礎模型的複雜任務。
- 範本包含人類確定的複雜任務的經驗解決方案。
- 一個模板可以呼叫多個基礎模型，甚至可以建立一個新的 ChatGPT 會話
- 要定義模板，只需新增一個具有屬性template_model = True類
感謝@ShengmingYin和@thebestannie在InfinityOutPainting類別中提供了模板範例（請參閱下面的 gif）
- 首先，執行python visual_chatgpt.py --load "Inpainting_cuda:0,ImageCaptioning_cuda:0,VisualQuestionAnswering_cuda:0"
- 其次，假設extend the image to 2048x1024 ！
- 只需建立InfinityOutPainting模板，TaskMatrix 就可以透過與現有ImageCaptioning 、 Inpainting和VisualQuestionAnswering基礎模型合作將影像無縫擴展至任何尺寸，而無需額外培訓。
TaskMatrix需要社區的努力！我們渴望您的貢獻來添加新的有趣的功能！

洞察力和目標：

一方面， ChatGPT（或 LLM）充當通用接口，提供對廣泛主題的廣泛且多樣化的理解。另一方面，基礎模型透過提供特定領域的深入知識來充當領域專家。透過利用一般知識和深層知識，我們的目標是建構一個能夠處理各種任務的人工智慧。

示範

系統架構

快速入門

 # clone the repo
git clone https://github.com/microsoft/TaskMatrix.git

# Go to directory
cd visual-chatgpt

# create a new environment
conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

#  prepare the basic environments
pip install -r requirements.txt
pip install  git+https://github.com/IDEA-Research/GroundingDINO.git
pip install  git+https://github.com/facebookresearch/segment-anything.git

# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start TaskMatrix !
# You can specify the GPU/CPU assignment by "--load", the parameter indicates which 
# Visual Foundation Model to use and where it will be loaded to
# The model and device are separated by underline '_', the different models are separated by comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

# Advice for 1 Tesla T4 15GB  (Google Colab)                       
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
                                
# Advice for 4 Tesla V100 32GB                            
python visual_chatgpt.py --load "Text2Box_cuda:0,Segmenting_cuda:0,
    Inpainting_cuda:0,ImageCaptioning_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"

GPU 記憶體使用情況

這裡我們列出了每個 Visual Foundation 模型的 GPU 記憶體使用情況，您可以指定您喜歡哪一個：

基礎模型	GPU 記憶體 (MB)
圖像編輯	3981
指導Pix2Pix	2827
文字轉圖像	3385
圖片字幕	1209
圖像2Canny	0
CannyText2Image	3531
影像2線	0
行文字轉影像	3529
圖像2Hed	0
HedText2Image	3529
圖像2塗鴉	0
塗鴉文字轉圖像	3531
影像2姿勢	0
姿勢文字2圖像	3529
影像2段	919
分段文字2影像	3529
影像2深度	0
深度文字2影像	3531
影像2法線	0
普通文字轉圖像	3529
視覺問答	1495

致謝

我們很欣賞以下項目的開源：

擁抱臉 LangChain 穩定擴散 ControlNet InstructPix2Pix CLIPSeg BLIP

聯絡資訊

如需使用 TaskMatrix 的協助或問題，請提交 GitHub 問題。

如需其他溝通，請聯絡 Chenfei WU ([email protected]) 或 Nan DUAN ([email protected])。

商標聲明

商標本項目可能包含項目、產品或服務的商標或標誌。 Microsoft 商標或標誌的授權使用須遵守且必須遵循 Microsoft 的商標和品牌指南。在此項目的修改版本中使用 Microsoft 商標或標誌不得混淆或暗示 Microsoft 贊助。任何對第三方商標或標誌的使用均須遵守這些第三方的政策。

免責聲明

本 Repo 中推薦的模型只是一個範例，用於探索任務自動化概念的科學研究以及與 Visual ChatGPT 上發表的論文進行基準測試：使用 Visual Foundation 模型進行對話、繪圖和編輯。使用者可以根據自己的研究需求更換本Repo中的模型。使用本Repo建議的模型時，需要分別遵守這些模型的License。對於您使用此儲存庫而導致的任何第三方權利侵權，Microsoft 不承擔任何責任。使用者同意對與本儲存庫引起的任何索賠相關的所有損害、費用和律師費進行辯護、賠償並使 Microsoft 免受損害。如果有人認為此 Repo 侵犯了您的權利，請通知專案擁有者電子郵件。

展開

附加信息

版本 1.0.0
類型其他源碼
更新時間 2024-11-28
大小 15.05MB
來自於 Github

相關應用

waymo open dataset

2024-11-18
SmartTube

2024-12-14
Sunamu

2024-12-14
MySchedule.py

2024-12-15
viptools for eslam

2024-12-15
VITAident

2024-12-15

爲您推薦

chat.petals.dev

其他源碼

1.0.0
GPT Prompt Templates

其他源碼

1.0.0
GPTyped

其他源碼

GPTyped 1.0.5
waymo open dataset

其他源碼

December 2023 Update
SmartTube

其他源碼

24.71 Stable
Sunamu

其他源碼

Release 2.2.0
waymo open dataset

其他源碼

December 2023 Update
wp functions

其他類別

1.0.0
termwind

其他類別

v2.3.0

相關資訊全部