# Multimedia GPT

Multimedia GPT connects OpenAI GPT with vision and audio. You can now send images, audio recordings, and PDF documents using your OpenAI API key, and receive responses in both text and image form. We are currently adding support for videos. All of this is made possible by a prompt manager inspired by and built upon Microsoft Visual ChatGPT.

In addition to all of the vision foundation models mentioned in Microsoft Visual ChatGPT, Multimedia GPT supports OpenAI Whisper and OpenAI DALLE! This means that you no longer need your own GPUs for speech recognition and image generation (although you still can!).

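Concretely, "no GPU needed" works because both models are served through the OpenAI API. The sketch below is independent of this repository and shows the two underlying calls using the `openai` Python package (pre-1.0 API style); the file name and prompt are placeholders.

```python
import openai  # pip install "openai<1.0" -- this sketch assumes the pre-1.0 API style

openai.api_key = "sk-..."  # your OpenAI API key

# Speech recognition: Whisper runs remotely on OpenAI's servers
with open("cinderella.wav", "rb") as audio_file:  # placeholder recording
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript["text"])

# Image generation: DALLE also runs remotely
image = openai.Image.create(prompt="a glass slipper on a staircase", n=1, size="512x512")
print(image["data"][0]["url"])
```
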
The base chat model can be configured as any OpenAI LLM, including ChatGPT and GPT-4. We default to `text-davinci-003`.

You are welcome to fork this project and add models that fit your own use cases. An easy way to do so is through llama_index. You will have to create a new class for your model in `model.py` and add a runner method `run_<model_name>` in `multimedia_gpt.py`. See `run_pdf` for an example, and the sketch below.

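As a minimal sketch of that pattern (assuming a llama_index 0.6-era `GPTVectorStoreIndex` API; the class `PDFQuery`, its `inference` method, and the `self.models` wiring are illustrative placeholders, not the repository's actual code):

```python
# In model.py: a new model class (names here are hypothetical)
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader  # llama_index 0.6-era API


class PDFQuery:
    """Answer questions about a PDF by querying a llama_index built over it."""

    def __init__(self, device: str = "cpu"):
        # The index queries the OpenAI API remotely, so the device is irrelevant.
        self.device = device

    def inference(self, pdf_path: str, question: str) -> str:
        documents = SimpleDirectoryReader(input_files=[pdf_path]).load_data()
        index = GPTVectorStoreIndex.from_documents(documents)
        return str(index.as_query_engine().query(question))


# In multimedia_gpt.py: the matching runner method
def run_pdf_query(self, pdf_path: str, question: str) -> str:
    # Look up the loaded model instance and return its answer as plain text,
    # which the prompt manager feeds back into the chat history.
    return self.models["PDFQuery"].inference(pdf_path, question)
```

In the Visual ChatGPT pattern this project builds on, each such tool also carries a short natural-language description so the LLM can decide when to invoke it.
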
In this demo, ChatGPT is given an audio recording of a person telling the story of Cinderella.

```bash
# Clone this repository
git clone https://github.com/fengyuli2002/multimedia-gpt
cd multimedia-gpt

# Prepare a conda environment
conda create -n multimedia-gpt python=3.8
conda activate multimedia-gpt
pip install -r requirements.txt

# Prepare your private OpenAI key (for Linux / MacOS)
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
# Prepare your private OpenAI key (for Windows)
setx OPENAI_API_KEY "<yourkey>"

# Start Multimedia GPT!
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which foundation
# models to use and where each will be loaded. The model and device are separated by '_',
# and different models are separated by ','.
# The available Visual Foundation Models can be found in models.py
# For example, if you want to load ImageCaptioning to cuda:0 and Whisper to cpu
# (Whisper runs remotely, so it doesn't matter where it is loaded),
# you can use: "ImageCaptioning_cuda:0,Whisper_cpu"
# Don't have GPUs? No worries, you can run DALLE and Whisper on the cloud using your API key!
python multimedia_gpt.py --load ImageCaptioning_cpu,DALLE_cpu,Whisper_cpu

# Additionally, you can configure which OpenAI LLM to use with the "--llm" flag, for example:
python multimedia_gpt.py --llm text-davinci-003
# The default is gpt-3.5-turbo (ChatGPT).
```

This project is an experimental work and will not be deployed to a production environment. Our goal is to explore the power of prompting.