# Multimedia GPT

Multimedia GPT connects OpenAI GPT with vision and audio. You can now send images, audio recordings, and PDF documents using your OpenAI API key, and receive responses in both text and image form. We are currently adding support for videos. All of this is made possible by a prompt manager inspired by and built upon Microsoft Visual ChatGPT.

In addition to all of the vision foundation models mentioned in Microsoft Visual ChatGPT, Multimedia GPT supports OpenAI Whisper and OpenAI DALLE! This means that you no longer need your own GPUs for speech recognition and image generation (although you still can!).

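Concretely, "no GPU needed" works because both models are served through the OpenAI API. The sketch below is independent of this repository and shows the two underlying calls using the `openai` Python package (pre-1.0 API style); the file name and prompt are placeholders.

```python
import openai  # pip install "openai<1.0" -- this sketch assumes the pre-1.0 API style

openai.api_key = "sk-..."  # your OpenAI API key

# Speech recognition: Whisper runs remotely on OpenAI's servers
with open("cinderella.wav", "rb") as audio_file:  # placeholder recording
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
print(transcript["text"])

# Image generation: DALLE also runs remotely
image = openai.Image.create(prompt="a glass slipper on a staircase", n=1, size="512x512")
print(image["data"][0]["url"])
```
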
The base chat model can be configured as any OpenAI LLM, including ChatGPT and GPT-4. We default to `text-davinci-003`.

You are welcome to fork this project and add models that fit your own use cases. An easy way to do so is through llama_index. You will have to create a new class for your model in `model.py` and add a runner method `run_<model_name>` in `multimedia_gpt.py`. See `run_pdf` for an example, and the sketch below.

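As a minimal sketch of that pattern (assuming a llama_index 0.6-era `GPTVectorStoreIndex` API; the class `PDFQuery`, its `inference` method, and the `self.models` wiring are illustrative placeholders, not the repository's actual code):

```python
# In model.py: a new model class (names here are hypothetical)
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader  # llama_index 0.6-era API


class PDFQuery:
    """Answer questions about a PDF by querying a llama_index built over it."""

    def __init__(self, device: str = "cpu"):
        # The index queries the OpenAI API remotely, so the device is irrelevant.
        self.device = device

    def inference(self, pdf_path: str, question: str) -> str:
        documents = SimpleDirectoryReader(input_files=[pdf_path]).load_data()
        index = GPTVectorStoreIndex.from_documents(documents)
        return str(index.as_query_engine().query(question))


# In multimedia_gpt.py: the matching runner method
def run_pdf_query(self, pdf_path: str, question: str) -> str:
    # Look up the loaded model instance and return its answer as plain text,
    # which the prompt manager feeds back into the chat history.
    return self.models["PDFQuery"].inference(pdf_path, question)
```

In the Visual ChatGPT pattern this project builds on, each such tool also carries a short natural-language description so the LLM can decide when to invoke it.
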
In this demo, ChatGPT is given an audio recording of a person telling the story of Cinderella.

```bash
# Clone this repository
git clone https://github.com/fengyuli2002/multimedia-gpt
cd multimedia-gpt

# Prepare a conda environment
conda create -n multimedia-gpt python=3.8
conda activate multimedia-gpt
pip install -r requirements.txt

# Prepare your private OpenAI key (for Linux / MacOS)
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
# Prepare your private OpenAI key (for Windows)
setx OPENAI_API_KEY "<yourkey>"

# Start Multimedia GPT!
# You can specify the GPU/CPU assignment with "--load"; the parameter indicates which foundation
# models to use and where each will be loaded. The model and device are separated by '_',
# and different models are separated by ','.
# The available Visual Foundation Models can be found in models.py
# For example, if you want to load ImageCaptioning to cuda:0 and Whisper to cpu
# (Whisper runs remotely, so it doesn't matter where it is loaded),
# you can use: "ImageCaptioning_cuda:0,Whisper_cpu"
# Don't have GPUs? No worries, you can run DALLE and Whisper on the cloud using your API key!
python multimedia_gpt.py --load ImageCaptioning_cpu,DALLE_cpu,Whisper_cpu

# Additionally, you can configure which OpenAI LLM to use with the "--llm" flag, for example:
python multimedia_gpt.py --llm text-davinci-003
# The default is gpt-3.5-turbo (ChatGPT).
```

This project is an experimental work and will not be deployed to a production environment. Our goal is to explore the power of prompting.