multimedia gpt Download - multimedia gpt Source code download

multimedia gpt

Other source code

1.0.0

Download

This repository is not actively maintained as there are recent corporate projects that share our vision, such as TaskMatrix, AutoGPT, and HuggingGPT, that benefit from larger team effort and better management.

Multimedia GPT

Multimedia GPT connects your OpenAI GPT with vision and audio. You can now send images, audio recordings, and pdf documents using your OpenAI API key, and get a response in both text and image formats. We are currently adding support for videos. All is made possible by a prompt manager inspired and built upon Microsoft Visual ChatGPT.

Models

In addition to all of the vision foundation models mentioned in Microsoft Visual ChatGPT, Multimedia GPT supports OpenAI Whisper and OpenAI DALLE! This means that you no longer need your own GPUs for voice recognition and image generation (although you still can!)

The base chat model can be configured as any OpenAI LLM, including ChatGPT and GPT-4. We default to text-davinci-003.

You are welcome to fork this project and add models that's suitable for your own use case. A simple way to do this is through llama_index. You will have to create a new class for your model in model.py, and add a runner method run_<model_name> in multimedia_gpt.py. See run_pdf for an example.

Demo

In this demo, ChatGPT is fed with a recording of a person telling the story of Cinderella.

Installation

# Clone this repository
git clone https://github.com/fengyuli2002/multimedia-gpt
cd multimedia-gpt

# Prepare a conda environment
conda create -n multimedia-gpt python=3.8
conda activate multimedia-gptt
pip install -r requirements.txt

# prepare your private OpenAI key (for Linux / MacOS)
echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
# prepare your private OpenAI key (for Windows)
setx OPENAI_API_KEY “<yourkey>”

# Start Multimedia GPT!
# You can specify the GPU/CPU assignment by "--load", the parameter indicates which foundation models to use and 
# where it will be loaded to. The model and device are separated by '_', different models are separated by ','.
# The available Visual Foundation Models can be found in models.py
# For example, if you want to load ImageCaptioning to cuda:0 and whisper to cpu 
# (whisper runs remotely, so it doesn't matter where it is loaded to)
# You can use: "ImageCaptioning_cuda:0,Whisper_cpu"

# Don't have GPUs? No worry, you can run DALLE and Whisper on cloud using your API key!
python multimedia_gpt.py --load ImageCaptioning_cpu,DALLE_cpu,Whisper_cpu       

# Additionally, you can configure the which OpenAI LLM to use by the "--llm" tag, such as 
python multimedia_gpt.py --llm text-davinci-003  
# The default is gpt-3.5-turbo (ChatGPT).

Plans

This project is an experimental work and will not be deployed to a production environment. Our goal is to explore the power of prompting.

TODOs

Support OpenAI Whisper for speech recognition, added to the default config
Support OpenAI DALLE for image generation, added to the default config
Support OpenAI DALLE for image editing
Add a command-line switch between ChatGPT and GPT-4 backends
Implement a function that extract key frames from a video

Known Problems

DALLE only accepts square .png images — need a work-around
PDFReader (from llama_index) requires a higher version of langchain, which isn't compatible with how visual chatGPT is implemented

Supporters

Expand

Additional Information

Version 1.0.0
Type Other source code
Update Time 2024-11-29
size 5.17MB
From Github

Related Applications

GitHub sgrebnov/cordova plugin background download

2024-11-05
Wa ch ull navra maza navsacha 2 2024 ull ovie Fr e Online On Strea ings

2024-11-03
Wa ch navra maza navsacha 2 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-03
Wa ch the greatest of all time 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-02
wolfs 2024 f llmo ie f lmyz lla dow load ree 7 0p 4 0p a d 10 0p

2024-11-01
The latest version of GPT film and television

2023-10-30

Recommended for You

chat.petals.dev

Other source code

1.0.0
GPT Prompt Templates

Other source code

1.0.0
GPTyped

Other source code

GPTyped 1.0.5
waymo open dataset

Other source code

December 2023 Update
SmartTube

Other source code

24.71 Stable
Sunamu

Other source code

Release 2.2.0
waymo open dataset

Other source code

December 2023 Update
wp functions

Other categories

1.0.0
termwind

Other categories

v2.3.0

Related Information All