Awesome Multimodal Assistant is a curated list of multimodal chatbots and conversational assistants that combine multiple modes of interaction, such as text, speech, images, and video, to provide a seamless and versatile user experience. These assistants help users with a wide range of tasks, from simple information retrieval to complex multimedia reasoning.
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
arXiv 2022/12
[paper]
GPT-4
arXiv 2023/03
[paper] [blog]
Visual Instruction Tuning
arXiv 2023/04
[paper] [code] [project page] [demo]
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
arXiv 2023/04
[paper] [code] [project page] [demo]
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
arXiv 2023/04
[paper] [code] [demo]
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
arXiv 2023/04
[paper] [code] [demo]
Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding
[code]
LMEye: An Interactive Perception Network for Large Language Models
arXiv 2023/05
[paper] [code]
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
arXiv 2023/05
[paper] [code] [demo]
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
arXiv 2023/05
[paper] [code] [project page]
Otter: A Multi-Modal Model with In-Context Instruction Tuning
arXiv 2023/05
[paper] [code] [demo]
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
arXiv 2023/05
[paper] [code]
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language
arXiv 2023/05
[paper] [code] [demo]
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
arXiv 2023/05
[paper] [code]
Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
arXiv 2023/05
[paper] [code] [project page]
EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought
arXiv 2023/05
[paper] [code] [project page]
DetGPT: Detect What You Need via Reasoning
arXiv 2023/05
[paper] [code] [project page]
PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology
arXiv 2023/05
[paper] [code]
ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst
arXiv 2023/05
[paper] [code] [project page]
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
arXiv 2023/06
[paper] [code]
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
arXiv 2023/06
[paper]
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
arXiv 2023/06
[paper] [project page]
Valley: Video Assistant with Large Language Model Enhanced Ability
arXiv 2023/06
[paper] [code]
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
arXiv 2023/03
[paper] [code] [demo]
ViperGPT: Visual Inference via Python Execution for Reasoning
arXiv 2023/03
[paper] [code] [project page]
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs
arXiv 2023/03
[paper] [code]
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
arXiv 2023/03
[paper] [code]
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
arXiv 2023/03
[paper] [code] [project page] [demo]
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
arXiv 2023/03
[paper] [code] [demo]
VLog: Video as a Long Document
[code] [demo]
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
arXiv 2023/04
[paper] [code]
ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System
arXiv 2023/04
[paper] [project page]
VideoChat: Chat-Centric Video Understanding
arXiv 2023/05
[paper] [code] [demo]