Researchers from Anhui Polytechnic University, Nanyang Technological University, and Lehigh University have open-sourced a multi-modal large model, TinyGPT-V. Its standout feature is cost-effectiveness: it delivers performance comparable to models with tens of billions of parameters, yet can be trained on a single 24 GB GPU, which greatly lowers the resource threshold for individuals and institutions that want to conduct large-model research and applications with limited hardware. TinyGPT-V is built from three main blocks: the large language model Phi-2, a visual encoder, and linear projection layers. The researchers evaluated TinyGPT-V from multiple angles, and the results show strong performance across a range of visual language tasks.
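Since the article only names the three blocks, a minimal sketch may help illustrate how a linear projection layer can bridge a frozen vision encoder and a language model in this kind of architecture. This is not TinyGPT-V's actual code; the class name and dimensions (1408 for an EVA-style ViT, 2560 for Phi-2's hidden size) are assumptions for illustration.

```python
# Illustrative sketch (not the authors' implementation) of a trainable
# linear projection that maps vision-encoder features into the language
# model's embedding space. Dimensions and names are assumptions.
import torch
import torch.nn as nn


class VisionToLLMProjection(nn.Module):
    """Projects vision-encoder patch features into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1408, llm_dim: int = 2560):
        super().__init__()
        # 1408 is a typical EVA-ViT feature width; Phi-2 uses a 2560-dim hidden size.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vision_dim)
        return self.proj(vision_feats)  # (batch, num_patches, llm_dim)


if __name__ == "__main__":
    projector = VisionToLLMProjection()
    dummy_feats = torch.randn(2, 256, 1408)  # stand-in for frozen ViT output
    image_tokens = projector(dummy_feats)    # ready to concatenate with text embeddings
    print(image_tokens.shape)                # torch.Size([2, 256, 2560])
```

Because the vision encoder and the language model can stay frozen, only a small projection like this needs gradients, which is one reason training fits on a single consumer-class GPU.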
The open-sourcing of TinyGPT-V offers new ideas and possibilities for research on and applications of multi-modal large models, and marks meaningful progress in lowering the bar for large-model training. Going forward, we can expect more efficient, low-cost models of this kind, further driving the adoption and development of artificial intelligence. Its efficiency in resource-constrained environments is welcome news for both academia and industry.