Version: 1.0.0

Qwen-VL

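My personal implementation of the model from "Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities". The authors have not released the model code yet. For more details, please refer to the full paper. The model architecture is basically as described in the paper: img -> ViT -> multimodal fusion layer with learnable query embeddings -> projection layer -> Qwen LLM.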
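The fusion step in the pipeline above can be sketched as follows. This is a minimal illustration, not the code shipped in this package: the class name `QueryFusionAdapter`, its parameters, and the toy dimensions are all hypothetical, and positional encodings are left out. A set of learnable queries cross-attends to the ViT patch features, and the result is projected to the LLM embedding width.

```python
import torch
import torch.nn as nn


class QueryFusionAdapter(nn.Module):
    """Hypothetical sketch: learnable queries cross-attend to ViT patch features."""

    def __init__(self, vit_dim=64, llm_dim=128, num_queries=256, num_heads=4):
        super().__init__()
        # Learnable query embeddings, randomly initialized.
        self.queries = nn.Parameter(torch.randn(num_queries, vit_dim) * 0.02)
        # One cross-attention layer: queries attend to the image features.
        self.cross_attn = nn.MultiheadAttention(vit_dim, num_heads, batch_first=True)
        # Projection into the (assumed) LLM embedding width.
        self.proj = nn.Linear(vit_dim, llm_dim)

    def forward(self, patch_feats):
        # patch_feats: (batch, num_patches, vit_dim)
        batch = patch_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        # Image features serve as both keys and values.
        fused, _ = self.cross_attn(q, patch_feats, patch_feats)
        return self.proj(fused)  # (batch, num_queries, llm_dim)


adapter = QueryFusionAdapter()
patch_feats = torch.randn(2, 1024, 64)  # stand-in for ViT output
visual_tokens = adapter(patch_feats)
print(visual_tokens.shape)  # torch.Size([2, 256, 128])
```

Whatever the number of patches, the output always has `num_queries` tokens, so the LLM sees a fixed-length visual prefix.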

Installation

pip3 install qwen

Usage

# Import the necessary libraries
import torch
from qwen import Qwen

# Create an instance of the Qwen model
model = Qwen()

# Generate random text and image tensors
text = torch.randint(0, 20000, (1, 1024))
img = torch.randn(1, 3, 256, 256)

# Pass the image and text tensors through the model
out = model(img, text)  # (1, 1024, 20000)

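Todo

Position-aware vision-language adapter that compresses image features. A single-layer cross-attention module, randomly initialized => a group of trainable embeddings serve as query vectors, and image features from the visual encoder serve as keys for the cross-attention operation => output: the visual feature sequence is compressed to a fixed length of 256; 2D absolute positional encodings are integrated into the query-key pairs of the cross-attention mechanism => compressed feature sequence of length 256 => fed into the decoder LLM.
Bounding boxes: for any given accurate bounding box, a normalization step is applied to the range [0, 1000] and the result is converted to the string format (X_topleft, Y_topleft),(X_bottomright, Y_bottomright) -> the string is tokenized as text, so no positional vocabulary is needed. To distinguish detection strings from regular text strings, two special tokens, <box> and </box>, are added to the beginning and end of the bounding box string. In addition, another set of special tokens, <ref> and </ref>, is introduced to mark the content that the bounding box refers to.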
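The bounding-box item can be illustrated with a short sketch. The helper `format_box` and its arguments are hypothetical and not part of this package. The `<box>`/`</box>` and `<ref>`/`</ref>` token names and the comma between the two coordinate pairs follow the Qwen-VL paper, and rounding to the nearest integer is an assumption.

```python
def format_box(box, width, height, ref=None):
    """Hypothetical helper: pixel box (x1, y1, x2, y2) -> Qwen-VL style string."""
    x1, y1, x2, y2 = box

    def norm(value, size):
        # Scale a pixel coordinate to [0, 1000] and clamp (rounding is an assumption).
        return max(0, min(1000, round(value / size * 1000)))

    coords = (
        f"({norm(x1, width)},{norm(y1, height)}),"
        f"({norm(x2, width)},{norm(y2, height)})"
    )
    # Wrap the coordinates so they can be told apart from ordinary text.
    text = f"<box>{coords}</box>"
    if ref is not None:
        # Optionally prepend the phrase the box refers to.
        text = f"<ref>{ref}</ref>" + text
    return text


print(format_box((64, 32, 192, 224), 256, 256, ref="dog"))
# <ref>dog</ref><box>(250,125),(750,875)</box>
```

Because the result is an ordinary string, it can be passed through the regular text tokenizer; only the four special tokens would need to be added to the vocabulary.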

Citation

Please cite this work using the following:

@article{bai2023qwen,
  title = {Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities},
  author = {Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
  journal = {arXiv preprint arXiv:2308.12966},
  year = {2023},
  url = {https://doi.org/10.48550/arXiv.2308.12966}
}