self attention cv

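Self-attention building blocks for computer vision applications in PyTorch

An implementation of self-attention mechanisms for computer vision in PyTorch, built with einsum and einops. The focus is on self-attention modules for computer vision.

Install via pip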

$ pip install self-attention-cv

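If you do not have a GPU, it is best to pre-install PyTorch in your environment. To run the tests from the terminal with $ pytest, you may first need to run export PYTHONPATH=$PYTHONPATH:`pwd`.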

Code examples

Multi-head attention

import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
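
For readers who want to see what such a block computes, the following is a minimal single-head sketch of scaled dot-product self-attention. It is written with NumPy's einsum so that it runs without PyTorch (torch.einsum accepts the same subscript strings). The function name and the projection matrices w_q, w_k, w_v are hypothetical, and treating nonzero mask entries as positions to exclude is an assumption about the mask convention, not a statement about the library's internals.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, mask=None):
    """Single-head self-attention over x of shape [batch, tokens, dim]."""
    q = np.einsum('btd,de->bte', x, w_q)
    k = np.einsum('btd,de->bte', x, w_k)
    v = np.einsum('btd,de->bte', x, w_v)
    # Pairwise query-key similarity, scaled by sqrt(dim).
    scores = np.einsum('bid,bjd->bij', q, k) / np.sqrt(q.shape[-1])
    if mask is not None:
        # Assumption: nonzero mask entries are excluded from attention.
        scores = np.where(mask.astype(bool), -np.inf, scores)
    attn = softmax(scores, axis=-1)
    return np.einsum('bij,bjd->bid', attn, v), attn

rng = np.random.default_rng(0)
x = rng.random((16, 10, 64))              # [batch, tokens, dim]
w_q, w_k, w_v = (rng.standard_normal((64, 64)) * 0.1 for _ in range(3))
mask = np.zeros((10, 10))                 # tokens x tokens
mask[5:8, 5:8] = 1
y, attn = self_attention(x, w_q, w_k, w_v, mask)
print(y.shape)  # (16, 10, 64)
```

Under this convention, tokens 5 to 7 receive zero attention weight from each other, while every row of the attention matrix still sums to one.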

Axial attention

import torch
from self_attention_cv import AxialAttentionBlock

model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64)  # [batch, channels, height, width]
y = model(x)
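
The idea behind axial attention can be sketched as two passes of 1D attention, first along the height axis and then along the width axis, instead of one pass over all height x width positions. The sketch below is a simplified illustration in NumPy, not the library's AxialAttentionBlock: it uses the input itself as queries, keys, and values, and omits projections, heads, and positional terms. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_1d(x):
    """Self-attention along the second-to-last axis of x [..., length, dim]."""
    scores = np.einsum('...id,...jd->...ij', x, x) / np.sqrt(x.shape[-1])
    return np.einsum('...ij,...jd->...id', softmax(scores), x)

def axial_attention(x):
    """x: [batch, channels, height, width]; returns the same shape."""
    # Height pass: each column is a sequence of `height` tokens.
    xh = attend_1d(x.transpose(0, 3, 2, 1))   # [b, w, h, c]
    xh = xh.transpose(0, 3, 2, 1)             # back to [b, c, h, w]
    # Width pass: each row is a sequence of `width` tokens.
    xw = attend_1d(xh.transpose(0, 2, 3, 1))  # [b, h, w, c]
    return xw.transpose(0, 3, 1, 2)           # back to [b, c, h, w]

x = np.random.default_rng(0).random((1, 8, 6, 5))
y = axial_attention(x)

# Number of query-key pairs for a 64 x 64 feature map.
h = w = 64
full_pairs = (h * w) ** 2        # attention over all positions at once
axial_pairs = h * w * (h + w)    # height pass plus width pass
print(y.shape, full_pairs, axial_pairs)  # (1, 8, 6, 5) 16777216 524288
```

For a 64 x 64 feature map the two axial passes compare 32 times fewer query-key pairs than full 2D attention.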

Vanilla Transformer encoder

import torch
from self_attention_cv import TransformerEncoder

model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)

Vision Transformer with or without a ResNet50 backbone for image classification

import torch
from self_attention_cv import ViT, ResNet50ViT

model1 = ResNet50ViT(img_dim=128, pretrained_resnet=False,
                     blocks=6, num_classes=10,
                     dim_linear_block=256, dim=256)
# or
model2 = ViT(img_dim=256, in_channels=3, patch_dim=16, num_classes=10, dim=512)
x = torch.rand(2, 3, 256, 256)
y = model2(x)  # [2, 10]
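
To make the role of patch_dim concrete, here is a small NumPy sketch of how a Vision Transformer typically turns an image into a token sequence: the image is cut into non-overlapping patch_dim x patch_dim patches and each patch is flattened. The helper to_patches is hypothetical and the exact ordering of values inside each flattened patch is an assumption; the library's own patching may differ in detail.

```python
import numpy as np

def to_patches(img, patch_dim):
    """Split img [batch, channels, height, width] into flattened patches."""
    b, c, h, w = img.shape
    assert h % patch_dim == 0 and w % patch_dim == 0
    gh, gw = h // patch_dim, w // patch_dim        # patch grid size
    x = img.reshape(b, c, gh, patch_dim, gw, patch_dim)
    x = x.transpose(0, 2, 4, 3, 5, 1)              # [b, gh, gw, p, p, c]
    return x.reshape(b, gh * gw, patch_dim * patch_dim * c)

img = np.random.default_rng(0).random((2, 3, 256, 256))
patches = to_patches(img, patch_dim=16)
print(patches.shape)  # (2, 256, 768)
```

With img_dim=256 and patch_dim=16 this gives (256 / 16)^2 = 256 tokens of length 16 * 16 * 3 = 768 each. In a standard ViT, each flattened patch is then linearly projected to the model dimension (dim=512 in the example above).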

A re-implementation of Unet with the Vision Transformer encoder

import torch
from self_attention_cv.transunet import TransUnet

a = torch.rand(2, 3, 128, 128)
model = TransUnet(in_channels=3, img_dim=128, vit_blocks=8,
                  vit_dim_linear_mhsa_block=512, classes=5)
y = model(a)  # [2, 5, 128, 128]

Bottleneck attention block

import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock

inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32), heads=4,
                                   out_channels=1024, pooling=True)
y = bottleneck_block(inp)

Position embeddings are also available

1D positional embeddings

import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D

model = AbsPosEmb1D(tokens=20, dim_head=64)
# [batch, heads, tokens, dim_head]
q = torch.rand(2, 3, 20, 64)
y1 = model(q)

model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
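
As a rough illustration of what an absolute 1D positional term can look like, the sketch below scores every query against a table holding one vector per position. This is an assumption about the general technique, not a description of AbsPosEmb1D's internals: the table pos_table is hypothetical and random here (in a trained model it would be learned), and the [batch, heads, tokens, tokens] output shape is likewise assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens, dim_head = 20, 64

# Hypothetical position table: one dim_head vector per position.
pos_table = rng.standard_normal((tokens, dim_head)) * dim_head ** -0.5

q = rng.random((2, 3, tokens, dim_head))  # [batch, heads, tokens, dim_head]

# Score query i against the embedding of position j.
pos_scores = np.einsum('bhid,jd->bhij', q, pos_table)
print(pos_scores.shape)  # (2, 3, 20, 20)
```

Scores of this kind are usually added to the content scores (queries times keys) before the softmax, so that attention can depend on where a token sits as well as on what it contains.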

2D positional embeddings

import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D

dim = 32  # spatial dim of the feature map
model = RelPosEmb2D(
    feat_map_size=(dim, dim),
    dim_head=128)

q = torch.rand(2, 4, dim * dim, 128)
y = model(q)

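Acknowledgments

Thanks to Alex Rogozhnikov @arogozhnikov for the awesome einops package. For my re-implementations, I studied and borrowed code from many of Phil Wang's @lucidrains repositories. By studying his code I managed to grasp self-attention, discovered NLP details that are never mentioned in the papers, and learned from his clean coding style.

Cited as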

 @article{adaloglou2021transformer,
    title   = "Transformers in Computer Vision",
    author  = "Adaloglou, Nikolas",
    journal = "https://theaisummer.com/",
    year    = "2021",
    howpublished = {https://github.com/The-AI-Summer/self-attention-cv},
  }

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020, August). Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision (pp. 108-126). Springer, Cham.
Srinivas, A., Lin, T. Y., Parmar, N., Shlens, J., Abbeel, P., & Vaswani, A. (2021). Bottleneck transformers for visual recognition. arXiv preprint arXiv:2101.11605.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., & Shlens, J. (2019). Stand-alone self-attention in vision models. arXiv preprint arXiv:1906.05909.
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., ... & Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306.
Wang, S., Li, B., Khabsa, M., Fang, H., & Ma, H. (2020). Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768.
Bertasius, G., Wang, H., & Torresani, L. (2021). Is space-time attention all you need for video understanding? arXiv preprint arXiv:2102.05095.
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.