self attention cv


Self-attention building blocks for computer vision applications in PyTorch

Implementation of self-attention mechanisms for computer vision in PyTorch with einsum and einops. Focused on computer vision self-attention modules.
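
The modules in this package build on scaled dot-product attention, softmax(QK^T / sqrt(d)) V. As a rough sketch of how that is expressed with einsum, here is a single-head version in plain PyTorch. The function name `scaled_dot_product` is hypothetical and not part of the `self_attention_cv` API; in a full attention layer q, k and v would be learned linear projections of x rather than x itself.

```python
import torch


def scaled_dot_product(q, k, v):
    """Hypothetical single-head scaled dot-product attention (not the library's API).

    q, k, v: [batch, tokens, dim]; returns (output [batch, tokens, dim],
    attention weights [batch, tokens, tokens]).
    """
    dim = q.shape[-1]
    # Similarity of every query token i with every key token j, scaled by sqrt(dim)
    scores = torch.einsum("bid,bjd->bij", q, k) / dim ** 0.5
    # Softmax over the key axis: each row of weights sums to 1
    attn = torch.softmax(scores, dim=-1)
    # Each output token is a weighted sum of the value vectors
    out = torch.einsum("bij,bjd->bid", attn, v)
    return out, attn


x = torch.rand(16, 10, 64)               # [batch, tokens, dim]
out, attn = scaled_dot_product(x, x, x)  # self-attention: q = k = v = x
print(out.shape)                         # torch.Size([16, 10, 64])
```

The einsum strings name every axis, so the same two lines work unchanged once a heads axis is added (for example "bhid,bhjd->bhij").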

Install it via pip

$ pip install self-attention-cv

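If you do not have a GPU, it is better to have PyTorch installed in your environment beforehand. To run the tests from the terminal with $ pytest, you may first need to run export PYTHONPATH=$PYTHONPATH:`pwd` .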

Code examples

Multi-head attention

import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10, dtype=torch.bool)  # tokens x tokens, boolean for masked_fill
mask[5:8, 5:8] = True
y = model(x, mask)
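
To make the shapes and the mask concrete, the following is a plain-PyTorch sketch of multi-head self-attention. It is an illustration, not the library's implementation: the class `SketchMHSA` is hypothetical, the convention that `True` marks a blocked query/key pair is the one used by `torch.Tensor.masked_fill`, and `view`/`permute` stand in for `einops.rearrange` so that only `torch` is required.

```python
import torch
import torch.nn as nn


class SketchMHSA(nn.Module):
    """Hypothetical multi-head self-attention sketch (not the library's class)."""

    def __init__(self, dim, heads=8):
        super().__init__()
        assert dim % heads == 0, "dim must be divisible by heads"
        self.heads, self.dim_head = heads, dim // heads
        self.to_qkv = nn.Linear(dim, 3 * dim, bias=False)  # joint Q, K, V projection
        self.to_out = nn.Linear(dim, dim, bias=False)      # mixes the concatenated heads

    def forward(self, x, mask=None):
        b, t, _ = x.shape
        # [b, t, 3*dim] -> three tensors of shape [b, heads, t, dim_head]
        qkv = self.to_qkv(x).view(b, t, 3, self.heads, self.dim_head)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        scores = torch.einsum("bhid,bhjd->bhij", q, k) / self.dim_head ** 0.5
        if mask is not None:
            # mask: [t, t] bool, broadcast over batch and heads; True = blocked pair
            scores = scores.masked_fill(mask, float("-inf"))
        attn = torch.softmax(scores, dim=-1)
        out = torch.einsum("bhij,bhjd->bhid", attn, v)
        # [b, heads, t, dim_head] -> [b, t, dim]
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.to_out(out)


x = torch.rand(16, 10, 64)                    # [batch, tokens, dim]
mask = torch.zeros(10, 10, dtype=torch.bool)  # tokens x tokens
mask[5:8, 5:8] = True                         # tokens 5..7 may not attend to each other
y = SketchMHSA(dim=64, heads=8)(x, mask)
print(y.shape)                                # torch.Size([16, 10, 64])
```

Splitting one `dim`-wide projection into `heads` slices of `dim // heads` keeps the parameter count the same regardless of the number of heads. No row of this mask is fully blocked, which matters because a row of all `-inf` scores would make the softmax return NaN.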

Axial attention

import torch
from self_attention_cv import AxialAttentionBlock

model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64)  # [batch, in_channels, dim, dim]
y = model(x)
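
Axial attention factorises 2D self-attention over an H x W feature map into two 1D attentions, one along each column and one along each row, so a pixel attends to H + W positions instead of H * W. The sketch below shows only that factorisation: it is single-head and parameter-free (no query/key/value projections, no positional terms), and the helpers `attend` and `axial_attention` are hypothetical, not how `AxialAttentionBlock` is built internally.

```python
import torch


def attend(x):
    """Hypothetical parameter-free self-attention over axis -2 of x: [..., n, c]."""
    scores = x @ x.transpose(-2, -1) / x.shape[-1] ** 0.5  # [..., n, n]
    return torch.softmax(scores, dim=-1) @ x               # [..., n, c]


def axial_attention(fmap):
    """Hypothetical axial attention sketch. fmap: [batch, channels, height, width]."""
    # Height pass: every column becomes a sequence of `height` tokens
    x = fmap.permute(0, 3, 2, 1)  # [b, w, h, c]
    x = attend(x)
    # Width pass: every row becomes a sequence of `width` tokens
    x = x.permute(0, 2, 1, 3)     # [b, h, w, c]
    x = attend(x)
    return x.permute(0, 3, 1, 2)  # back to [b, c, h, w]


x = torch.rand(1, 256, 64, 64)    # [batch, channels, height, width]
y = axial_attention(x)
print(y.shape)                    # torch.Size([1, 256, 64, 64])
```

Because the width pass runs on the output of the height pass, every output pixel already depends on every input pixel after a single height-then-width pass.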

Vanilla Transformer encoder

import torch
from self_attention_cv import TransformerEncoder

model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10, dtype=torch.bool)  # tokens x tokens, boolean for masked_fill
mask[5:8, 5:8] = True
y = model(x, mask)
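
PyTorch also ships its own encoder stack, which is a useful point of comparison. The sketch below assembles six blocks with eight heads from `torch.nn.TransformerEncoderLayer` (`batch_first` requires PyTorch 1.9 or later). It is not a drop-in replacement for the library's `TransformerEncoder`: the feed-forward width of 256, and PyTorch's default post-norm layout and dropout, are assumptions. In `nn.TransformerEncoder`, a boolean `mask` entry set to `True` means that position may not be attended to.

```python
import torch
import torch.nn as nn

# Six encoder blocks, 8 heads, model width 64; dim_feedforward=256 is an assumed value
layer = nn.TransformerEncoderLayer(d_model=64, nhead=8, dim_feedforward=256,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)

x = torch.rand(16, 10, 64)                    # [batch, tokens, dim]
mask = torch.zeros(10, 10, dtype=torch.bool)  # tokens x tokens, True = not allowed
mask[5:8, 5:8] = True
y = encoder(x, mask=mask)
print(y.shape)                                # torch.Size([16, 10, 64])
```

`nn.TransformerEncoder` deep-copies `layer` `num_layers` times, so the six blocks do not share weights.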

Vision Transformer with/without a ResNet50 backbone for image classification

import torch
from self_attention_cv import ViT, ResNet50ViT

model1 = ResNet50ViT(img_dim=128, pretrained_resnet=False,
                     blocks=6, num_classes=10,
                     dim_linear_block=256, dim=256)
# or
model2 = ViT(img_dim=256, in_channels=3, patch_dim=16, num_classes=10, dim=512)
x = torch.rand(2, 3, 256, 256)
y = model2(x)  # [2, 10]
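
With `img_dim=256` and `patch_dim=16`, a ViT splits each image into (256 / 16)^2 = 256 non-overlapping patches, each flattened to 3 * 16 * 16 = 768 values and then linearly projected to `dim`. A sketch of that patch-to-token step using only `torch` follows; the helper `patchify` is hypothetical, and the library may order the values inside a patch differently.

```python
import torch


def patchify(img, patch):
    """Hypothetical helper: [batch, channels, H, W] -> [batch, num_patches, channels*patch*patch]."""
    b, c, h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    x = img.unfold(2, patch, patch).unfold(3, patch, patch)  # [b, c, h/p, w/p, p, p]
    x = x.permute(0, 2, 3, 1, 4, 5)                          # [b, h/p, w/p, c, p, p]
    # Patches in row-major order, each flattened as (channel, row, column)
    return x.reshape(b, (h // patch) * (w // patch), c * patch * patch)


x = torch.rand(2, 3, 256, 256)
tokens = patchify(x, 16)                   # [2, 256, 768]
embed = torch.nn.Linear(3 * 16 * 16, 512)  # project each patch to dim=512
print(embed(tokens).shape)                 # torch.Size([2, 256, 512])
```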

Re-implementation of UNet with a Vision Transformer encoder

import torch
from self_attention_cv.transunet import TransUnet

a = torch.rand(2, 3, 128, 128)
model = TransUnet(in_channels=3, img_dim=128, vit_blocks=8,
                  vit_dim_linear_mhsa_block=512, classes=5)
y = model(a)  # [2, 5, 128, 128]

Bottleneck attention block

import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock

inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32), heads=4,
                                   out_channels=1024, pooling=True)
y = bottleneck_block(inp)
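
A bottleneck transformer block takes a ResNet bottleneck block and replaces its 3x3 convolution with multi-head self-attention over all pixels of the feature map; in the BoTNet paper, blocks that downsample apply 2x2 average pooling after the attention. The sketch below illustrates that structure with `torch.nn.MultiheadAttention`. It is hypothetical throughout: `SketchBottleneck`, the channel reduction factor of 4 and the projection shortcut are assumptions, and it omits the relative position logits that BoTNet adds to the attention scores.

```python
import torch
import torch.nn as nn


class SketchBottleneck(nn.Module):
    """Hypothetical BoTNet-style block: 1x1 conv -> MHSA over pixels -> 1x1 conv, plus shortcut."""

    def __init__(self, in_channels, out_channels, heads=4, reduction=4, pooling=False):
        super().__init__()
        mid = out_channels // reduction  # assumed ResNet-style channel reduction
        self.reduce = nn.Sequential(nn.Conv2d(in_channels, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.ReLU())
        self.mhsa = nn.MultiheadAttention(mid, heads, batch_first=True)
        self.pool = nn.AvgPool2d(2) if pooling else nn.Identity()  # 2x2 downsampling
        self.expand = nn.Sequential(nn.Conv2d(mid, out_channels, 1, bias=False),
                                    nn.BatchNorm2d(out_channels))
        # Project the shortcut whenever channels or resolution change
        if in_channels == out_channels and not pooling:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=2 if pooling else 1, bias=False),
                nn.BatchNorm2d(out_channels))
        self.act = nn.ReLU()

    def forward(self, x):
        b, _, h, w = x.shape
        z = self.reduce(x)                            # [b, mid, h, w]
        tokens = z.flatten(2).transpose(1, 2)         # [b, h*w, mid]: one token per pixel
        attn, _ = self.mhsa(tokens, tokens, tokens)   # content-only self-attention
        z = attn.transpose(1, 2).reshape(b, -1, h, w) # back to [b, mid, h, w]
        z = self.expand(self.pool(z))
        return self.act(z + self.shortcut(x))


block = SketchBottleneck(512, 1024, heads=4, pooling=True)
out = block(torch.rand(1, 512, 32, 32))
print(out.shape)  # torch.Size([1, 1024, 16, 16])
```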

Position embeddings are also available

1D positional embeddings

import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D

model = AbsPosEmb1D(tokens=20, dim_head=64)
# batch, heads, tokens, dim_head
q = torch.rand(2, 3, 20, 64)
y1 = model(q)

model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
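
Relative position embeddings (Shaw et al., 2018) add a term q_i . r_{j-i} to the attention logit between query i and key j, where r is a learned table with one vector per offset from -(tokens - 1) to tokens - 1. The sketch below builds those logits by indexing such a table with a precomputed offset matrix; the class `SketchRelPos1D` and the single table shared by all heads are assumptions, so its parameters will not match `RelPosEmb1D`.

```python
import torch
import torch.nn as nn


class SketchRelPos1D(nn.Module):
    """Hypothetical 1D relative position logits: out[..., i, j] = q_i . r[j - i]."""

    def __init__(self, tokens, dim_head):
        super().__init__()
        # One learned vector per relative offset -(tokens-1) ... (tokens-1), shared by all heads
        self.rel_table = nn.Parameter(torch.randn(2 * tokens - 1, dim_head) * dim_head ** -0.5)
        idx = torch.arange(tokens)
        # rel_idx[i, j] = (j - i) shifted by tokens - 1 so that it is a valid row index
        self.register_buffer("rel_idx", idx[None, :] - idx[:, None] + tokens - 1)

    def forward(self, q):
        # q: [batch, heads, tokens, dim_head]
        r = self.rel_table[self.rel_idx]             # [tokens, tokens, dim_head]
        return torch.einsum("bhid,ijd->bhij", q, r)  # [batch, heads, tokens, tokens]


q = torch.rand(2, 3, 20, 64)  # [batch, heads, tokens, dim_head]
rel = SketchRelPos1D(tokens=20, dim_head=64)
logits = rel(q)
print(logits.shape)           # torch.Size([2, 3, 20, 20])
```

These logits are added to the content logits q_i . k_j before the softmax; every pair with the same offset j - i reuses the same table row, which is what makes the encoding translation-equivariant.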

2D positional embeddings

import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D

dim = 32  # spatial dim of the feat map
model = RelPosEmb2D(
    feat_map_size=(dim, dim),
    dim_head=128)

q = torch.rand(2, 4, dim * dim, 128)
y = model(q)
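
In 2D, one common factorisation splits the offset between query pixel (y_i, x_i) and key pixel (y_j, x_j) per axis, giving the logit q_i . (r^h[y_j - y_i] + r^w[x_j - x_i]) with one table for height offsets and one for width offsets. The sketch below computes it for the [batch, heads, H*W, dim_head] layout used above; `SketchRelPos2D` is hypothetical and `RelPosEmb2D` may differ in detail. Rather than materialising an [H*W, H*W, dim_head] tensor (about 134 million values for a 32x32 map with dim_head=128), it first takes dot products with each table and then gathers the entry for every pixel pair.

```python
import torch
import torch.nn as nn


class SketchRelPos2D(nn.Module):
    """Hypothetical 2D relative logits: q_i . (r_h[y_j - y_i] + r_w[x_j - x_i])."""

    def __init__(self, feat_map_size, dim_head):
        super().__init__()
        h, w = feat_map_size
        scale = dim_head ** -0.5
        self.rel_h = nn.Parameter(torch.randn(2 * h - 1, dim_head) * scale)  # height offsets
        self.rel_w = nn.Parameter(torch.randn(2 * w - 1, dim_head) * scale)  # width offsets
        # Row / column coordinate of every flattened pixel (row-major order)
        ys = torch.arange(h).repeat_interleave(w)  # [h*w]
        xs = torch.arange(w).repeat(h)             # [h*w]
        # Shifted offsets used as row indices into the tables: [h*w, h*w]
        self.register_buffer("idx_h", ys[None, :] - ys[:, None] + h - 1)
        self.register_buffer("idx_w", xs[None, :] - xs[:, None] + w - 1)

    def forward(self, q):
        # q: [batch, heads, h*w, dim_head]
        b, heads, hw, _ = q.shape
        # Dot product of every query with every height / width offset vector
        logits_h = q @ self.rel_h.t()  # [b, heads, hw, 2h-1]
        logits_w = q @ self.rel_w.t()  # [b, heads, hw, 2w-1]
        # For each (query i, key j) pick the logit of their actual offset
        idx_h = self.idx_h.expand(b, heads, hw, hw)
        idx_w = self.idx_w.expand(b, heads, hw, hw)
        return logits_h.gather(-1, idx_h) + logits_w.gather(-1, idx_w)  # [b, heads, hw, hw]


dim = 32  # spatial size of the feature map
q = torch.rand(2, 4, dim * dim, 128)
logits = SketchRelPos2D(feat_map_size=(dim, dim), dim_head=128)(q)
print(logits.shape)  # torch.Size([2, 4, 1024, 1024])
```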

Acknowledgments

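Thanks to Alex Rogozhnikov @arogozhnikov for the awesome einops package. For my re-implementations, I have studied and borrowed code from many repositories of Phil Wang @lucidrains. By studying his code, I have managed to grasp self-attention, discover NLP details that are never mentioned in the papers, and learn from his clean coding style.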

Cite as

@article{adaloglou2021transformer,
    title   = "Transformers in Computer Vision",
    author  = "Adaloglou, Nikolas",
    journal = "https://theaisummer.com/",
    year    = "2021",
    howpublished = {https://github.com/The-AI-Summer/self-attention-cv},
}

References

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020, August). Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision (pp. 108-126). Springer, Cham.
Srinivas, A., Lin, T. Y., Parmar, N., Shlens, J., Abbeel, P., & Vaswani, A. (2021). Bottleneck transformers for visual recognition. arXiv preprint arXiv:2101.11605.
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., & Shlens, J. (2019). Stand-alone self-attention in vision models. arXiv preprint arXiv:1906.05909.
Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., ... & Zhou, Y. (2021). TransUNet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306.
Wang, S., Li, B., Khabsa, M., Fang, H., & Ma, H. (2020). Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768.
Bertasius, G., Wang, H., & Torresani, L. (2021). Is space-time attention all you need for video understanding? arXiv preprint arXiv:2102.05095.
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. arXiv preprint arXiv:1803.02155.