self attention cv
v2
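Implementations of self-attention mechanisms for computer vision in PyTorch, written with einsum and einops. The focus is on self-attention modules for computer vision.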
$ pip install self-attention-cv
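If you do not have a GPU, it is best to pre-install PyTorch in your environment before installing this package. To run the tests from the terminal with $ pytest, you may first need to run:
export PYTHONPATH=$PYTHONPATH:`pwd`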
# Multi-head self-attention over a batch of token sequences
import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # [tokens, tokens]
mask[5:8, 5:8] = 1
y = model(x, mask)
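To make the example above more concrete, here is a minimal sketch of masked multi-head self-attention written with torch.einsum. It is not the library's implementation: the function name `self_attention_sketch`, the single fused projection matrix `w_qkv`, and the convention that `True` in the mask means "do not attend" are all assumptions made for illustration, so check the library source before relying on any of them.

```python
import torch

def self_attention_sketch(x, w_qkv, heads, mask=None):
    """Hypothetical helper. x: [batch, tokens, dim], w_qkv: [dim, 3 * dim],
    mask: bool [tokens, tokens], where True marks pairs that must NOT attend
    (an assumed convention)."""
    b, t, d = x.shape
    dim_head = d // heads
    # Project once, then split into q, k, v with shape [batch, heads, tokens, dim_head]
    qkv = torch.einsum('btd,de->bte', x, w_qkv)
    q, k, v = qkv.reshape(b, t, 3, heads, dim_head).permute(2, 0, 3, 1, 4)
    # Scaled dot-product scores between every query i and key j
    scores = torch.einsum('bhid,bhjd->bhij', q, k) * dim_head ** -0.5
    if mask is not None:
        scores = scores.masked_fill(mask, float('-inf'))
    attn = scores.softmax(dim=-1)
    # Weighted sum of values, then merge the heads back into dim
    out = torch.einsum('bhij,bhjd->bhid', attn, v)
    return out.permute(0, 2, 1, 3).reshape(b, t, d)

torch.manual_seed(0)
x = torch.rand(16, 10, 64)                    # [batch, tokens, dim]
w_qkv = torch.randn(64, 3 * 64) * 64 ** -0.5  # random weights, illustration only
mask = torch.zeros(10, 10, dtype=torch.bool)  # [tokens, tokens]
mask[5:8, 5:8] = True
y = self_attention_sketch(x, w_qkv, heads=8, mask=mask)
print(y.shape)  # torch.Size([16, 10, 64])
```

With this mask, tokens 5 to 7 cannot attend to each other (or to themselves), but they still attend to every other token, so no row of the softmax is fully masked.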
# Axial attention block on a 2D feature map
import torch
from self_attention_cv import AxialAttentionBlock

model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64)  # [batch, in_channels, dim, dim]
y = model(x)
# Transformer encoder with 6 blocks and 8 heads
import torch
from self_attention_cv import TransformerEncoder

model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # [tokens, tokens]
mask[5:8, 5:8] = 1
y = model(x, mask)
# Vision Transformer (ViT), either with a ResNet50 backbone or on raw image patches
import torch
from self_attention_cv import ViT, ResNet50ViT

model1 = ResNet50ViT(img_dim=128, pretrained_resnet=False,
                     blocks=6, num_classes=10,
                     dim_linear_block=256, dim=256)
# or
model2 = ViT(img_dim=256, in_channels=3, patch_dim=16, num_classes=10, dim=512)
x = torch.rand(2, 3, 256, 256)
y = model2(x)  # [2, 10]
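As a rough illustration of what img_dim=256 and patch_dim=16 imply in the ViT example above, the sketch below cuts an image batch into non-overlapping 16x16 patches and flattens each patch into a token. It uses only plain tensor reshapes; the variable names and the exact ordering of values inside each flattened patch are assumptions, not necessarily what the library does internally.

```python
import torch

batch, channels, img_dim, patch_dim = 2, 3, 256, 16
x = torch.rand(batch, channels, img_dim, img_dim)

n = img_dim // patch_dim  # patches per side: 256 / 16 = 16
# [b, c, H, W] -> [b, c, n, p, n, p] -> [b, n, n, p, p, c] -> [b, n*n, p*p*c]
patches = (
    x.reshape(batch, channels, n, patch_dim, n, patch_dim)
     .permute(0, 2, 4, 3, 5, 1)
     .reshape(batch, n * n, patch_dim * patch_dim * channels)
)
print(patches.shape)  # torch.Size([2, 256, 768])
```

Each image therefore yields 16 * 16 = 256 tokens of length 16 * 16 * 3 = 768, which a linear layer could then project to the model dimension (dim=512 in the example above).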
# TransUNet: per-pixel class scores for segmentation
import torch
from self_attention_cv.transunet import TransUnet

a = torch.rand(2, 3, 128, 128)
model = TransUnet(in_channels=3, img_dim=128, vit_blocks=8,
                  vit_dim_linear_mhsa_block=512, classes=5)
y = model(a)  # [2, 5, 128, 128]
# Bottleneck transformer block
import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock

inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32), heads=4,
                                   out_channels=1024, pooling=True)
y = bottleneck_block(inp)
# 1D absolute and relative positional embeddings
import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D

model = AbsPosEmb1D(tokens=20, dim_head=64)
# [batch, heads, tokens, dim_head]
q = torch.rand(2, 3, 20, 64)
y1 = model(q)

model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
# 2D relative positional embeddings
import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D

dim = 32  # spatial dim of the feature map
model = RelPosEmb2D(
    feat_map_size=(dim, dim),
    dim_head=128)
q = torch.rand(2, 4, dim * dim, 128)
y = model(q)
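Thanks to Alex Rogozhnikov @arogozhnikov for the awesome einops package. For my re-implementations I studied and borrowed code from many repositories by Phil Wang @lucidrains. By studying his code I managed to grasp self-attention, discovered NLP details that are never mentioned in the papers, and learned from his clean coding style.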
@article{adaloglou2021transformer,
title = "Transformers in Computer Vision",
author = "Adaloglou, Nikolas",
journal = "https://theaisummer.com/",
year = "2021",
howpublished = {https://github.com/The-AI-Summer/self-attention-cv},
}
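If you really like this repository and find it useful, please consider (★) starring it, so that it can reach a broader audience of like-minded people. It would be highly appreciated :) !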