self attention cv
v2
Implementation of self-attention mechanisms for computer vision in PyTorch, using einsum and einops. Focused on self-attention modules for computer vision.
$ pip install self-attention-cv
If you do not have a GPU, it is best to pre-install PyTorch in your environment. To run the tests from the terminal:
$ pytest
You may first need to run:
$ export PYTHONPATH=$PYTHONPATH:`pwd`
# Multi-head self-attention
import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
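For readers curious what a masked attention call like the one above computes, here is a minimal sketch of multi-head scaled dot-product attention written with `torch.einsum`. It is not the library's source code. The function name `masked_attention`, the pre-split `[batch, heads, tokens, dim_head]` inputs, and the convention that `True` mask entries are blocked are all assumptions made for illustration.

```python
import torch


def masked_attention(q, k, v, mask=None):
    """Scaled dot-product attention over [batch, heads, tokens, dim_head] tensors.

    Hypothetical helper for illustration; not part of self_attention_cv.
    Assumed convention: mask entries that are True are blocked.
    """
    scale = q.shape[-1] ** -0.5
    # Similarity of every query token i with every key token j:
    # result has shape [batch, heads, tokens, tokens].
    scores = torch.einsum("bhid,bhjd->bhij", q, k) * scale
    if mask is not None:
        # The [tokens, tokens] mask broadcasts over batch and heads.
        scores = scores.masked_fill(mask, float("-inf"))
    attn = torch.softmax(scores, dim=-1)
    # Weighted sum of the values for each query token.
    out = torch.einsum("bhij,bhjd->bhid", attn, v)
    return out, attn


torch.manual_seed(0)
q = k = v = torch.rand(16, 8, 10, 8)  # 64-dim tokens split over 8 heads
mask = torch.zeros(10, 10, dtype=torch.bool)
mask[5:8, 5:8] = True
out, attn = masked_attention(q, k, v, mask)
print(out.shape)  # torch.Size([16, 8, 10, 8])
```

Blocked positions receive a score of negative infinity, so they get exactly zero weight after the softmax while each row of `attn` still sums to one. A row that is blocked everywhere would produce NaNs, which this sketch does not guard against.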
# Axial attention
import torch
from self_attention_cv import AxialAttentionBlock

model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64)  # [batch, in_channels, dim, dim]
y = model(x)
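A rough way to see why axial attention is attractive for feature maps of this size is to count query-key pairs. The sketch below assumes the usual formulation of axial attention (one attention pass along the height axis and one along the width axis) and is only a back-of-the-envelope comparison; the helper `attention_pairs` is hypothetical and not part of the library.

```python
def attention_pairs(height, width):
    """Count query-key pairs for full 2D attention vs. axial attention.

    Hypothetical helper for a back-of-the-envelope comparison.
    """
    positions = height * width
    full = positions * positions          # every position attends to every position
    axial = positions * (height + width)  # one pass along the column, one along the row
    return full, axial


full, axial = attention_pairs(64, 64)
print(full, axial, full // axial)  # 16777216 524288 32
```

For the 64 x 64 map used above, the axial variant scores 32 times fewer pairs than full attention over all 4096 positions.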
# Vanilla Transformer encoder
import torch
from self_attention_cv import TransformerEncoder

model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
# Vision Transformer for image classification, with or without a ResNet50 backbone
import torch
from self_attention_cv import ViT, ResNet50ViT

model1 = ResNet50ViT(img_dim=128, pretrained_resnet=False,
                     blocks=6, num_classes=10,
                     dim_linear_block=256, dim=256)
# or
model2 = ViT(img_dim=256, in_channels=3, patch_dim=16, num_classes=10, dim=512)
x = torch.rand(2, 3, 256, 256)
y = model2(x)  # [2, 10]
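As a sanity check on the `ViT` arguments above, the sketch below works out how many patches a Vision Transformer would see for these sizes. It assumes the usual non-overlapping square patching from the ViT paper. The helper `patch_stats` is hypothetical, and whether the library prepends a class token to the sequence is not stated in this README.

```python
def patch_stats(img_dim, patch_dim, in_channels):
    """Return (number of patches, flattened patch length) for square images.

    Hypothetical helper; assumes non-overlapping patch_dim x patch_dim patches.
    """
    if img_dim % patch_dim != 0:
        raise ValueError("img_dim must be divisible by patch_dim")
    patches_per_side = img_dim // patch_dim
    num_patches = patches_per_side ** 2
    patch_len = in_channels * patch_dim ** 2
    return num_patches, patch_len


num_patches, patch_len = patch_stats(img_dim=256, patch_dim=16, in_channels=3)
print(num_patches, patch_len)  # 256 768
```

Under that assumption, a 256 x 256 RGB image with 16 x 16 patches gives 256 patch tokens, each a flattened vector of length 768 before projection to `dim`.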
# TransUnet
import torch
from self_attention_cv.transunet import TransUnet

a = torch.rand(2, 3, 128, 128)
model = TransUnet(in_channels=3, img_dim=128, vit_blocks=8,
                  vit_dim_linear_mhsa_block=512, classes=5)
y = model(a)  # [2, 5, 128, 128]
# Bottleneck attention block
import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock

inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32), heads=4,
                                   out_channels=1024, pooling=True)
y = bottleneck_block(inp)
# 1D positional embeddings (absolute and relative)
import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D

model = AbsPosEmb1D(tokens=20, dim_head=64)
# [batch, heads, tokens, dim_head]
q = torch.rand(2, 3, 20, 64)
y1 = model(q)

model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
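To make the difference between the two 1D embeddings concrete: an absolute embedding is indexed by a token's position, while a relative one is indexed by the offset between the query and the key. The sketch below builds that offset table in plain Python. It illustrates the general idea only. The function `relative_index` is hypothetical and says nothing about how `RelPosEmb1D` stores or looks up its parameters.

```python
def relative_index(tokens):
    """Map each (query i, key j) pair to an index in [0, 2 * tokens - 2].

    Hypothetical helper: the offset j - i lies in [-(tokens - 1), tokens - 1],
    so shifting it by tokens - 1 gives a non-negative table index.
    """
    return [[j - i + tokens - 1 for j in range(tokens)] for i in range(tokens)]


table = relative_index(4)
for row in table:
    print(row)
# [3, 4, 5, 6]
# [2, 3, 4, 5]
# [1, 2, 3, 4]
# [0, 1, 2, 3]
```

With `tokens` positions there are only `2 * tokens - 1` distinct offsets, so pairs at the same distance share one entry regardless of where they sit in the sequence.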
# 2D relative positional embeddings
import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D

dim = 32  # spatial dim of the feature map
model = RelPosEmb2D(
    feat_map_size=(dim, dim),
    dim_head=128)
q = torch.rand(2, 4, dim * dim, 128)
y = model(q)
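Thanks to Alex Rogozhnikov @arogozhnikov for the awesome einops package. For my re-implementations I studied many of Phil Wang's (@lucidrains) repositories and borrowed code from them. By studying his code I managed to grasp self-attention, discover NLP details that are never mentioned in the papers, and learn from his clean coding style.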
@article{adaloglou2021transformer,
title = "Transformers in Computer Vision",
author = "Adaloglou, Nikolas",
journal = "https://theaisummer.com/",
year = "2021",
howpublished = {https://github.com/The-AI-Summer/self-attention-cv},
}
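If you really like this repository and find it useful, please consider (★) starring it, so that it can reach a broader audience of like-minded people. It would be highly appreciated :) !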