Python library with Neural Networks for Image Segmentation based on PyTorch.
The main features of this library are:
Visit the Read The Docs project page or read the following README to learn more about the Segmentation Models Pytorch (SMP for short) library.
The segmentation model is just a PyTorch torch.nn.Module, which can be created as easily as:
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="resnet34",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    encoder_weights="imagenet",     # use `imagenet` pre-trained weights for encoder initialization
    in_channels=1,                  # model input channels (1 for gray-scale images, 3 for RGB, etc.)
    classes=3,                      # model output channels (number of classes in your dataset)
)
All encoders have pretrained weights. Preparing your data the same way as during weights pre-training may give you better results (higher metric score and faster convergence). This is not necessary if you train the whole model rather than only the decoder.
from segmentation_models_pytorch.encoders import get_preprocessing_fn
preprocess_input = get_preprocessing_fn('resnet18', pretrained='imagenet')
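For intuition, ImageNet-style preprocessing usually amounts to scaling pixel values to [0, 1] and standardizing each channel. The sketch below reproduces that arithmetic with NumPy. The helper name `normalize_like_imagenet` is hypothetical, and the mean and standard deviation are the commonly used ImageNet statistics, assumed here; the parameters SMP applies for a particular encoder may differ.

```python
import numpy as np

# Commonly used ImageNet channel statistics (RGB order). Assumed values:
# a specific encoder may use different ones.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_like_imagenet(image):
    """Hypothetical helper: standardize an HxWx3 uint8 RGB image."""
    x = image.astype(np.float32) / 255.0       # scale to [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD  # per-channel standardization


image = np.full((4, 4, 3), 255, dtype=np.uint8)  # dummy all-white image
normalized = normalize_like_imagenet(image)
print(normalized.shape)  # (4, 4, 3)
```

The function returned by get_preprocessing_fn would typically be applied the same way, to a channel-last image array, before the array is converted to a tensor.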
Congratulations! You are done! Now you can train your model with your favorite framework!
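Because the model is a regular torch.nn.Module, a plain PyTorch training loop is enough. The following is a minimal sketch, not a recommended recipe: a single convolution stands in for the segmentation model so the snippet runs without downloading weights, and the optimizer, loss, and dummy data are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Stand-in for a segmentation model: any torch.nn.Module that maps
# (N, C_in, H, W) images to (N, classes, H, W) logits is trained the same way.
model = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # per-pixel multi-class loss

# Dummy batch: 2 gray-scale images and integer masks with class ids 0..2.
images = torch.rand(2, 1, 32, 32)
masks = torch.randint(0, 3, (2, 32, 32))

model.train()
losses = []
for step in range(3):
    optimizer.zero_grad()
    logits = model(images)         # shape (2, 3, 32, 32)
    loss = loss_fn(logits, masks)  # masks: shape (2, 32, 32), dtype int64
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Replacing the stand-in with a model created through smp should leave the loop itself unchanged.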
The following is a list of supported encoders in SMP. Select the appropriate family of encoders and click to expand the table and select a specific encoder and its pre-trained weights (encoder_name and encoder_weights parameters).
Encoder | Weights | Params, M |
---|---|---|
resnet18 | imagenet / ssl / swsl | 11M |
resnet34 | imagenet | 21M |
resnet50 | imagenet / ssl / swsl | 23M |
resnet101 | imagenet | 42M |
resnet152 | imagenet | 58M |
Encoder | Weights | Params, M |
---|---|---|
resnext50_32x4d | imagenet / ssl / swsl | 22M |
resnext101_32x4d | ssl / swsl | 42M |
resnext101_32x8d | imagenet / instagram / ssl / swsl | 86M |
resnext101_32x16d | instagram / ssl / swsl | 191M |
resnext101_32x32d | instagram | 466M |
resnext101_32x48d | instagram | 826M |
Encoder | Weights | Params, M |
---|---|---|
timm-resnest14d | imagenet | 8M |
timm-resnest26d | imagenet | 15M |
timm-resnest50d | imagenet | 25M |
timm-resnest101e | imagenet | 46M |
timm-resnest200e | imagenet | 68M |
timm-resnest269e | imagenet | 108M |
timm-resnest50d_4s2x40d | imagenet | 28M |
timm-resnest50d_1s4x24d | imagenet | 23M |
Encoder | Weights | Params, M |
---|---|---|
timm-res2net50_26w_4s | imagenet | 23M |
timm-res2net101_26w_4s | imagenet | 43M |
timm-res2net50_26w_6s | imagenet | 35M |
timm-res2net50_26w_8s | imagenet | 46M |
timm-res2net50_48w_2s | imagenet | 23M |
timm-res2net50_14w_8s | imagenet | 23M |
timm-res2next50 | imagenet | 22M |
Encoder | Weights | Params, M |
---|---|---|
timm-regnetx_002 | imagenet | 2M |
timm-regnetx_004 | imagenet | 4M |
timm-regnetx_006 | imagenet | 5M |
timm-regnetx_008 | imagenet | 6M |
timm-regnetx_016 | imagenet | 8M |
timm-regnetx_032 | imagenet | 14M |
timm-regnetx_040 | imagenet | 20M |
timm-regnetx_064 | imagenet | 24M |
timm-regnetx_080 | imagenet | 37M |
timm-regnetx_120 | imagenet | 43M |
timm-regnetx_160 | imagenet | 52M |
timm-regnetx_320 | imagenet | 105M |
timm-regnety_002 | imagenet | 2M |
timm-regnety_004 | imagenet | 3M |
timm-regnety_006 | imagenet | 5M |
timm-regnety_008 | imagenet | 5M |
timm-regnety_016 | imagenet | 10M |
timm-regnety_032 | imagenet | 17M |
timm-regnety_040 | imagenet | 19M |
timm-regnety_064 | imagenet | 29M |
timm-regnety_080 | imagenet | 37M |
timm-regnety_120 | imagenet | 49M |
timm-regnety_160 | imagenet | 80M |
timm-regnety_320 | imagenet | 141M |
Encoder | Weights | Params, M |
---|---|---|
timm-gernet_s | imagenet | 6M |
timm-gernet_m | imagenet | 18M |
timm-gernet_l | imagenet | 28M |
Encoder | Weights | Params, M |
---|---|---|
senet154 | imagenet | 113M |
se_resnet50 | imagenet | 26M |
se_resnet101 | imagenet | 47M |
se_resnet152 | imagenet | 64M |
se_resnext50_32x4d | imagenet | 25M |
se_resnext101_32x4d | imagenet | 46M |
Encoder | Weights | Params, M |
---|---|---|
timm-skresnet18 | imagenet | 11M |
timm-skresnet34 | imagenet | 21M |
timm-skresnext50_32x4d | imagenet | 25M |
Encoder | Weights | Params, M |
---|---|---|
densenet121 | imagenet | 6M |
densenet169 | imagenet | 12M |
densenet201 | imagenet | 18M |
densenet161 | imagenet | 26M |
Encoder | Weights | Params, M |
---|---|---|
inceptionresnetv2 | imagenet / imagenet+background | 54M |
inceptionv4 | imagenet / imagenet+background | 41M |
xception | imagenet | 22M |
Encoder | Weights | Params, M |
---|---|---|
efficientnet-b0 | imagenet | 4M |
efficientnet-b1 | imagenet | 6M |
efficientnet-b2 | imagenet | 7M |
efficientnet-b3 | imagenet | 10M |
efficientnet-b4 | imagenet | 17M |
efficientnet-b5 | imagenet | 28M |
efficientnet-b6 | imagenet | 40M |
efficientnet-b7 | imagenet | 63M |
timm-efficientnet-b0 | imagenet / advprop / noisy-student | 4M |
timm-efficientnet-b1 | imagenet / advprop / noisy-student | 6M |
timm-efficientnet-b2 | imagenet / advprop / noisy-student | 7M |
timm-efficientnet-b3 | imagenet / advprop / noisy-student | 10M |
timm-efficientnet-b4 | imagenet / advprop / noisy-student | 17M |
timm-efficientnet-b5 | imagenet / advprop / noisy-student | 28M |
timm-efficientnet-b6 | imagenet / advprop / noisy-student | 40M |
timm-efficientnet-b7 | imagenet / advprop / noisy-student | 63M |
timm-efficientnet-b8 | imagenet / advprop | 84M |
timm-efficientnet-l2 | noisy-student | 474M |
timm-efficientnet-lite0 | imagenet | 4M |
timm-efficientnet-lite1 | imagenet | 5M |
timm-efficientnet-lite2 | imagenet | 6M |
timm-efficientnet-lite3 | imagenet | 8M |
timm-efficientnet-lite4 | imagenet | 13M |
Encoder | Weights | Params, M |
---|---|---|
mobilenet_v2 | imagenet | 2M |
timm-mobilenetv3_large_075 | imagenet | 1.78M |
timm-mobilenetv3_large_100 | imagenet | 2.97M |
timm-mobilenetv3_large_minimal_100 | imagenet | 1.41M |
timm-mobilenetv3_small_075 | imagenet | 0.57M |
timm-mobilenetv3_small_100 | imagenet | 0.93M |
timm-mobilenetv3_small_minimal_100 | imagenet | 0.43M |
Encoder | Weights | Params, M |
---|---|---|
dpn68 | imagenet | 11M |
dpn68b | imagenet+5k | 11M |
dpn92 | imagenet+5k | 34M |
dpn98 | imagenet | 58M |
dpn107 | imagenet+5k | 84M |
dpn131 | imagenet | 76M |
Encoder | Weights | Params, M |
---|---|---|
vgg11 | imagenet | 9M |
vgg11_bn | imagenet | 9M |
vgg13 | imagenet | 9M |
vgg13_bn | imagenet | 9M |
vgg16 | imagenet | 14M |
vgg16_bn | imagenet | 14M |
vgg19 | imagenet | 20M |
vgg19_bn | imagenet | 20M |
Backbone from SegFormer pretrained on Imagenet! It can be used with other decoders from the package: you can combine Mix Vision Transformer with Unet, FPN and others!
Limitations:
Encoder | Weights | Params, M |
---|---|---|
mit_b0 | imagenet | 3M |
mit_b1 | imagenet | 13M |
mit_b2 | imagenet | 24M |
mit_b3 | imagenet | 44M |
mit_b4 | imagenet | 60M |
mit_b5 | imagenet | 81M |
Apple's "sub-one-ms" Backbone pretrained on Imagenet! Can be used with all decoders.
Note: In the official GitHub repo the s0 variant has additional num_conv_branches, leading to more params than s1.
Encoder | Weights | Params, M |
---|---|---|
mobileone_s0 | imagenet | 4.6M |
mobileone_s1 | imagenet | 4.0M |
mobileone_s2 | imagenet | 6.5M |
mobileone_s3 | imagenet | 8.8M |
mobileone_s4 | imagenet | 13.6M |
* ssl, swsl - semi-supervised and weakly-supervised learning on ImageNet (repo).
Pytorch Image Models (a.k.a. timm) has a lot of pretrained models and an interface that allows using these models as encoders in smp. However, not all models are supported: an encoder requires the features_only functionality, which not every model implements.
Total number of supported encoders: 549
- model.encoder - pretrained backbone to extract features of different spatial resolution
- model.decoder - depends on the model architecture (Unet/Linknet/PSPNet/FPN)
- model.segmentation_head - last block to produce the required number of mask channels (also includes optional upsampling and activation)
- model.classification_head - optional block which creates a classification head on top of the encoder
- model.forward(x) - sequentially passes x through the model's encoder, decoder and segmentation head (and classification head if specified)

The input channels parameter allows you to create models that process tensors with an arbitrary number of channels.
If you use pretrained weights from imagenet, the weights of the first convolution will be reused. For the 1-channel case the new weight is the sum of the weights of the first convolution layer; otherwise the channels are populated with weights like new_weight[:, i] = pretrained_weight[:, i % 3] and then scaled with new_weight * 3 / new_in_channels.
import torch
model = smp.FPN('resnet34', in_channels=1)
mask = model(torch.ones([1, 1, 64, 64]))
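The weight-adaptation rule described above can be sketched with NumPy on a random array standing in for the pretrained first-convolution weight. The helper name `adapt_first_conv` is hypothetical, and this sketch only follows the formulas in the text; it is not the library's own code.

```python
import numpy as np


def adapt_first_conv(pretrained_weight, new_in_channels):
    """Hypothetical helper: adapt an (out, 3, kH, kW) weight to new_in_channels."""
    if new_in_channels == 1:
        # 1-channel case: sum over the three RGB input channels.
        return pretrained_weight.sum(axis=1, keepdims=True)
    out_ch, _, kh, kw = pretrained_weight.shape
    new_weight = np.empty((out_ch, new_in_channels, kh, kw),
                          dtype=pretrained_weight.dtype)
    for i in range(new_in_channels):
        new_weight[:, i] = pretrained_weight[:, i % 3]  # cycle through R, G, B
    return new_weight * 3 / new_in_channels             # rescale


rng = np.random.default_rng(0)
pretrained = rng.normal(size=(64, 3, 7, 7)).astype(np.float32)  # stand-in weights

w1 = adapt_first_conv(pretrained, 1)
w5 = adapt_first_conv(pretrained, 5)
print(w1.shape, w5.shape)  # (64, 1, 7, 7) (64, 5, 7, 7)
```

The rescaling keeps the overall magnitude of the first layer's response roughly comparable to the 3-channel case; with 6 input channels, for instance, each RGB filter appears twice at half weight.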
All models support the aux_params parameter, which is set to None by default. If aux_params = None, the auxiliary classification output is not created; otherwise the model produces not only mask but also a label output with shape NC.
The classification head consists of GlobalPooling->Dropout(optional)->Linear->Activation(optional) layers, which can be configured by aux_params as follows:
aux_params=dict(
    pooling='avg',             # one of 'avg', 'max'
    dropout=0.5,               # dropout ratio, default is None
    activation='sigmoid',      # activation function, default is None
    classes=4,                 # define number of output labels
)
model = smp.Unet('resnet34', classes=4, aux_params=aux_params)
mask, label = model(x)
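To make the GlobalPooling->Dropout->Linear->Activation chain concrete, here is a sketch of an equivalent head built from standard torch.nn layers. It is a stand-in for illustration, not the library's own class; the 512 input channels and the dummy feature map are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the classification head described above.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),  # global average pooling -> (N, C, 1, 1)
    nn.Flatten(),             # -> (N, C)
    nn.Dropout(p=0.5),        # optional dropout
    nn.Linear(512, 4),        # 512 assumed encoder channels -> 4 labels
    nn.Sigmoid(),             # optional activation
)

features = torch.rand(2, 512, 8, 8)  # dummy deepest encoder feature map
head.eval()                          # disable dropout for a deterministic pass
with torch.no_grad():
    label = head(features)
print(label.shape)  # torch.Size([2, 4])
```

The output has one row per image and one column per label, which matches the NC shape mentioned above.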
The depth parameter specifies the number of downsampling operations in the encoder, so you can make your model lighter by specifying a smaller depth.
model = smp.Unet('resnet34', encoder_depth=4)
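As a rough illustration of what the depth controls, assume each downsampling operation halves the spatial resolution (typical, though not guaranteed for every encoder). The deepest feature map then shrinks by a factor of 2**depth. The helper name below is hypothetical.

```python
def deepest_feature_size(input_size, encoder_depth):
    """Hypothetical helper: spatial size after encoder_depth halvings."""
    return input_size // (2 ** encoder_depth)


for depth in (3, 4, 5):
    # For a 256-pixel side: depth 3 -> 32, depth 4 -> 16, depth 5 -> 8
    print(depth, deepest_feature_size(256, depth))
```

Under the same assumption, input sizes divisible by 2**encoder_depth keep every halving exact.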
PyPI version:
$ pip install segmentation-models-pytorch
Latest version from source:
$ pip install git+https://github.com/qubvel/segmentation_models.pytorch
The Segmentation Models package is widely used in image segmentation competitions. Here you can find competitions, names of the winners and links to their solutions.
make install_dev # create .venv, install SMP in dev mode
make fixup # Ruff for formatting and lint checks
make table # generate a table with encoders and print to stdout
@misc{Iakubovskii:2019,
Author = {Pavel Iakubovskii},
Title = {Segmentation Models Pytorch},
Year = {2019},
Publisher = {GitHub},
Journal = {GitHub repository},
Howpublished = {\url{https://github.com/qubvel/segmentation_models.pytorch}}
}
The project is distributed under MIT License