SAM 2: Segment Anything in Images and Videos

AI at Meta, FAIR

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer


[Figure: SAM 2 architecture]

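Segment Anything Model 2 (SAM 2) is a foundation model for promptable visual segmentation in images and videos. We extend SAM to video by treating an image as a video with a single frame. The model design is a simple transformer architecture with streaming memory for real-time video processing. We built a model-in-the-loop data engine, which improves both the model and the data through user interaction, to collect our SA-V dataset, the largest video segmentation dataset to date. SAM 2 trained on our data provides strong performance across a wide range of tasks and visual domains.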
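The two ideas above (an image is a one-frame video, and video is processed with a streaming memory) can be pictured with a toy sketch. Everything here is hypothetical: the `ToyStreamingMemory` class, `process_video`, and the `max_recent` window are illustrative choices, not SAM 2 code. Frames are handled one at a time, prompted frames are always kept, and only a bounded window of recent frames is retained, so the memory stays the same size however long the video is.

```python
from collections import deque
from typing import Any, Deque, List, Sequence, Set


class ToyStreamingMemory:
    """Hypothetical bounded memory: prompted frames are kept, recent frames roll over."""

    def __init__(self, max_recent: int = 6) -> None:
        self.prompted: List[Any] = []
        # deque(maxlen=...) drops the oldest entry automatically once it is full
        self.recent: Deque[Any] = deque(maxlen=max_recent)

    def add(self, frame_memory: Any, is_prompted: bool) -> None:
        if is_prompted:
            self.prompted.append(frame_memory)
        else:
            self.recent.append(frame_memory)

    def context(self) -> List[Any]:
        # what a new frame would condition on: all prompted frames plus the recent window
        return self.prompted + list(self.recent)


def process_video(frames: Sequence[Any], prompted_idx: Set[int], max_recent: int = 6) -> List[int]:
    """Stream frames in order and return how many memories each frame could use."""
    memory = ToyStreamingMemory(max_recent=max_recent)
    context_sizes = []
    for i, _frame in enumerate(frames):
        context_sizes.append(len(memory.context()))
        # a real model would predict a mask here and encode it; the toy only stores a tag
        memory.add(("memory", i), is_prompted=i in prompted_idx)
    return context_sizes


def process_image(image: Any) -> List[int]:
    # an image is handled as a video that has exactly one (prompted) frame
    return process_video([image], prompted_idx={0})


print(process_video(list(range(10)), prompted_idx={0}, max_recent=3))
# [0, 1, 2, 3, 4, 4, 4, 4, 4, 4]
```

The context size grows until the recent window is full and then stays flat, which is the property that makes per-frame cost independent of video length in this toy.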

[Figure: SA-V dataset]

Installation

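SAM 2 needs to be installed before use. The code requires python>=3.10, as well as torch>=2.3.1 and torchvision>=0.18.1. Please follow the official PyTorch installation instructions to install both the PyTorch and TorchVision dependencies. You can install SAM 2 on a GPU machine using: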

git clone https://github.com/facebookresearch/sam2.git && cd sam2

pip install -e .

If you are installing on Windows, it is strongly recommended to use Windows Subsystem for Linux (WSL) with Ubuntu.

To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required; they can be installed with:

pip install -e ".[notebooks]"

Notes:

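It is recommended to create a new Python environment via Anaconda for this installation and to install PyTorch 2.3.1 (or higher) via pip following https://pytorch.org/. If your current environment has a PyTorch version lower than 2.3.1, the installation command above will try to upgrade it to the latest PyTorch version using pip.
The step above requires compiling a custom CUDA kernel with the nvcc compiler. If it is not already available on your machine, please install the CUDA toolkit with a version that matches your PyTorch CUDA version.
If you see a message like "Failed to build the SAM 2 CUDA extension" during installation, you can ignore it and still use SAM 2 (some post-processing functionality may be limited, but in most cases this does not affect the results).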
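Before or after installing, a short script can confirm that the environment meets the stated minimums (python>=3.10, torch>=2.3.1, torchvision>=0.18.1) and whether CUDA is visible to PyTorch. This is a minimal sketch; in particular, it assumes the optional CUDA extension is importable as `sam2._C`, which is an assumption about the package layout rather than something stated in this README.

```python
import re
import sys
from typing import Optional, Tuple

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3, 1)
MIN_TORCHVISION = (0, 18, 1)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse the leading 'X.Y[.Z]' of a version string such as '2.3.1+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups(default="0")  # a missing patch counts as 0
    return int(major), int(minor), int(patch)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    # tuples compare element by element, e.g. (2, 3, 1) >= (2, 3, 1)
    return parse_version(version) >= minimum


def installed_version(module_name: str) -> Optional[str]:
    try:
        module = __import__(module_name)
    except (ImportError, OSError):  # OSError covers broken native libraries
        return None
    return getattr(module, "__version__", None)


def check_environment() -> None:
    py_ok = sys.version_info[:2] >= MIN_PYTHON
    print("python", sys.version.split()[0], "ok" if py_ok else "too old")
    for name, minimum in (("torch", MIN_TORCH), ("torchvision", MIN_TORCHVISION)):
        version = installed_version(name)
        if version is None:
            print(name, "not installed")
        else:
            print(name, version, "ok" if meets_minimum(version, minimum) else "too old")
    torch = sys.modules.get("torch")
    if torch is not None:
        print("CUDA available:", torch.cuda.is_available(), "| torch CUDA version:", torch.version.cuda)
    try:
        from sam2 import _C  # noqa: F401  (assumed module name of the optional CUDA extension)
        print("SAM 2 CUDA extension: built")
    except ImportError:
        print("SAM 2 CUDA extension: not available (optional post-processing may be limited)")


if __name__ == "__main__":
    check_environment()
```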

Please see INSTALL.md for FAQs on potential issues and solutions.

Getting Started

Download Checkpoints

First, we need to download a model checkpoint. All the model checkpoints can be downloaded by running:

cd checkpoints && \
./download_ckpts.sh && \
cd ..

or individually from:

sam2.1_hiera_tiny.pt
sam2.1_hiera_small.pt
sam2.1_hiera_base_plus.pt
sam2.1_hiera_large.pt

(Note that these are the improved checkpoints, denoted as SAM 2.1; see Model Description for details.)
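A small helper can map a model size to the matching config and checkpoint paths. Only the large pair (`configs/sam2.1/sam2.1_hiera_l.yaml` with `sam2.1_hiera_large.pt`) appears in this README; the tiny, small and base_plus config names below are assumed to follow the same pattern and should be checked against the `configs/sam2.1` directory. `resolve_model` and `missing_checkpoints` are hypothetical names.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed pairing of the checkpoints above with their configs; only the "large"
# pair is confirmed by this README, the other config names are assumptions to verify.
SAM2_1_MODELS = {
    "tiny": ("sam2.1_hiera_tiny.pt", "sam2.1_hiera_t.yaml"),
    "small": ("sam2.1_hiera_small.pt", "sam2.1_hiera_s.yaml"),
    "base_plus": ("sam2.1_hiera_base_plus.pt", "sam2.1_hiera_b+.yaml"),
    "large": ("sam2.1_hiera_large.pt", "sam2.1_hiera_l.yaml"),
}


def resolve_model(size: str, checkpoint_dir: str = "./checkpoints") -> Tuple[str, str]:
    """Return (model_cfg, checkpoint) for a model size (hypothetical helper)."""
    if size not in SAM2_1_MODELS:
        raise KeyError(f"unknown size {size!r}; expected one of {sorted(SAM2_1_MODELS)}")
    ckpt_name, cfg_name = SAM2_1_MODELS[size]
    model_cfg = f"configs/sam2.1/{cfg_name}"
    checkpoint = (Path(checkpoint_dir) / ckpt_name).as_posix()
    return model_cfg, checkpoint


def missing_checkpoints(checkpoint_dir: str = "./checkpoints") -> List[str]:
    """List the checkpoint files that are not present in checkpoint_dir yet."""
    folder = Path(checkpoint_dir)
    return [ckpt for ckpt, _cfg in SAM2_1_MODELS.values() if not (folder / ckpt).is_file()]
```

For example, `model_cfg, checkpoint = resolve_model("small")` yields two strings that could be passed to `build_sam2` instead of hard-coding the large model paths.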

SAM 2 can then be used for image and video prediction in a few lines, as follows.

Image prediction

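SAM 2 has all the capabilities of SAM on static images, and we provide image prediction APIs that closely resemble SAM's for image use cases. The SAM2ImagePredictor class has an easy interface for image prompting.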

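import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Hypothetical example inputs: replace the path and the click with your own.
image = np.array(Image.open("your_image.jpg").convert("RGB"))  # HxWx3 uint8 RGB
point_coords = np.array([[500, 375]])  # one (x, y) click in pixel coordinates
point_labels = np.array([1])  # 1 = foreground click, 0 = background click

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, logits = predictor.predict(
        point_coords=point_coords, point_labels=point_labels
    )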

See the examples in image_predictor_example.ipynb (also available in Colab) for static image use cases.
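With the default `multimask_output=True`, `predict` returns several candidate masks together with a predicted quality score for each, and a common pattern is to keep the highest-scoring one. The helpers below are a hypothetical sketch in plain NumPy: `make_point_prompt` builds the `point_coords`/`point_labels` arrays ((x, y) pixel coordinates, label 1 for foreground and 0 for background), and `best_mask` picks a mask by score.

```python
from typing import Sequence, Tuple

import numpy as np


def make_point_prompt(
    positive: Sequence[Tuple[float, float]],
    negative: Sequence[Tuple[float, float]] = (),
) -> Tuple[np.ndarray, np.ndarray]:
    """Build (point_coords, point_labels) arrays from foreground/background clicks."""
    coords = np.array(list(positive) + list(negative), dtype=np.float32).reshape(-1, 2)
    labels = np.array([1] * len(positive) + [0] * len(negative), dtype=np.int32)
    return coords, labels


def best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pick the highest-scoring mask from a stack of candidates shaped (K, H, W)."""
    return masks[int(np.argmax(scores))]
```

Used with the predictor above, this would read roughly `coords, labels = make_point_prompt([(500, 375)])`, then `masks, scores, _ = predictor.predict(point_coords=coords, point_labels=labels)` and `mask = best_mask(masks, scores)`.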

Like SAM, SAM 2 also supports automatic mask generation on images. See automatic_mask_generator_example.ipynb (also available in Colab) for automatic mask generation in images.
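The automatic mask generator returns a list of per-mask records. Assuming SAM-style records with `segmentation` (boolean HxW array), `area` and `predicted_iou` keys, the hypothetical `to_label_map` helper below filters them and merges them into one integer label map, painting larger masks first so that smaller masks on top stay visible.

```python
from typing import Dict, List

import numpy as np


def to_label_map(annotations: List[Dict], min_area: int = 0, min_iou: float = 0.0) -> np.ndarray:
    """Merge mask records into one label map where 0 is background and 1..N are masks."""
    if not annotations:
        raise ValueError("no masks to merge")
    height, width = annotations[0]["segmentation"].shape
    label_map = np.zeros((height, width), dtype=np.int32)
    kept = [a for a in annotations if a["area"] >= min_area and a["predicted_iou"] >= min_iou]
    # paint large masks first so smaller, overlapping masks end up on top
    kept.sort(key=lambda a: a["area"], reverse=True)
    for label, ann in enumerate(kept, start=1):
        label_map[ann["segmentation"]] = label
    return label_map
```

With the generator from the notebook, usage would look roughly like `annotations = SAM2AutomaticMaskGenerator(build_sam2(model_cfg, checkpoint)).generate(image)` followed by `to_label_map(annotations, min_area=100)`; the `min_area` value here is arbitrary.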

Video prediction

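For promptable segmentation and tracking in videos, we provide a video predictor with APIs for, for example, adding prompts and propagating masklets throughout a video. SAM 2 supports video inference on multiple objects and uses an inference state to keep track of the interactions in each video.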

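import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

# Hypothetical input: a directory of JPEG frames (e.g. 00000.jpg, 00001.jpg, ...).
video_dir = "./your_video_frames"

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path=video_dir)

    # add new prompts and instantly get the output on the same frame
    # (here: one positive click at a hypothetical (x, y) on frame 0 for object id 1)
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # propagate the prompts to get masklets throughout the video;
    # the returned masks are logits, so threshold at 0 for binary masks
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        binary_masks = (masks > 0.0).cpu().numpy()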

See the examples in video_predictor_example.ipynb (also available in Colab) for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
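`propagate_in_video` yields one `(frame_idx, object_ids, masks)` tuple per frame, where the masks are logits that are thresholded at 0 to obtain binary masks. The hypothetical `collect_masklets` helper below gathers these into a `{frame_idx: {obj_id: mask}}` dictionary. It assumes the logits have already been converted to a NumPy array of shape `(num_objects, 1, H, W)`, with one entry per object id in the same order.

```python
from typing import Dict, Iterable, Sequence, Tuple

import numpy as np

# (frame_idx, obj_ids, mask_logits) as assumed above
FrameOutput = Tuple[int, Sequence[int], np.ndarray]


def collect_masklets(
    outputs: Iterable[FrameOutput], threshold: float = 0.0
) -> Dict[int, Dict[int, np.ndarray]]:
    """Turn per-frame mask logits into {frame_idx: {obj_id: boolean HxW mask}}."""
    masklets: Dict[int, Dict[int, np.ndarray]] = {}
    for frame_idx, obj_ids, mask_logits in outputs:
        masklets[frame_idx] = {
            int(obj_id): mask_logits[i, 0] > threshold for i, obj_id in enumerate(obj_ids)
        }
    return masklets
```

With the predictor above, the tensors could be converted on the fly, for example `collect_masklets((f, ids, m.float().cpu().numpy()) for f, ids, m in predictor.propagate_in_video(state))`; the `.float()` call avoids converting a bfloat16 tensor directly to NumPy.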

Load from Hugging Face

Alternatively, models can also be loaded from Hugging Face (requires pip install huggingface_hub).

For image prediction:

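import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

# Hypothetical example inputs: replace the path and the click with your own.
image = np.array(Image.open("your_image.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, logits = predictor.predict(
        point_coords=np.array([[500, 375]]), point_labels=np.array([1])
    )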

For video prediction:

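import numpy as np
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

# Hypothetical input: a directory of JPEG frames (e.g. 00000.jpg, 00001.jpg, ...).
video_dir = "./your_video_frames"

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(video_path=video_dir)

    # add new prompts and instantly get the output on the same frame
    # (here: one positive click at a hypothetical (x, y) on frame 0 for object id 1)
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(
        state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210, 350]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # propagate the prompts to get masklets throughout the video;
    # the returned masks are logits, so threshold at 0 for binary masks
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        binary_masks = (masks > 0.0).cpu().numpy()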

Model Description

SAM 2.1 checkpoints

The table below shows the improved SAM 2.1 checkpoints released on September 29, 2024.

Model	Size (M params)	Speed (FPS)	SA-V test (J&F)	MOSE val (J&F)	LVOS v2 (J&F)
sam2.1_hiera_tiny (config, checkpoint)	38.9	47.2	76.5	71.8	77.3
sam2.1_hiera_small (config, checkpoint)	46	43.3 (53.0 compiled*)	76.6	73.5	78.3
sam2.1_hiera_base_plus (config, checkpoint)	80.8	34.8 (43.8 compiled*)	78.2	73.7	78.2
sam2.1_hiera_large (config, checkpoint)	224.4	24.2 (30.2 compiled*)	79.5	74.6	80.6

SAM 2 checkpoints

The previous SAM 2 checkpoints, released on July 29, 2024, are listed below:

Model	Size (M params)	Speed (FPS)	SA-V test (J&F)	MOSE val (J&F)	LVOS v2 (J&F)
sam2_hiera_tiny (config, checkpoint)	38.9	47.2	75.0	70.9	75.3
sam2_hiera_small (config, checkpoint)	46	43.3 (53.0 compiled*)	74.9	71.5	76.4
sam2_hiera_base_plus (config, checkpoint)	80.8	34.8 (43.8 compiled*)	74.7	72.8	75.8
sam2_hiera_large (config, checkpoint)	224.4	24.2 (30.2 compiled*)	76.0	74.6	79.8

* Compile the model by setting compile_image_encoder: True in the config.
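To compare the two checkpoint generations directly, the J&F gains of SAM 2.1 over SAM 2 and the speedup from compiling the image encoder can be computed from the tables above. The numbers in the code are copied from those tables; the dictionary and function names are illustrative only.

```python
# J&F scores from the two tables above, in the order (SA-V test, MOSE val, LVOS v2).
SAM2_1 = {
    "tiny": (76.5, 71.8, 77.3),
    "small": (76.6, 73.5, 78.3),
    "base_plus": (78.2, 73.7, 78.2),
    "large": (79.5, 74.6, 80.6),
}
SAM2 = {
    "tiny": (75.0, 70.9, 75.3),
    "small": (74.9, 71.5, 76.4),
    "base_plus": (74.7, 72.8, 75.8),
    "large": (76.0, 74.6, 79.8),
}
BENCHMARKS = ("SA-V test", "MOSE val", "LVOS v2")

# Speed in FPS as (default, compiled*); the tiny model has no compiled figure in the tables.
SPEED = {"small": (43.3, 53.0), "base_plus": (34.8, 43.8), "large": (24.2, 30.2)}


def jf_gains():
    """J&F points gained by each SAM 2.1 checkpoint over the same-size SAM 2 checkpoint."""
    return {
        size: {b: round(new - old, 1) for b, new, old in zip(BENCHMARKS, SAM2_1[size], SAM2[size])}
        for size in SAM2_1
    }


def compile_speedups():
    """Compiled FPS divided by default FPS, for each size that reports both."""
    return {size: round(compiled / default, 2) for size, (default, compiled) in SPEED.items()}


if __name__ == "__main__":
    for size, gains in jf_gains().items():
        print(size, gains)
    print(compile_speedups())
```

For instance, the tiny model gains 1.5 J&F points on SA-V test, the large model is unchanged on MOSE val, and compilation speeds up the large model by about 1.25x.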

Segment Anything Video Dataset

See sav_dataset/README.md for details.

Training SAM 2

You can train or fine-tune SAM 2 on custom datasets of images, videos, or both. Please check the training README to get started.

Web demo for SAM 2

We have released the frontend + backend code for the SAM 2 web demo (a locally deployable version similar to https://sam2.metademolab.com/demo). Please see the web demo README for details.

License

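The SAM 2 model checkpoints, SAM 2 demo code (frontend and backend), and SAM 2 training code are licensed under Apache 2.0; however, the Inter Font and Noto Color Emoji used in the SAM 2 demo code are made available under the SIL Open Font License, version 1.1.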

Contributing

See contributing and the code of conduct.

Contributors

The SAM 2 project was made possible with the help of many contributors (alphabetical):

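Karen Bergan, Daniel Bolya, Alex Bosenberg, Kai Brown, Vispi Cassod, Christopher Chedeau, Ida Cheng, Luc Dahlin, Shoubhik Debnath, Rene Martinez Doehner, Grant Gardner, Sahir Gomez, Rishi Godugu, Baishan Guo, Caleb Ho, Andrew Huang, Somya Jain, Bob Kamma, Amanda Kallet, Jake Kinney, Alexander Kirillov, Shiva Koduvayur, Devansh Kukreja, Robert Kuo, Aohan Lin, Parth Malani, Jitendra Malik, Mallika Malhotra, Miguel Martin, Alexander Miller, Sasha Mitts, William Ngan, George Orlin, Joelle Pineau, Kate Saenko, Rodrick Shepard, Azita Shokrpour, David Soofian, Jonathan Torres, Jenny Truong, Sagar Vaze, Meng Wang, Claudette Ward, Pengchuan Zhang.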

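Third-party code: we use a GPU-based connected components algorithm adapted from cc_torch (with its license in LICENSE_cctorch) as an optional post-processing step for the mask predictions.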
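Connected-component labeling groups foreground pixels into regions that touch each other. The sketch below is a plain CPU breadth-first search with 4-connectivity, not the GPU kernel adapted from cc_torch; `remove_small_components` shows one typical use of such labels (dropping tiny disconnected regions from a mask) and is not a description of SAM 2's exact post-processing.

```python
from collections import deque
from typing import Tuple

import numpy as np


def label_components(mask: np.ndarray) -> Tuple[np.ndarray, int]:
    """4-connected component labeling of a 2-D boolean mask via breadth-first search."""
    height, width = mask.shape
    labels = np.zeros((height, width), dtype=np.int32)
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y, x] or labels[y, x]:
                continue  # background, or already assigned to a component
            count += 1
            labels[y, x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx] and not labels[ny, nx]:
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count


def remove_small_components(mask: np.ndarray, min_area: int) -> np.ndarray:
    """Return a copy of mask without connected regions smaller than min_area pixels."""
    labels, count = label_components(mask)
    areas = np.bincount(labels.ravel(), minlength=count + 1)  # pixel count per label
    keep = areas >= min_area
    keep[0] = False  # label 0 is background
    return keep[labels]
```

A GPU implementation exists because this per-pixel loop is slow in Python on full-resolution masks; the sketch favors readability over speed.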

Citing SAM 2

If you use SAM 2 or the SA-V dataset in your research, please use the following BibTeX entry.

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}