SAM 2: Segment Anything in Images and Videos

AI at Meta, FAIR

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman Rädle, Chloe Rolland, Laura Gustafson, Eric Mintun, Junting Pan, Kalyan Vasudev Alwala, Nicolas Carion, Chao-Yuan Wu, Ross Girshick, Piotr Dollár, Christoph Feichtenhofer

[ Paper ] [ Project ] [ Demo ] [ Dataset ] [ Blog ] [ BibTeX ]

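SAM 2 architecture

Segment Anything Model 2 (SAM 2) is a foundation model for promptable visual segmentation in images and videos. We extend SAM to video by treating an image as a video with a single frame. The model design is a simple transformer architecture with streaming memory for real-time video processing. We build a model-in-the-loop data engine, which improves both the model and the data through user interaction, to collect our SA-V dataset, the largest video segmentation dataset to date. SAM 2 trained on our data provides strong performance across a wide range of tasks and visual domains.

SA-V dataset

Installation

SAM 2 needs to be installed before use. The code requires python>=3.10, as well as torch>=2.3.1 and torchvision>=0.18.1. Please follow the instructions at https://pytorch.org/ to install both the PyTorch and TorchVision dependencies. You can install SAM 2 on a GPU machine using: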

git clone https://github.com/facebookresearch/sam2.git && cd sam2

pip install -e .
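
Before running the install command, it can help to confirm that the environment meets the version floors quoted above. The following is a minimal sketch, not part of the SAM 2 code base: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and it relies only on the standard library to read the installed package versions.

```python
import re
import sys
from importlib import metadata

# Minimum versions quoted in the installation instructions above.
MIN_PYTHON = (3, 10)
MIN_PACKAGES = {"torch": "2.3.1", "torchvision": "0.18.1"}


def parse_version(version):
    """Keep the leading numeric part, e.g. '2.3.1+cu121' -> (2, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info[0]}.{sys.version_info[1]} is older than 3.10")
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result only says the version floors are met; it does not check that the PyTorch build matches your GPU driver.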

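If you are installing on Windows, it's strongly recommended to use Windows Subsystem for Linux (WSL) with Ubuntu.

To use the SAM 2 predictor and run the example notebooks, jupyter and matplotlib are required and can be installed by:

pip install -e ".[notebooks]"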

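Note:

It's recommended to create a new Python environment via Anaconda for this installation and to install PyTorch 2.3.1 (or higher) via pip following https://pytorch.org/. If the PyTorch version in your current environment is lower than 2.3.1, the installation command above will try to upgrade it to the latest PyTorch version using pip.
The step above requires compiling a custom CUDA kernel with the nvcc compiler. If it isn't already available on your machine, please install the CUDA toolkit in a version that matches your PyTorch CUDA version.
If you see a message like Failed to build the SAM 2 CUDA extension during installation, you can ignore it and still use SAM 2 (some post-processing functionality may be limited, but it doesn't affect the results in most cases).

Please see INSTALL.md for FAQs on potential issues and solutions.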
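
To see whether the optional CUDA extension has a chance of building, you can check that nvcc is on your PATH and which toolkit release it reports. This is a small sketch using only the standard library; the helper names are hypothetical, and the parsing assumes the usual "release X.Y" wording in the output of nvcc --version.

```python
import shutil
import subprocess


def find_nvcc():
    """Return the path to nvcc if it is on PATH, else None."""
    return shutil.which("nvcc")


def nvcc_release(version_output):
    """Pull the 'release X.Y' token out of `nvcc --version` output, if present."""
    for line in version_output.splitlines():
        if "release" in line:
            # Typical line: "Cuda compilation tools, release 12.1, V12.1.105"
            return line.split("release", 1)[1].split(",")[0].strip()
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found; the SAM 2 CUDA extension will likely be skipped")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        # Compare this against torch.version.cuda in your Python environment.
        print("nvcc release:", nvcc_release(result.stdout))
```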

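Getting Started

Download Checkpoints

First, we need to download a model checkpoint. All the model checkpoints can be downloaded by running:

cd checkpoints && \
./download_ckpts.sh && \
cd ..

or individually from: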

sam2.1_hiera_tiny.pt
sam2.1_hiera_small.pt
sam2.1_hiera_base_plus.pt
sam2.1_hiera_large.pt

(Note that these are the improved checkpoints, denoted as SAM 2.1; see Model Description for details.)
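
Each checkpoint has to be paired with its model config when building a predictor. The sketch below lists which checkpoints are present locally and the config each one would use. Only the large pairing appears in this README; the other three config file names are assumptions that follow the same naming pattern, so verify them against the configs directory of your checkout.

```python
from pathlib import Path

# Only the large pairing is taken from this README. The remaining config
# names are ASSUMED to follow the same pattern and should be verified.
CHECKPOINT_TO_CONFIG = {
    "sam2.1_hiera_tiny.pt": "configs/sam2.1/sam2.1_hiera_t.yaml",        # assumed
    "sam2.1_hiera_small.pt": "configs/sam2.1/sam2.1_hiera_s.yaml",       # assumed
    "sam2.1_hiera_base_plus.pt": "configs/sam2.1/sam2.1_hiera_b+.yaml",  # assumed
    "sam2.1_hiera_large.pt": "configs/sam2.1/sam2.1_hiera_l.yaml",
}


def available_checkpoints(checkpoint_dir="checkpoints"):
    """Return {checkpoint_path: config_name} for checkpoint files that exist locally."""
    root = Path(checkpoint_dir)
    return {
        str(root / name): config
        for name, config in CHECKPOINT_TO_CONFIG.items()
        if (root / name).is_file()
    }


if __name__ == "__main__":
    found = available_checkpoints()
    if not found:
        print("No checkpoints found; run ./download_ckpts.sh inside checkpoints/")
    for path, config in found.items():
        print(f"{path} -> {config}")
```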

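Then SAM 2 can be used in a few lines as follows for image and video prediction.

Image prediction

SAM 2 has all the capabilities of SAM on static images, and we provide image prediction APIs that closely resemble SAM for image use cases. The SAM2ImagePredictor class has an easy interface for image prompting.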

import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)

Please refer to the examples in image_predictor_example.ipynb (also available in Colab) for static image use cases.
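
As one way of filling in the placeholders in the snippet above, the sketch below segments the object under a single foreground click and keeps the highest-scoring candidate mask. The function names are hypothetical, loading the image with Pillow is an assumption, and the keyword arguments (point_coords, point_labels, multimask_output) are given as we understand the SAM-style predict interface; the notebook remains the authoritative reference.

```python
def best_mask_index(scores):
    """Index of the highest-scoring candidate mask."""
    return max(range(len(scores)), key=lambda i: float(scores[i]))


def segment_from_click(
    image_path,
    x,
    y,
    checkpoint="./checkpoints/sam2.1_hiera_large.pt",
    model_cfg="configs/sam2.1/sam2.1_hiera_l.yaml",
):
    """Segment the object under pixel (x, y) and return the best mask."""
    # Heavy imports are deferred so best_mask_index works without a GPU stack.
    import numpy as np
    import torch
    from PIL import Image  # assumption: Pillow is used to read the image
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    image = np.array(Image.open(image_path).convert("RGB"))
    predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        predictor.set_image(image)
        masks, scores, _ = predictor.predict(
            point_coords=np.array([[x, y]], dtype=np.float32),
            point_labels=np.array([1], dtype=np.int32),  # 1 marks a foreground click
            multimask_output=True,  # ask for several candidates, then pick one
        )
    return masks[best_mask_index(scores)]
```

Requesting several candidates and choosing by score is a common way to handle an ambiguous single click; with multiple clicks or a box, a single output mask is often enough.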

SAM 2 also supports automatic mask generation on images, just like SAM. Please see automatic_mask_generator_example.ipynb (also available in Colab) for automatic mask generation in images.
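
A minimal sketch of automatic mask generation follows, assuming the SAM2AutomaticMaskGenerator class from sam2.automatic_mask_generator and the SAM-style output format (a list of dicts that include an "area" entry). The function names and the Pillow-based image loading are illustrative choices, not part of the project.

```python
def largest_masks(annotations, top_k=5):
    """Return the top_k annotations with the largest mask area."""
    return sorted(annotations, key=lambda ann: ann["area"], reverse=True)[:top_k]


def generate_masks(
    image_path,
    checkpoint="./checkpoints/sam2.1_hiera_large.pt",
    model_cfg="configs/sam2.1/sam2.1_hiera_l.yaml",
):
    """Run automatic mask generation over a whole image."""
    # Heavy imports are deferred so largest_masks works without a GPU stack.
    import numpy as np
    import torch
    from PIL import Image  # assumption: Pillow is used to read the image
    from sam2.automatic_mask_generator import SAM2AutomaticMaskGenerator
    from sam2.build_sam import build_sam2

    image = np.array(Image.open(image_path).convert("RGB"))
    generator = SAM2AutomaticMaskGenerator(build_sam2(model_cfg, checkpoint))

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        # Each entry is assumed to be a dict with keys such as
        # "segmentation" (binary mask) and "area" (pixel count).
        return generator.generate(image)
```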

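Video prediction

For promptable segmentation and tracking in videos, we provide a video predictor with APIs to, for example, add prompts and propagate masklets throughout a video. SAM 2 supports video inference on multiple objects and uses an inference state to keep track of the interactions in each video.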

import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...

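Please refer to the examples in video_predictor_example.ipynb (also available in Colab) for details on how to add click or box prompts, make refinements, and track multiple objects in videos.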
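
The sketch below shows one way to fill in the placeholders for a single click on the first frame and to gather the propagated results per frame. The function names are hypothetical, and the keyword arguments to add_new_points_or_box (frame_idx, obj_id, points, labels) as well as the thresholding of mask logits at 0.0 reflect our understanding of the example notebook rather than a guaranteed interface.

```python
def collect_masklets(propagation, threshold=0.0):
    """Turn (frame_idx, object_ids, masks) triples into {frame: {obj_id: binary_mask}}."""
    segments = {}
    for frame_idx, object_ids, masks in propagation:
        segments[frame_idx] = {
            # Mask outputs are assumed to be logits; positive values count as foreground.
            obj_id: masks[i] > threshold
            for i, obj_id in enumerate(object_ids)
        }
    return segments


def track_from_click(
    video_path,
    x,
    y,
    checkpoint="./checkpoints/sam2.1_hiera_large.pt",
    model_cfg="configs/sam2.1/sam2.1_hiera_l.yaml",
):
    """Prompt one object with a click on frame 0 and track it through the video."""
    # Heavy imports are deferred so collect_masklets works without a GPU stack.
    import numpy as np
    import torch
    from sam2.build_sam import build_sam2_video_predictor

    predictor = build_sam2_video_predictor(model_cfg, checkpoint)

    with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
        state = predictor.init_state(video_path)
        predictor.add_new_points_or_box(
            state,
            frame_idx=0,  # frame that receives the prompt
            obj_id=1,     # any unique integer id for the tracked object
            points=np.array([[x, y]], dtype=np.float32),
            labels=np.array([1], dtype=np.int32),  # 1 marks a foreground click
        )
        return collect_masklets(predictor.propagate_in_video(state))
```

The collected masks stay on whatever device the predictor returns them on; for long videos you may prefer to move each mask to the CPU or write it to disk inside the loop.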

Load from Hugging Face

Alternatively, models can also be loaded from Hugging Face (requires pip install huggingface_hub).

For image prediction:

import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)

For video prediction:

import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...

Model Description

SAM 2.1 checkpoints

The table below shows the improved SAM 2.1 checkpoints released on September 29, 2024.

Model	Size (M)	Speed (FPS)	SA-V test (J&F)	MOSE val (J&F)	LVOS v2 (J&F)
sam2.1_hiera_tiny (config, checkpoint)	38.9	47.2	76.5	71.8	77.3
sam2.1_hiera_small (config, checkpoint)	46	43.3 (53.0 compiled*)	76.6	73.5	78.3
sam2.1_hiera_base_plus (config, checkpoint)	80.8	34.8 (43.8 compiled*)	78.2	73.7	78.2
sam2.1_hiera_large (config, checkpoint)	224.4	24.2 (30.2 compiled*)	79.5	74.6	80.6

SAM 2 checkpoints

The previous SAM 2 checkpoints, released on July 29, 2024, are listed below:

Model	Size (M)	Speed (FPS)	SA-V test (J&F)	MOSE val (J&F)	LVOS v2 (J&F)
sam2_hiera_tiny (config, checkpoint)	38.9	47.2	75.0	70.9	75.3
sam2_hiera_small (config, checkpoint)	46	43.3 (53.0 compiled*)	74.9	71.5	76.4
sam2_hiera_base_plus (config, checkpoint)	80.8	34.8 (43.8 compiled*)	74.7	72.8	75.8
sam2_hiera_large (config, checkpoint)	224.4	24.2 (30.2 compiled*)	76.0	74.6	79.8

* Compile the model by setting compile_image_encoder: True in the config.
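
To put the speed column in more practical terms, the short sketch below converts the listed FPS figures into per-frame latency and computes the relative gain from compiling the image encoder. The numbers are copied from the tables above; the function names are illustrative, and the hardware behind the measurements is not stated in this README.

```python
# (eager FPS, compiled FPS) as listed in the tables above;
# no compiled figure is given for the tiny model.
SPEED_FPS = {
    "tiny": (47.2, None),
    "small": (43.3, 53.0),
    "base_plus": (34.8, 43.8),
    "large": (24.2, 30.2),
}


def frame_latency_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


def compile_speedup(eager_fps, compiled_fps):
    """Throughput ratio of the compiled model over the eager one."""
    return compiled_fps / eager_fps


for name, (eager, compiled) in SPEED_FPS.items():
    line = f"{name:>9}: {frame_latency_ms(eager):5.1f} ms/frame"
    if compiled is not None:
        line += f", x{compile_speedup(eager, compiled):.2f} when compiled"
    print(line)
```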

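Segment Anything Video Dataset

See sav_dataset/README.md for details.

Training SAM 2

You can train or fine-tune SAM 2 on custom datasets of images, videos, or both. Please check the training README on how to get started.

Web demo for SAM 2

We have released the front-end and back-end code for the SAM 2 web demo (a locally deployable version similar to https://sam2.metademolab.com/demo). Please see the web demo README for details.

License

The SAM 2 model checkpoints, SAM 2 demo code (front-end and back-end), and SAM 2 training code are licensed under Apache 2.0. However, the Inter Font and Noto Color Emoji used in the SAM 2 demo code are made available under the SIL Open Font License, version 1.1.

Contributing

See the contributing guidelines and the code of conduct.

Contributors

The SAM 2 project was made possible with the help of many contributors (in alphabetical order):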

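Karen Bergan, Daniel Bolya, Alex Bosenberg, Kai Brown, Vispi Cassod, Christopher Chedeau, Ida Cheng, Luc Dahlin, Shoubhik Debnath, Rene Martinez Doehner, Grant Gardner, Sahir Gomez, Rishi Godugu, Baishan Guo, Caleb Ho, Andrew Huang, Somya Jain, Bob Kamma, Amanda Kallet, Jake Kinney, Alexander Kirillov, Shiva Koduvayur, Devansh Kukreja, Robert Kuo, Aohan Lin, Parth Malani, Jitendra Malik, Mallika Malhotra, Miguel Martin, Alexander Miller, Sasha Mitts, William Ngan, George Orlin, Joelle Pineau, Kate Saenko, Rodrick Shepard, Azita Shokrpour, David Soofian, Jonathan Torres, Jenny Truong, Sagar Vaze, Meng Wang, Claudette Ward, Pengchuan Zhang.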

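Third-party code: we use a GPU-based connected component algorithm adapted from cc_torch (with its license in LICENSE_cctorch) as an optional post-processing step for the mask predictions.

Citing SAM 2

If you use SAM 2 or the SA-V dataset in your research, please use the following BibTeX entry.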

@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}