Skills · May 2, 2026 · 3 min read

SAM 2 — Segment Anything in Images and Videos

Meta's next-generation Segment Anything Model, which extends promptable segmentation from images to videos. SAM 2 tracks and segments objects across video frames in real time with a single, unified architecture.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents judge fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: SAM 2 Overview
Universal CLI command:
npx tokrepo install c9dc9efb-45df-11f1-9bc6-00163e2b0d79

Introduction

SAM 2 (Segment Anything Model 2) extends Meta's original SAM from static images to streaming video. It introduces a memory mechanism that allows the model to track and segment objects across frames, handling occlusions, reappearances, and object deformation.

What SAM 2 Does

  • Segments objects in both images and videos with point, box, or mask prompts
  • Tracks segmented objects across video frames with temporal consistency
  • Handles occlusion and object reappearance using a memory bank
  • Supports interactive refinement of masks on any frame during processing
  • Provides the SA-V dataset with 642K masklets across 51K videos
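Point and box prompts can be sketched as follows. This is a minimal sketch, assuming `predictor` behaves like the `SAM2ImagePredictor` from the official `sam2` package, with `set_image(image)` and `predict(point_coords=..., point_labels=..., box=..., multimask_output=...)`. The helpers `build_prompt` and `segment` are hypothetical. Coordinates are passed as plain nested lists here; the official notebooks use NumPy arrays, so convert with `np.asarray` if your version requires it.

```python
from typing import Any, Optional, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) in pixel coordinates
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), XYXY format


def build_prompt(
    positives: Sequence[Point] = (),
    negatives: Sequence[Point] = (),
    box: Optional[Box] = None,
) -> dict:
    """Build predict() keyword arguments; label 1 marks foreground, 0 background."""
    points = list(positives) + list(negatives)
    labels = [1] * len(positives) + [0] * len(negatives)
    kwargs: dict = {}
    if points:
        kwargs["point_coords"] = [list(p) for p in points]
        kwargs["point_labels"] = labels
    if box is not None:
        x0, y0, x1, y1 = box
        if x1 <= x0 or y1 <= y0:
            raise ValueError("box must be (x0, y0, x1, y1) with x1 > x0 and y1 > y0")
        kwargs["box"] = [x0, y0, x1, y1]
    if not kwargs:
        raise ValueError("provide at least one point or a box")
    return kwargs


def segment(predictor: Any, image: Any, **prompt: Any) -> Any:
    """Set the image (which computes its embedding), then decode masks for one prompt."""
    predictor.set_image(image)
    kwargs = build_prompt(**prompt)
    # A lone foreground click is ambiguous, so request several candidate masks.
    multimask = kwargs.get("point_labels") == [1] and "box" not in kwargs
    return predictor.predict(multimask_output=multimask, **kwargs)
```

For several prompts on the same image, call `set_image` once and `predict` repeatedly, since the embedding is the expensive step.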

Architecture Overview

SAM 2 uses a Hiera image encoder for per-frame feature extraction; a memory attention module that conditions current-frame predictions on past frames and prompted frames stored in a memory bank; and a prompt encoder and lightweight mask decoder adapted from SAM. A memory encoder writes each frame's prediction back to the bank for future reference. This streaming design processes video one frame at a time, so each prediction depends only on the current frame and the stored memories rather than on the whole clip.
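The streaming loop above can be illustrated with a toy memory bank. This is a sketch under stated assumptions, not the official implementation: memories are opaque objects, the bank keeps every prompted frame plus a FIFO window of recent unprompted frames (the window size of 6 is an arbitrary choice here), and the hypothetical `step` callable stands in for the encoder, memory attention, and decoder.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Iterable, List, Optional, Tuple


class MemoryBank:
    """Keeps every prompted frame plus a FIFO window of recent unprompted frames."""

    def __init__(self, num_recent: int = 6) -> None:
        self.prompted: Dict[int, Any] = {}  # frame_idx -> memory, never evicted
        # deque(maxlen=...) evicts the oldest recent memory automatically.
        self.recent: Deque[Tuple[int, Any]] = deque(maxlen=num_recent)

    def write(self, frame_idx: int, memory: Any, was_prompted: bool) -> None:
        if was_prompted:
            self.prompted[frame_idx] = memory
        else:
            self.recent.append((frame_idx, memory))

    def context(self) -> List[Any]:
        """Memories the current frame attends to: prompted frames, then recent ones."""
        return list(self.prompted.values()) + [mem for _, mem in self.recent]


# step(frame, memories, prompt) -> (mask, memory); a stand-in for the real network.
StepFn = Callable[[Any, List[Any], Optional[Any]], Tuple[Any, Any]]


def track(frames: Iterable[Any], prompts: Dict[int, Any], step: StepFn,
          num_recent: int = 6) -> List[Any]:
    """Process frames strictly in order, never looking ahead."""
    bank = MemoryBank(num_recent)
    masks = []
    for idx, frame in enumerate(frames):
        prompt = prompts.get(idx)
        mask, memory = step(frame, bank.context(), prompt)
        bank.write(idx, memory, was_prompted=prompt is not None)
        masks.append(mask)
    return masks
```

In this sketch the attended context grows only with the number of prompted frames; the recent window stays fixed no matter how long the video runs.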

Self-Hosting & Configuration

  • Requires Python 3.10+ and PyTorch 2.3.1+
  • Multiple checkpoint sizes (parameter counts): Hiera-T (39M), Hiera-S (46M), Hiera-B+ (81M), Hiera-L (224M)
  • GPU with 8 GB VRAM sufficient for the base model
  • Jupyter notebook demos included for both image and video workflows
  • Supports ONNX export for edge deployment
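A pre-install check for the stated requirements might look like this. It is a minimal sketch: the helper names are hypothetical, and the minimum versions are taken from the list above and may change in newer SAM 2 releases. PyTorch is imported lazily so the check still runs when it is missing.

```python
import re
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.3.1+cu121' into (2, 3, 1) and '2.4' into (2, 4, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g or 0) for g in match.groups())


def check_requirements(python: Tuple[int, ...] = tuple(sys.version_info[:3]),
                       torch_version: Optional[str] = None) -> List[str]:
    """Return a list of unmet requirements (empty means the environment looks fine)."""
    problems = []
    if python[:2] < MIN_PYTHON:
        problems.append(f"Python {'.'.join(map(str, python))} < 3.10")
    if torch_version is None:
        try:
            import torch  # lazy import: only needed when no version is supplied
            torch_version = torch.__version__
        except ImportError:
            problems.append("PyTorch is not installed")
            return problems
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} < 2.3.1")
    return problems
```

Calling `check_requirements()` with no arguments inspects the running interpreter and the installed PyTorch.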

Key Features

  • Unified architecture handles both image and video segmentation
  • 6x faster than SAM on images due to the more efficient Hiera backbone
  • Memory mechanism enables real-time video object tracking
  • SA-V contains 53x more masks than any prior video segmentation dataset
  • Interactive prompting allows corrections at any video frame
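Corrections at arbitrary frames can be sketched against the video predictor. This assumes `predictor` exposes `init_state(video_path=...)`, `add_new_points_or_box(inference_state=..., frame_idx=..., obj_id=..., points=..., labels=...)`, and `propagate_in_video(inference_state)` yielding `(frame_idx, obj_ids, masks)`, as in the official video notebook; method names differ in some older releases. The `refine_and_track` helper and its `corrections` format are hypothetical, and the notebooks pass NumPy float32 arrays where this sketch uses lists.

```python
from typing import Any, Dict, Sequence, Tuple

Click = Tuple[float, float, int]  # (x, y, label): 1 = foreground, 0 = background


def refine_and_track(predictor: Any, video_dir: str,
                     corrections: Dict[int, Sequence[Click]],
                     obj_id: int = 1) -> Dict[int, Any]:
    """Add clicks on any frames for one object, then propagate; returns {frame_idx: masks}."""
    state = predictor.init_state(video_path=video_dir)
    # Prompts may target any frame, not only the first one.
    for frame_idx in sorted(corrections):
        clicks = corrections[frame_idx]
        predictor.add_new_points_or_box(
            inference_state=state,
            frame_idx=frame_idx,
            obj_id=obj_id,
            points=[[x, y] for x, y, _ in clicks],
            labels=[label for _, _, label in clicks],
        )
    results = {}
    for frame_idx, _obj_ids, masks in predictor.propagate_in_video(state):
        results[frame_idx] = masks
    return results
```

A typical loop is to propagate once, inspect the masks, add a background click on a frame where the mask leaks, and propagate again.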

Comparison with Similar Tools

  • SAM (v1): image-only segmentation; SAM 2 adds video tracking and a faster backbone
  • XMem: strong video object segmentation baseline; SAM 2 adds promptable interaction and better generalization
  • Cutie: semi-supervised video object segmentation that propagates a given first-frame mask; SAM 2 also accepts zero-shot point or box prompts on any frame
  • Track Anything Model (TAM): pairs SAM with the XMem tracker; SAM 2 integrates tracking natively

FAQ

Q: Can SAM 2 run on live camera feeds? A: The streaming architecture processes frames sequentially and can work with live feeds given sufficient GPU throughput.
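"Sufficient GPU throughput" can be made concrete with a frame budget, and a live loop can drop stale frames instead of queuing them. This is a hypothetical sketch: the buffer and the timing helpers are illustrations, not part of SAM 2, and a consumer thread would call `take()` and run the model on whatever it returns.

```python
import threading
from typing import Any, Optional, Tuple


def frame_budget_ms(camera_fps: float) -> float:
    """Time available per frame if every frame is to be processed."""
    return 1000.0 / camera_fps


def dropped_per_second(camera_fps: float, model_fps: float) -> float:
    """Frames skipped each second when the model is slower than the camera."""
    return max(0.0, camera_fps - model_fps)


class LatestFrame:
    """Thread-safe single-slot buffer: a new frame overwrites one not yet consumed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Tuple[int, Any]] = None
        self.dropped = 0

    def put(self, index: int, frame: Any) -> None:
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # the older, unprocessed frame is discarded
            self._frame = (index, frame)

    def take(self) -> Optional[Tuple[int, Any]]:
        with self._lock:
            item, self._frame = self._frame, None
            return item
```

At 30 fps the budget is about 33 ms per frame. Dropping frames means the tracker sees a lower effective frame rate; whether tracking quality holds up under that is not something this sketch establishes.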

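Q: Is SAM 2 backward compatible with SAM? A: SAM 2 is a separate model and package rather than a drop-in update, but it covers SAM's image use case: it treats an image as a single-frame video and outperforms SAM v1 on image segmentation benchmarks.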

Q: What video formats are supported? A: The model processes extracted frames (JPEG/PNG). Video decoding is handled separately before inference.
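Frame extraction before inference can be sketched as below. The ffmpeg flags mirror the extraction command suggested in the SAM 2 video notebook (`-q:v 2` for high JPEG quality, `-start_number 0` for zero-based names); the helper names are hypothetical, and `extract_frames` assumes `ffmpeg` is on `PATH`. This sketch orders frames by integer file name, so `10.jpg` correctly follows `9.jpg`.

```python
import os
import subprocess
from typing import List


def ffmpeg_extract_cmd(video: str, out_dir: str) -> List[str]:
    """argv for dumping every frame as zero-padded JPEGs (00000.jpg, 00001.jpg, ...)."""
    return ["ffmpeg", "-i", video, "-q:v", "2", "-start_number", "0",
            os.path.join(out_dir, "%05d.jpg")]


def extract_frames(video: str, out_dir: str) -> None:
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(ffmpeg_extract_cmd(video, out_dir), check=True)  # raises if ffmpeg fails


def sorted_frame_names(names: List[str]) -> List[str]:
    """Keep image frames and sort them numerically by file stem."""
    frames = [n for n in names
              if os.path.splitext(n)[1].lower() in (".jpg", ".jpeg", ".png")]
    return sorted(frames, key=lambda n: int(os.path.splitext(n)[0]))
```

Numeric sorting avoids the lexicographic trap where `10.jpg` would sort before `2.jpg`.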

Q: How long can processed videos be? A: There is no hard limit. The memory bank uses a fixed window, so arbitrarily long videos can be processed in streaming fashion.
