Skills · May 2, 2026 · 3 min read

Segment Anything (SAM) — Zero-Shot Image Segmentation by Meta

A foundation model for promptable image segmentation that can segment any object in any image without additional training. SAM powers interactive annotation, downstream vision tasks, and zero-shot transfer.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next steps.

Needs Confirmation · 64/100 · Policy: confirm
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Segment Anything Overview
Universal CLI command:
npx tokrepo install 795a30dc-45df-11f1-9bc6-00163e2b0d79

Introduction

Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, a bounding box, or a coarse mask, SAM produces high-quality object masks without task-specific fine-tuning. Text prompts were explored in the original paper but are not part of the released code.

What SAM Does

  • Segments any object given point, box, or mask prompts
  • Generates multiple valid masks when prompts are ambiguous
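  • Responds to each prompt at interactive speed once the image embedding has been computed (the image encoder is the slow step and benefits from a GPU)
  • Exports its lightweight mask decoder to ONNX for deployment in browsers and on edge devices
  • Ships alongside the SA-1B dataset, with over 1 billion masks on 11 million images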
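
The point-prompt workflow above can be sketched with the `segment_anything` Python package, assuming it is installed and a checkpoint file has been downloaded. `segment_point` and `best_index` are hypothetical helper names introduced for illustration; `sam_model_registry`, `SamPredictor`, `set_image`, and `predict` come from the library. With `multimask_output=True` the predictor returns several candidate masks with quality scores, and this sketch simply keeps the highest-scoring one.

```python
from typing import Sequence


def best_index(scores: Sequence[float]) -> int:
    """Return the index of the highest score (the first one wins on ties)."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_point(image, checkpoint: str, x: float, y: float, model_type: str = "vit_b"):
    """Segment the object under pixel (x, y) of an RGB uint8 HxWx3 image.

    Hypothetical helper. Heavy dependencies are imported lazily so that this
    module can be loaded even where they are not installed.
    """
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # runs the image encoder once

    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]], dtype=np.float32),
        point_labels=np.array([1]),  # 1 = foreground click, 0 = background click
        multimask_output=True,  # request several candidates for an ambiguous click
    )
    i = best_index(scores.tolist())
    return masks[i], float(scores[i])
```

Picking the top score is only one possible policy; an annotation tool might instead show every candidate and let the user choose.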

Architecture Overview

SAM has three components: a ViT-based image encoder that produces an image embedding once per image, a flexible prompt encoder that handles points, boxes, and masks, and a lightweight mask decoder that combines the image embedding with the encoded prompt to predict segmentation masks. Because the embedding is computed only once, the heavy encoding cost is amortized across every prompt issued for that image.
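
The amortization can be illustrated with a small sketch: encode once, then decode per prompt. `segment_many` is a hypothetical helper, and `predictor` is assumed to behave like the library's `SamPredictor`, whose `set_image` caches the embedding and whose `predict` reuses it.

```python
from typing import Any, Dict, Iterable, List


def segment_many(predictor: Any, image: Any, prompts: Iterable[Dict[str, Any]]) -> List[Any]:
    """Encode `image` once, then decode one result per prompt.

    Each entry of `prompts` is a dict of keyword arguments forwarded to
    `predictor.predict`, for example point_coords/point_labels or box.
    """
    predictor.set_image(image)  # expensive step: image encoder
    results = []
    for kwargs in prompts:
        # cheap step: prompt encoder + mask decoder on the cached embedding
        results.append(predictor.predict(**kwargs))
    return results
```

With a real `SamPredictor`, the prompt values would be NumPy arrays, such as an `(N, 2)` array of click coordinates with matching labels, or a length-4 box in XYXY order.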

Self-Hosting & Configuration

  • Requires Python 3.8+, PyTorch 1.7+, and torchvision 0.8+
  • Three checkpoint sizes available: ViT-B (about 375 MB), ViT-L (about 1.25 GB), ViT-H (about 2.56 GB)
  • ONNX export of the mask decoder enables CPU-only or browser-based prompting on precomputed image embeddings
  • A GPU with 8 GB of VRAM is sufficient for interactive single-image inference
  • Can be used as a library or through the included demo notebooks
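
As a sketch of library use, the helper below maps a model type to its published checkpoint file name and loads it. It assumes the package was installed from the official repository (for example with `pip install git+https://github.com/facebookresearch/segment-anything.git`) and that the checkpoint was downloaded into a local directory. `checkpoint_path` and `load_sam` are hypothetical names; `sam_model_registry` is the library's registry.

```python
from pathlib import Path

# Checkpoint file names as published in the segment-anything repository.
CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_path(model_type: str, directory: str = ".") -> Path:
    """Return the expected location of the checkpoint for `model_type`."""
    if model_type not in CHECKPOINTS:
        raise ValueError(f"unknown model type {model_type!r}; expected one of {sorted(CHECKPOINTS)}")
    return Path(directory) / CHECKPOINTS[model_type]


def load_sam(model_type: str = "vit_b", directory: str = ".", device: str = "cpu"):
    """Load a SAM model from a local checkpoint and move it to `device`."""
    path = checkpoint_path(model_type, directory)
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")

    # Imported lazily so the path helpers work without the package installed.
    from segment_anything import sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=str(path))
    sam.to(device)
    return sam
```

Starting with `vit_b` keeps downloads and memory use small; switching to `vit_l` or `vit_h` only requires a different `model_type` and checkpoint file.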

Key Features

  • Zero-shot generalization to unseen object categories and domains
  • Trained on SA-1B, one of the largest segmentation datasets ever created
  • Interactive point-and-click interface for fast manual annotation
  • Multiple mask output with confidence scores for ambiguous prompts
  • ONNX Runtime support lets the exported mask decoder run without PyTorch
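
The confidence scores are also useful when no prompt is supplied at all. The `segment_anything` package includes `SamAutomaticMaskGenerator`, which prompts the model with a grid of points and returns one record per mask, with keys such as `segmentation`, `area`, `bbox`, `predicted_iou`, and `stability_score`. The sketch below filters those records; `keep_confident`, `segment_everything`, and the threshold values are hypothetical choices for illustration, not library defaults.

```python
from typing import Any, Dict, Iterable, List


def keep_confident(
    records: Iterable[Dict[str, Any]],
    min_iou: float = 0.9,
    min_area: int = 100,
) -> List[Dict[str, Any]]:
    """Keep masks whose predicted IoU and pixel area pass the thresholds.

    The result is sorted from largest to smallest area.
    """
    kept = [
        r for r in records
        if r["predicted_iou"] >= min_iou and r["area"] >= min_area
    ]
    return sorted(kept, key=lambda r: r["area"], reverse=True)


def segment_everything(sam, image) -> List[Dict[str, Any]]:
    """Run prompt-free mask generation on an RGB uint8 HxWx3 image."""
    # Imported lazily so keep_confident works without the package installed.
    from segment_anything import SamAutomaticMaskGenerator

    generator = SamAutomaticMaskGenerator(sam)
    return keep_confident(generator.generate(image))
```

Appropriate thresholds depend on the images and the downstream task, so they are worth tuning on a small sample before use.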

Comparison with Similar Tools

  • SAM 2: Meta's successor, which adds video segmentation; the original SAM handles single images only
  • Detectron2: Meta's detection and segmentation framework; its models need task-specific training, unlike SAM
  • YOLO: excels at real-time detection and segmentation over a fixed set of trained categories; SAM is class-agnostic and segments whatever a prompt indicates, without predicting category labels
  • U-Net: classic encoder-decoder architecture for segmentation; needs domain-specific labels and training

FAQ

Q: Can SAM segment video? A: SAM operates on single images. For video segmentation, use SAM 2.

Q: Does SAM work without a GPU? A: Yes. The ONNX-exported mask decoder runs on CPU, and the PyTorch model can also run on CPU, though computing the image embedding is much slower than on a GPU.

Q: How accurate is SAM on domain-specific data like medical imaging? A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.

Q: Is the SA-1B dataset available? A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.
