Skills · Apr 28, 2026 · 3 min read

Segment Anything (SAM) — Foundation Model for Image Segmentation

The Segment Anything Model (SAM) by Meta AI is a promptable segmentation system that isolates objects in an image from point, box, or mask prompts, enabling zero-shot transfer to new visual domains.

Agent ready

Agent-ready installation

This asset can be installed once you have chosen a runtime, reviewed the plan, and run the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Segment Anything Overview
Direct install command:
npx -y tokrepo@latest install 801ea8f1-42b9-11f1-9bc6-00163e2b0d79 --target codex

Run this after confirming the plan with a dry run.

Introduction

Segment Anything (SAM) is a foundation model for image segmentation released by Meta AI Research. It was trained on the SA-1B dataset, which contains over one billion masks across 11 million images, and can segment objects in an image without task-specific fine-tuning, making it a general-purpose building block for computer vision pipelines.

What Segment Anything Does

  • Segments any object in an image given point, box, or mask prompts (text prompting was explored in the paper but is not part of the released code)
  • Generates multiple valid masks with confidence scores for ambiguous prompts
  • Runs zero-shot on new image domains without retraining
  • Produces high-quality masks at interactive speeds on GPU
  • Serves as a backbone for downstream tasks like video segmentation, medical imaging, and 3D reconstruction
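
The multi-mask behaviour in the list above can be sketched as follows. This is a minimal illustration, assuming the `predict` call of the official `segment_anything` package returns several candidate masks together with one predicted-IoU score per mask. `best_mask_index` is a hypothetical helper written for this example, not part of the library, and the click coordinates in the usage comment are placeholders.

```python
from typing import Sequence


def best_mask_index(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    # Works for plain lists as well as 1-D NumPy arrays of scores.
    return max(range(len(scores)), key=lambda i: float(scores[i]))


# Hypothetical usage with a SamPredictor whose image is already set:
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array([[500, 375]]),
#       point_labels=np.array([1]),   # 1 marks a foreground click
#       multimask_output=True,        # ask for several candidate masks
#   )
#   mask = masks[best_mask_index(scores)]
```

Picking the top score is only one policy. For an ambiguous click (a shirt versus the whole person, say), an interactive tool may prefer to show all candidates and let the user choose.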

Architecture Overview

SAM consists of three components: an image encoder (ViT-based), a flexible prompt encoder that handles points, boxes, and masks, and a lightweight mask decoder. The image encoder runs once per image, while the prompt encoder and mask decoder run once per query, which is what makes real-time interactive segmentation possible. The model also outputs a predicted IoU score for each mask, so candidates can be ranked by quality automatically.
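
The encode-once, prompt-many split described above might be used like this. It is a sketch, assuming a predictor object that exposes `set_image` and `predict` the way `SamPredictor` does. `segment_many` is a hypothetical wrapper, and the checkpoint path and coordinates in the usage comment are placeholders.

```python
from typing import Any, Dict, Iterable, List, Tuple


def segment_many(predictor: Any, image: Any,
                 prompts: Iterable[Dict[str, Any]]) -> List[Tuple[Any, Any]]:
    """Embed `image` once, then decode one result per prompt."""
    predictor.set_image(image)  # expensive step: runs the ViT image encoder
    results = []
    for kwargs in prompts:
        # cheap step: only the prompt encoder and mask decoder run here
        masks, scores, _low_res_logits = predictor.predict(**kwargs)
        results.append((masks, scores))
    return results


# Hypothetical usage (rgb_image is an HxWx3 uint8 RGB array):
#   from segment_anything import SamPredictor, sam_model_registry
#   sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
#   results = segment_many(SamPredictor(sam), rgb_image, [
#       {"point_coords": np.array([[120, 80]]), "point_labels": np.array([1])},
#       {"box": np.array([40, 40, 300, 260])},  # XYXY box in pixels
#   ])
```

Keeping the prompts as keyword dictionaries lets point, box, and mask prompts share one loop without the wrapper needing to know which kind it was given.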

Self-Hosting & Configuration

  • Install via pip and download model checkpoints (ViT-H, ViT-L, or ViT-B) from the repository
  • Requires Python 3.8+, PyTorch 1.7+, and torchvision 0.8+; a CUDA GPU is strongly recommended for efficient inference
  • The SamPredictor class provides a simple API for single-image segmentation
  • SamAutomaticMaskGenerator generates masks for all objects in an image without prompts
  • Accepts images as NumPy arrays, so OpenCV or PIL can handle image I/O
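
The setup steps above could be wired together roughly as follows. This is a sketch, assuming the package was installed with `pip install git+https://github.com/facebookresearch/segment-anything.git`, that `opencv-python` is available, and that a checkpoint such as `sam_vit_b_01ec64.pth` has been downloaded from the repository. `infer_model_type` and `generate_all_masks` are hypothetical helpers for this example, not library functions.

```python
from pathlib import Path

# Registry keys used by the official package for the three model sizes.
_MODEL_TYPES = ("vit_h", "vit_l", "vit_b")


def infer_model_type(checkpoint: str) -> str:
    """Guess the registry key from a filename such as sam_vit_b_01ec64.pth."""
    name = Path(checkpoint).name.lower()
    for model_type in _MODEL_TYPES:
        if model_type in name:
            return model_type
    raise ValueError(f"cannot infer model type from {checkpoint!r}")


def generate_all_masks(image_path: str, checkpoint: str) -> list:
    """Segment everything in one image and return SAM's list of mask records."""
    # Heavy dependencies are imported lazily so the helper above stays
    # importable on machines without them.
    import cv2
    import torch
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    device = "cuda" if torch.cuda.is_available() else "cpu"  # CPU works, slowly
    sam = sam_model_registry[infer_model_type(checkpoint)](checkpoint=checkpoint)
    sam.to(device)

    bgr = cv2.imread(image_path)
    if bgr is None:
        raise FileNotFoundError(image_path)
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR, SAM expects RGB

    # Each record is a dict with keys such as "segmentation", "bbox"
    # and "predicted_iou".
    return SamAutomaticMaskGenerator(sam).generate(rgb)
```

Inferring the model type from the filename is a convenience that only works while the checkpoint keeps its original name; passing the registry key explicitly is the safer choice for renamed files.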

Key Features

  • Zero-shot generalization to unseen object types and visual domains
  • Interactive prompting with points, bounding boxes, or masks
  • Automatic mask generation mode for full-scene segmentation
  • Three model sizes (ViT-B, ViT-L, ViT-H) trading speed for accuracy
  • Permissive Apache 2.0 license for commercial use

Comparison with Similar Tools

  • SAM 2 — extends SAM to video with streaming memory; higher temporal consistency
  • Grounding DINO — open-set object detection; often paired with SAM for text-prompted segmentation
  • Detectron2 — full detection/segmentation framework; requires task-specific training
  • YOLO — optimized for real-time detection; less precise per-pixel masks
  • U-Net — classic medical segmentation; needs domain-specific labeled data

FAQ

Q: Can SAM run on CPU? A: Yes, but inference is significantly slower. GPU is recommended for interactive use.

Q: Does SAM understand semantic categories? A: SAM segments objects by spatial prompts, not semantic labels. Pair it with a classifier or Grounding DINO for labeled segmentation.
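
Pairing SAM with a detector mostly comes down to handing it boxes in the format it expects. The sketch below assumes the detector returns boxes as normalized (center x, center y, width, height), which is how Grounding DINO's reference code reports them to the best of my knowledge; check your detector's output format before relying on it. `cxcywh_norm_to_xyxy` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def cxcywh_norm_to_xyxy(box: Sequence[float], width: int,
                        height: int) -> Tuple[float, float, float, float]:
    """Convert a normalized center-format box to a pixel XYXY box."""
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    # Clamp to the image so SAM never receives out-of-bounds corners.
    return (max(0.0, x0), max(0.0, y0),
            min(float(width), x1), min(float(height), y1))


# Hypothetical usage with a SamPredictor whose image is already set:
#   xyxy = cxcywh_norm_to_xyxy(detected_box, image_width, image_height)
#   masks, scores, _ = predictor.predict(box=np.array(xyxy),
#                                        multimask_output=False)
#   # The detector's text label for that box then names the resulting mask.
```

A box prompt is usually unambiguous, which is why the usage comment asks for a single mask rather than several candidates.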

Q: What image formats are supported? A: Any format readable by PIL or OpenCV, including JPEG, PNG, TIFF, and BMP.

Q: Can I fine-tune SAM on custom data? A: Yes. The model weights can be fine-tuned with a standard PyTorch training loop, though zero-shot performance is often strong enough that fine-tuning is unnecessary.
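
One lightweight way to set up such a training loop is to freeze the image and prompt encoders and train only the mask decoder. This is an assumption about a reasonable starting point, not a recipe given by the source. The sketch relies on the `Sam` module exposing its decoder parameters under the `mask_decoder.` prefix, and `freeze_all_but_mask_decoder` is a hypothetical helper.

```python
from typing import Any, List


def freeze_all_but_mask_decoder(sam: Any) -> List[str]:
    """Leave only mask-decoder weights trainable and return their names."""
    trainable = []
    for name, param in sam.named_parameters():
        # Encoder weights are frozen; decoder weights keep receiving gradients.
        param.requires_grad = name.startswith("mask_decoder.")
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Hypothetical usage with a loaded SAM model (learning rate is a placeholder):
#   freeze_all_but_mask_decoder(sam)
#   optimizer = torch.optim.AdamW(
#       (p for p in sam.parameters() if p.requires_grad), lr=1e-5)
```

Freezing the image encoder keeps memory use low, since the ViT backbone holds most of the parameters, at the cost of not adapting the image features to the new domain.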
