Configs · Apr 28, 2026 · 3 min read

Segment Anything (SAM) — Foundation Model for Image Segmentation

The Segment Anything Model by Meta AI provides a promptable segmentation system that can isolate any object in an image given point or box prompts (text prompting was explored in the paper but is not part of the public release), enabling zero-shot transfer to new visual domains.

Introduction

Segment Anything (SAM) is a foundation model for image segmentation released by Meta AI Research. It was trained on the SA-1B dataset, which contains 1.1 billion masks across 11 million images, and can segment any object in an image without task-specific fine-tuning, making it a general-purpose building block for computer vision pipelines.

What Segment Anything Does

  • Segments any object in an image given point or box prompts (text prompting was explored in the paper but not released)
  • Generates multiple valid masks with confidence scores for ambiguous prompts
  • Runs zero-shot on new image domains without retraining
  • Produces high-quality masks at interactive speeds on GPU
  • Serves as a backbone for downstream tasks like video segmentation, medical imaging, and 3D reconstruction

Architecture Overview

SAM consists of three components: an image encoder (ViT-based), a flexible prompt encoder that handles points, boxes, and coarse input masks, and a lightweight mask decoder. The image encoder runs once per image, while the prompt encoder and mask decoder run per query, enabling real-time interactive segmentation. The model also outputs a predicted IoU score per mask for automatic quality ranking.
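The two-stage flow above can be sketched with the official `SamPredictor` API: the expensive image embedding happens once in `set_image`, and each prompt only runs the light decoder. The checkpoint filename and click coordinates here are placeholders.

```python
# Sketch of SAM's encode-once, prompt-many inference flow.
# Assumes the `segment_anything` package and a downloaded ViT-H checkpoint.
import numpy as np

def best_mask(masks: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Pick the candidate mask with the highest predicted IoU score."""
    return masks[int(np.argmax(scores))]

def segment_with_point(image_rgb: np.ndarray, point_xy,
                       checkpoint="sam_vit_h_4b8939.pth"):
    # Heavy imports kept local so best_mask() stays dependency-free.
    from segment_anything import sam_model_registry, SamPredictor
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)            # image encoder runs once here
    masks, scores, _ = predictor.predict(     # decoder runs per prompt
        point_coords=np.array([point_xy]),
        point_labels=np.array([1]),           # 1 = foreground click
        multimask_output=True,                # returns multiple candidates
    )
    return best_mask(masks, scores)
```

Because `set_image` caches the embedding, follow-up clicks on the same image cost only a decoder pass, which is what makes interactive use feel instant.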

Self-Hosting & Configuration

  • Install via pip and download model checkpoints (ViT-H, ViT-L, or ViT-B) from the repository
  • Requires PyTorch 1.7+ and a CUDA GPU for efficient inference
  • The SamPredictor class provides a simple API for single-image segmentation
  • SamAutomaticMaskGenerator generates masks for all objects in an image without prompts
  • Integrates with OpenCV and PIL for image I/O
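Putting those pieces together, a minimal prompt-free pipeline with `SamAutomaticMaskGenerator` might look like the sketch below. The checkpoint filename is a placeholder, and the area-sorting helper is an illustrative convention, not part of the library.

```python
# Prompt-free full-scene segmentation sketch (assumes `segment_anything`,
# OpenCV, and a downloaded ViT-B checkpoint).
def generate_all_masks(image_bgr, checkpoint="sam_vit_b_01ec64.pth"):
    import cv2
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
    image = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # SAM expects RGB
    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    generator = SamAutomaticMaskGenerator(sam)
    # Each record holds keys like 'segmentation', 'area', 'bbox',
    # and 'predicted_iou'.
    return generator.generate(image)

def sort_by_area(mask_records):
    """Order mask records largest-first, handy for overlay rendering."""
    return sorted(mask_records, key=lambda r: r["area"], reverse=True)
```

Rendering largest masks first keeps small objects visible when overlaying all masks on the source image.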

Key Features

  • Zero-shot generalization to unseen object types and visual domains
  • Interactive prompting with points, bounding boxes, or masks
  • Automatic mask generation mode for full-scene segmentation
  • Three model sizes (ViT-B, ViT-L, ViT-H) trading speed for accuracy
  • Permissive Apache 2.0 license for commercial use
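The three model sizes map directly onto registry keys in the library; a small loader sketch makes the speed/accuracy trade-off explicit. Checkpoint filenames match the official release; the speed annotations are rough characterizations.

```python
# Loading any of the three released SAM variants via the model registry.
# Assumes the `segment_anything` package and locally downloaded checkpoints.
CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",  # smallest, fastest
    "vit_l": "sam_vit_l_0b3195.pth",  # middle ground
    "vit_h": "sam_vit_h_4b8939.pth",  # largest, most accurate
}

def load_sam(size: str, device: str = "cuda"):
    from segment_anything import sam_model_registry
    sam = sam_model_registry[size](checkpoint=CHECKPOINTS[size])
    return sam.to(device=device)  # pass device="cpu" for slower CPU inference
```

Starting with `vit_b` is a reasonable default for interactive prototyping; switch to `vit_h` when mask quality matters more than latency.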

Comparison with Similar Tools

  • SAM 2 — extends SAM to video with streaming memory; higher temporal consistency
  • Grounding DINO — open-set object detection; often paired with SAM for text-prompted segmentation
  • Detectron2 — full detection/segmentation framework; requires task-specific training
  • YOLO — optimized for real-time detection; less precise per-pixel masks
  • U-Net — classic medical segmentation; needs domain-specific labeled data

FAQ

Q: Can SAM run on CPU? A: Yes, but inference is significantly slower. GPU is recommended for interactive use.

Q: Does SAM understand semantic categories? A: SAM segments objects by spatial prompts, not semantic labels. Pair it with a classifier or Grounding DINO for labeled segmentation.

Q: What image formats are supported? A: Any format readable by PIL or OpenCV, including JPEG, PNG, TIFF, and BMP.

Q: Can I fine-tune SAM on custom data? A: The model weights can be fine-tuned using standard PyTorch training loops, though zero-shot performance is strong for most use cases.
