
Segment Anything (SAM) — Zero-Shot Image Segmentation by Meta

A foundation model for promptable image segmentation that can segment any object in any image without additional training. SAM powers interactive annotation, downstream vision tasks, and zero-shot transfer.

Introduction

Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, a bounding box, or a rough mask, SAM produces high-quality object masks without task-specific fine-tuning. (The paper also explores free-form text prompts, though the released code supports geometric prompts only.)
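As a concrete example, the released Python package exposes a SamPredictor class. The sketch below segments an object from a single foreground click; the checkpoint path, image path, and click coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a released checkpoint (placeholder path; download from the official repo).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# SAM expects an HxWxC uint8 RGB image.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy ViT encoder once

# One positive click (label 1 = foreground, 0 = background).
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidate masks for ambiguous clicks
)
best_mask = masks[np.argmax(scores)]  # boolean HxW mask with the highest predicted IoU
```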

What SAM Does

  • Segments any object given point, box, or mask prompts; a prompt-free whole-image mode is sketched after this list
  • Generates multiple valid masks when prompts are ambiguous
  • Runs in real-time on GPU for interactive annotation workflows
  • Exports to ONNX for deployment in browsers and edge devices
  • Provides the SA-1B dataset with over 1 billion masks on 11 million images
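For the prompt-free, whole-image mode referenced above, the library ships SamAutomaticMaskGenerator, which prompts the model with a grid of points and filters the results. A minimal sketch (paths are placeholders):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)

# Each entry is a dict with 'segmentation' (boolean HxW array), 'area',
# 'bbox', 'predicted_iou', and 'stability_score', among other fields.
largest = max(masks, key=lambda m: m["area"])
```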

Architecture Overview

SAM has three components: a ViT-based image encoder that computes an image embedding once per image; a lightweight prompt encoder that embeds points, boxes, and masks (the paper also prototypes text prompts); and a fast mask decoder that combines the two to predict segmentation masks. Because the heavy image encoding runs only once, its cost is amortized across however many prompts follow.
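The amortization shows up directly in the API: the expensive encoder runs inside set_image, while each subsequent predict call only runs the prompt encoder and mask decoder. A rough, self-contained sketch with placeholder paths and clicks:

```python
import time

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

predictor = SamPredictor(sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth"))
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)

t0 = time.perf_counter()
predictor.set_image(image)  # ViT encoder: the expensive step, run once per image
print(f"encode: {time.perf_counter() - t0:.2f}s")

for x, y in [(100, 200), (450, 300), (700, 150)]:  # three separate clicks
    t0 = time.perf_counter()
    predictor.predict(  # prompt encoder + mask decoder only
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=False,
    )
    print(f"decode: {time.perf_counter() - t0:.3f}s")
```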

Self-Hosting & Configuration

  • Requires Python 3.8+ and PyTorch 1.7+
  • Three checkpoint sizes available: ViT-B (375 MB), ViT-L (1.2 GB), ViT-H (2.4 GB); see the loading sketch after this list
  • ONNX export enables CPU-only or browser-based deployment
  • GPU with 8 GB VRAM is sufficient for real-time single-image inference
  • Can be used as a library or through the included demo notebooks
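A minimal setup sketch; the install line and checkpoint filenames follow the official repo's releases, while local paths are placeholders:

```python
# One-time install:
#   pip install git+https://github.com/facebookresearch/segment-anything.git
import torch
from segment_anything import sam_model_registry

# Released checkpoint filenames; download from the repo's model zoo.
CHECKPOINTS = {
    "vit_b": "sam_vit_b_01ec64.pth",  # ~375 MB, fastest
    "vit_l": "sam_vit_l_0b3195.pth",  # ~1.2 GB
    "vit_h": "sam_vit_h_4b8939.pth",  # ~2.4 GB, most accurate
}

model_type = "vit_b"
device = "cuda" if torch.cuda.is_available() else "cpu"
sam = sam_model_registry[model_type](checkpoint=CHECKPOINTS[model_type]).to(device)
```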

Key Features

  • Zero-shot generalization to unseen object categories and domains
  • Trained on SA-1B, one of the largest segmentation datasets ever created
  • Interactive point-and-click interface for fast manual annotation
  • Multiple mask output with confidence scores for ambiguous prompts
  • ONNX runtime support for lightweight deployment without PyTorch (sketched after this list)
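To illustrate the PyTorch-free path, here is a sketch of running an exported mask decoder with onnxruntime. It assumes a decoder exported by the repo's scripts/export_onnx_model.py and a precomputed image embedding (the encoder still has to run somewhere, e.g. via SamPredictor.get_image_embedding()); input names follow the official export, and all values below are placeholders.

```python
import numpy as np
import onnxruntime

# Decoder exported via scripts/export_onnx_model.py (placeholder path).
session = onnxruntime.InferenceSession("sam_onnx.onnx", providers=["CPUExecutionProvider"])

# Placeholder embedding; in practice compute it once from the image encoder.
embedding = np.zeros((1, 256, 64, 64), dtype=np.float32)

masks, iou_predictions, low_res_masks = session.run(None, {
    "image_embeddings": embedding,
    # One click padded with a dummy point; real coordinates must first be
    # rescaled into the model's 1024-pixel input frame.
    "point_coords": np.array([[[500.0, 375.0], [0.0, 0.0]]], dtype=np.float32),
    "point_labels": np.array([[1.0, -1.0]], dtype=np.float32),  # -1 marks padding
    "mask_input": np.zeros((1, 1, 256, 256), dtype=np.float32),  # no prior mask
    "has_mask_input": np.zeros(1, dtype=np.float32),
    "orig_im_size": np.array([750.0, 1200.0], dtype=np.float32),  # (H, W) of source
})
```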

Comparison with Similar Tools

  • SAM 2 — Meta's successor with video segmentation support; SAM focuses on single images
  • Detectron2 — Meta's detection framework; requires task-specific training unlike SAM
  • YOLO — excels at real-time detection and segmentation over a fixed set of categories; SAM is category-agnostic and segments whatever the prompt indicates
  • U-Net — classical encoder-decoder for segmentation; needs domain-specific labels and training

FAQ

Q: Can SAM segment video? A: SAM operates on single images. For video segmentation, use SAM 2.

Q: Does SAM work without a GPU? A: Yes, the ONNX model runs on CPU, though inference is slower.

Q: How accurate is SAM on domain-specific data like medical imaging? A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.

Q: Is the SA-1B dataset available? A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.
