Introduction
Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, bounding box, or coarse mask, SAM produces high-quality object masks without task-specific fine-tuning. Text prompts were explored in the original paper but are not part of the released code.
What SAM Does
- Segments any object given point, box, or mask prompts
- Generates multiple valid masks when prompts are ambiguous
- Decodes masks at interactive speed once the image embedding has been computed, which suits GPU-backed annotation workflows
- Exports the prompt encoder and mask decoder to ONNX for deployment in browsers and edge devices
- Provides the SA-1B dataset with over 1 billion masks on 11 million images
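The ambiguity handling listed above can be sketched with the `SamPredictor` class from the `segment_anything` package. This is a minimal sketch, not a tuned pipeline: `segment_with_point` and `pick_best_mask` are hypothetical helper names, the checkpoint path and click coordinates are placeholders supplied by the caller, and the image is assumed to be an RGB `uint8` array in height, width, channel order.

```python
from typing import Sequence


def pick_best_mask(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    if len(scores) == 0:
        raise ValueError("no candidate masks were returned")
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_with_point(image, checkpoint_path: str, x: int, y: int,
                       model_type: str = "vit_b"):
    """Segment the object under one foreground click (illustrative sketch)."""
    # Imported lazily so pick_best_mask works without these packages installed.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image)  # runs the image encoder once

    masks, scores, _low_res_logits = predictor.predict(
        point_coords=np.array([[x, y]]),  # shape (N, 2), pixel coords as (x, y)
        point_labels=np.array([1]),       # 1 = foreground, 0 = background
        multimask_output=True,            # request several candidate masks
    )
    best = pick_best_mask(scores.tolist())
    return masks[best], float(scores[best])
```

Taking the top-scoring candidate is only one possible policy; an annotation tool might instead show all candidates and let the user choose.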
Architecture Overview
SAM has three components: a ViT-based image encoder that produces an image embedding once per image, a flexible prompt encoder that handles points, boxes, and masks (the paper also describes text prompts, which the released model does not include), and a lightweight mask decoder that combines both to predict segmentation masks. This design allows the heavy image encoding to be amortized across multiple prompts.
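The amortization described above can be illustrated as follows. This is a sketch under assumptions: `masks_for_prompts` is a hypothetical helper, `predictor` is expected to behave like `segment_anything.SamPredictor`, and each prompt is a `(point_coords, point_labels)` pair that the caller has already built (NumPy arrays of shape `(N, 2)` and `(N,)` in real use).

```python
from typing import Any, List, Sequence, Tuple


def masks_for_prompts(predictor: Any, image: Any,
                      prompts: Sequence[Tuple[Any, Any]]) -> List[Any]:
    """Encode the image once, then decode one mask per point prompt."""
    predictor.set_image(image)  # expensive step: the image encoder runs here
    results = []
    for point_coords, point_labels in prompts:
        # Cheap step: only the prompt encoder and mask decoder run per call.
        masks, _scores, _logits = predictor.predict(
            point_coords=point_coords,
            point_labels=point_labels,
            multimask_output=False,  # one mask per prompt
        )
        results.append(masks[0])
    return results
```

Calling `set_image` inside the loop would give the same masks but repeat the encoder pass for every prompt, which is the cost this design avoids.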
Self-Hosting & Configuration
- Requires Python 3.8+, PyTorch 1.7+, and torchvision 0.8+
- Three checkpoint sizes available: ViT-B (about 375 MB), ViT-L (about 1.25 GB), ViT-H (about 2.56 GB)
- ONNX export covers the prompt encoder and mask decoder, enabling CPU-only or browser-based mask decoding from precomputed image embeddings
- GPU with 8 GB VRAM is sufficient for real-time single-image inference
- Can be used as a library or through the included demo notebooks
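For library use, loading a checkpoint might look like the sketch below. The file names are those of the checkpoints published in the official repository; the `checkpoints` directory, `checkpoint_path`, and `load_sam` are hypothetical names chosen for illustration, and the checkpoint files are assumed to have been downloaded already.

```python
import os

# File names of the published checkpoints, keyed by model type.
CHECKPOINT_FILES = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_path(model_type: str, checkpoint_dir: str) -> str:
    """Build the local path for a given model type."""
    if model_type not in CHECKPOINT_FILES:
        raise KeyError(f"unknown model type: {model_type!r}")
    return os.path.join(checkpoint_dir, CHECKPOINT_FILES[model_type])


def load_sam(model_type: str = "vit_b", checkpoint_dir: str = "checkpoints"):
    """Load SAM onto the GPU if one is available, otherwise onto the CPU."""
    # Imported lazily so checkpoint_path works without these packages installed.
    import torch
    from segment_anything import sam_model_registry

    device = "cuda" if torch.cuda.is_available() else "cpu"
    sam = sam_model_registry[model_type](
        checkpoint=checkpoint_path(model_type, checkpoint_dir)
    )
    sam.to(device=device)
    return sam, device
```

Starting with `vit_b` keeps memory use and download size low; switching to `vit_l` or `vit_h` only changes the `model_type` argument.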
Key Features
- Zero-shot generalization to unseen object categories and domains
- Trained on SA-1B, one of the largest segmentation datasets ever created
- Interactive point-and-click interface for fast manual annotation
- Multiple mask outputs for ambiguous prompts, each with a predicted quality score
- ONNX Runtime support for running the mask decoder without PyTorch once image embeddings are available
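The point-and-click workflow above amounts to accumulating foreground and background clicks and re-running the decoder with the full set. A minimal sketch of that bookkeeping follows; `ClickSession` is a hypothetical helper, not part of the `segment_anything` package. The label convention (1 for foreground, 0 for background) matches what `SamPredictor.predict` expects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClickSession:
    """Accumulates annotator clicks for one object (hypothetical helper)."""

    coords: List[Tuple[int, int]] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)

    def add_click(self, x: int, y: int, foreground: bool = True) -> None:
        """Record a click; background clicks carve regions out of the mask."""
        self.coords.append((x, y))
        self.labels.append(1 if foreground else 0)

    def as_prompt(self) -> Tuple[List[List[int]], List[int]]:
        """Return (point_coords, point_labels) as plain nested lists."""
        if not self.coords:
            raise ValueError("add at least one click first")
        return [list(c) for c in self.coords], list(self.labels)
```

In real use, the two lists would be wrapped with `numpy.array` and passed as `point_coords` and `point_labels` to `SamPredictor.predict` after each new click.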
Comparison with Similar Tools
- SAM 2: Meta's successor with video segmentation support; SAM focuses on single images
- Detectron2: Meta's detection framework; requires task-specific training unlike SAM
- YOLO: excels at real-time detection and segmentation with fixed categories; SAM is class-agnostic, segmenting whatever the prompt indicates without assigning a category label
- U-Net: classical encoder-decoder for segmentation; needs domain-specific labels and training
FAQ
Q: Can SAM segment video? A: SAM operates on single images. For video segmentation, use SAM 2.
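Q: Does SAM work without a GPU? A: Yes. The PyTorch model can run on CPU, and the ONNX-exported mask decoder is intended for CPU and browser use, though computing the image embedding is much slower without a GPU.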
Q: How accurate is SAM on domain-specific data like medical imaging? A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.
Q: Is the SA-1B dataset available? A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.