# Segment Anything (SAM) — Zero-Shot Image Segmentation by Meta

> A foundation model for promptable image segmentation that can segment any object in any image without additional training. SAM powers interactive annotation, downstream vision tasks, and zero-shot transfer.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use
```bash
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install opencv-python pycocotools matplotlib onnxruntime
# Download a model checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
python -c "
from segment_anything import sam_model_registry, SamPredictor
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth')
predictor = SamPredictor(sam)
"
```

## Introduction
The Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, bounding box, or coarse mask, SAM produces high-quality object masks without task-specific fine-tuning. The SAM paper also explores free-form text prompts, but that capability is not part of the released code.

## What SAM Does
- Segments any object given point, box, or mask prompts
- Generates multiple valid masks when prompts are ambiguous
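- Responds to prompts quickly enough for interactive annotation once the image embedding has been computed; the image encoder is the slow step and benefits from a GPU
- Exports the prompt encoder and mask decoder to ONNX so prompting can run in any ONNX Runtime environment, including browsers; the image embedding still comes from the image encoder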
- Provides the SA-1B dataset with over 1 billion masks on 11 million images
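
Box prompts are passed to `SamPredictor.predict` as a length-4 array in XYXY pixel coordinates, while annotation tools and COCO-style files often store boxes as XYWH. The sketch below shows one way to bridge the two; `xywh_to_xyxy` is a hypothetical helper written for illustration, not part of the `segment_anything` package.

```python
from typing import Sequence, Tuple


def xywh_to_xyxy(box: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert (x, y, width, height) to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    if w < 0 or h < 0:
        raise ValueError("width and height must be non-negative")
    return (x, y, x + w, y + h)


# Usage with an existing predictor (assumes predictor.set_image(image) was
# already called and numpy is installed):
#   import numpy as np
#   box = np.array(xywh_to_xyxy((40, 60, 200, 120)))
#   masks, scores, _ = predictor.predict(box=box, multimask_output=False)
```

A box is usually unambiguous, which is why the usage comment requests a single mask with `multimask_output=False`.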

## Architecture Overview
SAM has three components: a ViT-based image encoder that computes an image embedding once per image, a prompt encoder that embeds points, boxes, and masks (text prompts appear in the paper but not in the released code), and a lightweight mask decoder that combines the two embeddings to predict segmentation masks. Because the expensive image encoding happens only once, its cost is amortized across every prompt issued against the same image.
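
The amortization described above can be sketched as follows: `set_image` runs the image encoder once, and each later `predict` call reuses the cached embedding. The function name `segment_with_prompts` and the shape of the `prompts` argument are assumptions made for this example; `predictor` is expected to be a `SamPredictor`.

```python
from typing import Any, Dict, List, Tuple


def segment_with_prompts(
    predictor: Any, image: Any, prompts: List[Dict[str, Any]]
) -> List[Tuple[Any, Any]]:
    """Encode the image once, then decode one set of masks per prompt."""
    predictor.set_image(image)  # heavy image encoder runs a single time
    results = []
    for prompt in prompts:
        # Each call reuses the cached embedding; only the prompt encoder
        # and mask decoder run here.
        masks, scores, _low_res_logits = predictor.predict(
            multimask_output=True, **prompt
        )
        results.append((masks, scores))
    return results


# Usage (assumes numpy and an RGB uint8 image of shape HxWx3):
#   import numpy as np
#   prompts = [
#       {"point_coords": np.array([[500, 375]]), "point_labels": np.array([1])},
#       {"box": np.array([100, 100, 400, 300])},
#   ]
#   results = segment_with_prompts(predictor, image, prompts)
```

Images loaded with OpenCV arrive in BGR order, so convert them with `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` before calling `set_image`, which expects RGB by default.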

## Self-Hosting & Configuration
- Requires Python 3.8+, PyTorch 1.7+, and TorchVision 0.8+
- Three checkpoint sizes available: ViT-B (375 MB), ViT-L (1.2 GB), ViT-H (2.4 GB)
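- ONNX export covers the prompt encoder and mask decoder, enabling CPU-only or browser-based prompting once image embeddings are available
- A GPU with 8 GB VRAM is typically sufficient for single-image inference; the smaller ViT-B and ViT-L checkpoints reduce memory use and latency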
- Can be used as a library or through the included demo notebooks
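
When using SAM as a library, the checkpoint file has to match the key passed to `sam_model_registry`. The following sketch pairs each model type with its published checkpoint file name and builds a predictor on a chosen device. `CHECKPOINT_FILES`, `checkpoint_path`, and `load_predictor` are hypothetical names introduced here, and the default of `vit_b` on CPU is only an assumption suited to a low-resource setup.

```python
from pathlib import Path

# Checkpoint file names as published in the official repository.
CHECKPOINT_FILES = {
    "vit_b": "sam_vit_b_01ec64.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
}


def checkpoint_path(model_type: str, checkpoint_dir: str = ".") -> Path:
    """Return the expected checkpoint location for a model type."""
    if model_type not in CHECKPOINT_FILES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return Path(checkpoint_dir) / CHECKPOINT_FILES[model_type]


def load_predictor(
    model_type: str = "vit_b", checkpoint_dir: str = ".", device: str = "cpu"
):
    """Build a SamPredictor on the requested device."""
    # Imported lazily so the path helper above works without PyTorch installed.
    from segment_anything import SamPredictor, sam_model_registry

    path = checkpoint_path(model_type, checkpoint_dir)
    sam = sam_model_registry[model_type](checkpoint=str(path))
    sam.to(device=device)
    return SamPredictor(sam)


# Usage: predictor = load_predictor("vit_h", device="cuda")
```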

## Key Features
- Zero-shot generalization to unseen object categories and domains
- Trained on SA-1B, one of the largest segmentation datasets ever created
- Interactive point-and-click interface for fast manual annotation
- Multiple mask output with confidence scores for ambiguous prompts
- ONNX Runtime support for the prompt encoder and mask decoder, so prompting can be deployed without PyTorch once image embeddings are available
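
For ambiguous prompts, `SamPredictor.predict(..., multimask_output=True)` returns several candidate masks together with the model's own quality score for each. A common way to resolve the ambiguity automatically is to keep the highest-scoring candidate, sketched below; `best_mask_index` is a hypothetical helper, and picking by score is one reasonable policy rather than the only one.

```python
from typing import Sequence


def best_mask_index(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring mask (first one on ties)."""
    if len(scores) == 0:
        raise ValueError("scores must not be empty")
    return max(range(len(scores)), key=lambda i: float(scores[i]))


# Usage with an existing predictor (assumes predictor.set_image(image) was
# already called and numpy is installed):
#   import numpy as np
#   masks, scores, _ = predictor.predict(
#       point_coords=np.array([[500, 375]]),
#       point_labels=np.array([1]),  # 1 = foreground click, 0 = background
#       multimask_output=True,
#   )
#   mask = masks[best_mask_index(scores)]
```

In an interactive tool it can be better to show all candidates and let the annotator choose, since the top score does not always match the intended object.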

## Comparison with Similar Tools
- **SAM 2** — Meta's successor with video segmentation support; SAM focuses on single images
- **Detectron2** — Meta's detection framework; requires task-specific training unlike SAM
- **YOLO** — excels at real-time detection and segmentation over a fixed set of trained categories; SAM is class-agnostic and segments whatever the prompt indicates, without assigning category labels
- **U-Net** — classical encoder-decoder for segmentation; needs domain-specific labels and training

## FAQ
**Q: Can SAM segment video?**
A: SAM operates on single images. For video segmentation, use SAM 2.

**Q: Does SAM work without a GPU?**
A: Yes. The PyTorch model runs on CPU, though the image encoder is much slower there, and the ONNX-exported mask decoder is intended for CPU and browser use.

**Q: How accurate is SAM on domain-specific data like medical imaging?**
A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.

**Q: Is the SA-1B dataset available?**
A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.

## Sources
- https://github.com/facebookresearch/segment-anything
- https://segment-anything.com/

---
Source: https://tokrepo.com/en/workflows/segment-anything-sam-zero-shot-image-segmentation-meta-795a30dc
Author: AI Open Source