# Segment Anything (SAM): Zero-Shot Image Segmentation by Meta

> A foundation model for promptable image segmentation that can segment any object in any image without additional training. SAM powers interactive annotation, downstream vision tasks, and zero-shot transfer.

## Install

Save the content below to `.claude/skills/` or append to your `CLAUDE.md`:

## Quick Use

```bash
pip install git+https://github.com/facebookresearch/segment-anything.git
pip install opencv-python pycocotools matplotlib onnxruntime

# Download a model checkpoint (ViT-H, the largest variant)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# Load the checkpoint and wrap it in a predictor
python -c "
from segment_anything import sam_model_registry, SamPredictor
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth')
predictor = SamPredictor(sam)
"
```

## Introduction

Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, a bounding box, or a coarse mask, SAM produces high-quality object masks without task-specific fine-tuning. Free-form text prompts were explored in the SAM paper, but the released code does not support them.

## What SAM Does

- Segments any object given point, box, or mask prompts
- Generates multiple valid masks when prompts are ambiguous
- Decodes prompts quickly once the image embedding has been computed, which makes interactive annotation workflows practical on a GPU
- Exports the lightweight mask decoder to ONNX for deployment in browsers and on edge devices
- Provides the SA-1B dataset with over 1 billion masks on 11 million images

## Architecture Overview

SAM has three components:

- a ViT-based image encoder that produces an image embedding once per image
- a flexible prompt encoder that handles points, boxes, and masks (plus text in the paper's experiments)
- a lightweight mask decoder that combines the image embedding and the prompt embedding to predict segmentation masks

This design allows the heavy image encoding to be amortized across multiple prompts on the same image.

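
The amortization described above can be sketched as follows. This is a minimal illustration, not the library's own code: `best_masks_for_prompts` and `StubPredictor` are hypothetical names introduced here. The helper only assumes that the object it receives behaves like `SamPredictor`, whose `set_image` method computes the image embedding and whose `predict` method returns candidate masks, their quality scores, and low-resolution logits. The stub returns fake masks and scores so the sketch runs without a checkpoint.

```python
from typing import Any, List, Sequence, Tuple


def best_masks_for_prompts(
    predictor: Any,
    image: Any,
    prompts: Sequence[Tuple[Any, Any]],
) -> List[Tuple[Any, float]]:
    """Encode `image` once, then decode one best mask per point prompt.

    Each prompt is a (point_coords, point_labels) pair. `predictor` is
    assumed to expose set_image() and predict() like SamPredictor.
    """
    predictor.set_image(image)  # heavy image encoder: runs once per image
    results = []
    for point_coords, point_labels in prompts:
        # Only the prompt encoder and mask decoder run inside this loop.
        masks, scores, _low_res = predictor.predict(
            point_coords=point_coords,
            point_labels=point_labels,
            multimask_output=True,  # several candidates for ambiguous prompts
        )
        # Keep the candidate with the highest predicted quality score.
        best = max(range(len(scores)), key=lambda i: scores[i])
        results.append((masks[best], float(scores[best])))
    return results


class StubPredictor:
    """Hypothetical stand-in for SamPredictor; returns fake data."""

    def __init__(self) -> None:
        self.encode_calls = 0

    def set_image(self, image: Any) -> None:
        self.encode_calls += 1  # count how often the "encoder" runs

    def predict(self, point_coords, point_labels, multimask_output=True):
        masks = ["mask_a", "mask_b", "mask_c"]  # placeholders, not real masks
        scores = [0.61, 0.93, 0.78]             # made-up quality scores
        return masks, scores, None


stub = StubPredictor()
prompts = [([[100, 150]], [1]), ([[320, 240]], [1]), ([[50, 60]], [1])]
results = best_masks_for_prompts(stub, image=None, prompts=prompts)
print(stub.encode_calls, results[0])
```

With a real `SamPredictor`, `image` would be an RGB array of shape H x W x 3, and each prompt would hold NumPy arrays: point coordinates of shape N x 2 and labels of length N, where 1 marks a foreground point and 0 a background point. The encoder still runs once no matter how many prompts follow.
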
## Self-Hosting & Configuration

- Requires Python 3.8+, PyTorch 1.7+, and torchvision 0.8+
- Three checkpoint sizes available: ViT-B (about 375 MB), ViT-L (about 1.2 GB), ViT-H (about 2.4 GB)
- ONNX export of the mask decoder enables CPU-only or browser-based prompt decoding; the image embedding still has to be computed by the image encoder first
- A GPU with 8 GB VRAM is generally sufficient for interactive single-image inference
- Can be used as a library or through the included demo notebooks

## Key Features

- Zero-shot generalization to unseen object categories and domains
- Trained on SA-1B, one of the largest segmentation datasets ever created
- Interactive point-and-click interface for fast manual annotation
- Multiple mask outputs with confidence scores for ambiguous prompts
- ONNX Runtime support for running the mask decoder without PyTorch

## Comparison with Similar Tools

- **SAM 2**: Meta's successor with video segmentation support; SAM focuses on single images
- **Detectron2**: Meta's detection framework; requires task-specific training, unlike SAM
- **YOLO**: excels at real-time detection and segmentation with fixed categories; SAM is class-agnostic, so it segments objects outside any fixed label set but does not assign class labels
- **U-Net**: classical encoder-decoder for segmentation; needs domain-specific labels and training

## FAQ

**Q: Can SAM segment video?**
A: SAM operates on single images. For video segmentation, use SAM 2.

**Q: Does SAM work without a GPU?**
A: Yes. The PyTorch model can run on CPU, and the ONNX-exported mask decoder runs on CPU as well, though inference is slower than on a GPU.

**Q: How accurate is SAM on domain-specific data like medical imaging?**
A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.

**Q: Is the SA-1B dataset available?**
A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.

## Sources

- https://github.com/facebookresearch/segment-anything
- https://segment-anything.com/

---

Source: https://tokrepo.com/en/workflows/segment-anything-sam-zero-shot-image-segmentation-meta-795a30dc
Author: AI Open Source