# Segment Anything (SAM) — Foundation Model for Image Segmentation

> Segment Anything Model by Meta AI provides a promptable segmentation system that can isolate any object in an image given point, box, or mask prompts, enabling zero-shot transfer to new visual domains.

## Install

Install the package with pip, then download a model checkpoint from the repository and save it in your project root:

```bash
pip install segment-anything
```

## Quick Use

```bash
python -c "from segment_anything import SamPredictor, sam_model_registry; print('OK')"
```

## Introduction

Segment Anything (SAM) is a foundation model for image segmentation released by Meta AI Research. It was trained on the SA-1B dataset, which contains over one billion masks across 11 million images, and can segment any object in an image without task-specific fine-tuning, making it a general-purpose building block for computer vision pipelines.

## What Segment Anything Does

- Segments any object in an image given point, box, or mask prompts
- Generates multiple valid masks with confidence scores for ambiguous prompts
- Runs zero-shot on new image domains without retraining
- Produces high-quality masks at interactive speeds on GPU
- Serves as a backbone for downstream tasks such as video segmentation, medical imaging, and 3D reconstruction

## Architecture Overview

SAM consists of three components: a ViT-based image encoder, a flexible prompt encoder that handles point, box, and mask prompts (free-form text prompting was explored in the paper but is not part of the released code), and a lightweight mask decoder. The image encoder runs once per image; the prompt encoder and mask decoder then run per query, enabling real-time interactive segmentation. The model outputs a predicted IoU score for each mask, which can be used to rank candidates automatically.

## Self-Hosting & Configuration

- Install via pip and download a model checkpoint (ViT-H, ViT-L, or ViT-B) from the repository
- Requires PyTorch 1.7+ and a CUDA GPU for efficient inference
- The `SamPredictor` class provides a simple API for single-image segmentation (see the first sketch after the FAQ)
- `SamAutomaticMaskGenerator` generates masks for all objects in an image without prompts (see the second sketch after the FAQ)
- Integrates with OpenCV and PIL for image I/O

## Key Features

- Zero-shot generalization to unseen object types and visual domains
- Interactive prompting with points, bounding boxes, or masks
- Automatic mask generation mode for full-scene segmentation
- Three model sizes (ViT-B, ViT-L, ViT-H) trading speed for accuracy
- Permissive Apache 2.0 license for commercial use

## Comparison with Similar Tools

- **SAM 2** — extends SAM to video with streaming memory; higher temporal consistency
- **Grounding DINO** — open-set object detection; often paired with SAM for text-prompted segmentation
- **Detectron2** — full detection/segmentation framework; requires task-specific training
- **YOLO** — optimized for real-time detection; less precise per-pixel masks
- **U-Net** — classic medical segmentation architecture; needs domain-specific labeled data

## FAQ

**Q: Can SAM run on CPU?**
A: Yes, but inference is significantly slower. A GPU is recommended for interactive use.

**Q: Does SAM understand semantic categories?**
A: No. SAM segments objects from spatial prompts, not semantic labels. Pair it with a classifier or Grounding DINO for labeled segmentation.

**Q: What image formats are supported?**
A: Any format readable by PIL or OpenCV, including JPEG, PNG, TIFF, and BMP.

**Q: Can I fine-tune SAM on custom data?**
A: The model weights can be fine-tuned with standard PyTorch training loops, though zero-shot performance is strong for most use cases.
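## Usage Sketch: Prompted Segmentation

A minimal sketch of the `SamPredictor` workflow described above. The checkpoint filename corresponds to the ViT-B weights distributed from the repository; the image path and prompt coordinates are placeholders for illustration.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-B checkpoint (downloaded separately) and move it to the GPU;
# use "cpu" instead if no CUDA device is available (much slower).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.to(device="cuda")
predictor = SamPredictor(sam)

# SamPredictor expects an HxWx3 RGB uint8 array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

# A single foreground click (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidates for an ambiguous prompt
)
best_mask = masks[np.argmax(scores)]  # rank candidates by predicted IoU

# A box prompt in (x0, y0, x1, y1) format reuses the cached image embedding.
box_masks, box_scores, _ = predictor.predict(
    box=np.array([100, 100, 400, 400]),
    multimask_output=False,
)
```

Because `set_image` caches the image embedding, repeated prompts on the same image only pay the cost of the lightweight prompt encoder and mask decoder.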
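Since the package works with OpenCV and PIL arrays for image I/O, masks can be written out directly as images. A small sketch continuing from the example above; the output filenames are arbitrary:

```python
# Save the winning mask as a black-and-white PNG.
cv2.imwrite("mask.png", best_mask.astype(np.uint8) * 255)

# Cut the segmented object out of the original image (background set to black).
cutout = image.copy()
cutout[~best_mask] = 0
cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGB2BGR))
```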
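## Usage Sketch: Automatic Mask Generation

A sketch of prompt-free, full-scene segmentation with `SamAutomaticMaskGenerator`, reusing the `sam` model and `image` array loaded in the first sketch:

```python
from segment_anything import SamAutomaticMaskGenerator

mask_generator = SamAutomaticMaskGenerator(sam)

# Returns one dict per detected object, with keys including
# 'segmentation' (binary mask), 'bbox', 'area', and 'predicted_iou'.
masks = mask_generator.generate(image)
masks.sort(key=lambda m: m["predicted_iou"], reverse=True)
print(f"found {len(masks)} masks")
```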
## Sources

- https://github.com/facebookresearch/segment-anything
- https://segment-anything.com/