# Segment Anything (SAM) — Foundation Model for Image Segmentation

> Segment Anything Model by Meta AI provides a promptable segmentation system that can isolate any object in an image given point, box, or mask prompts, enabling zero-shot transfer to new visual domains.

## Install

Install the package with pip, then download a model checkpoint from the repository and save it in your project root:

```bash
pip install segment-anything
```

## Quick Use

```bash
python -c "from segment_anything import SamPredictor, sam_model_registry; print('OK')"
```

## Introduction

Segment Anything (SAM) is a foundation model for image segmentation released by Meta AI Research. It was trained on the SA-1B dataset, which contains over one billion masks across 11 million images, and can segment any object in an image without task-specific fine-tuning, making it a general-purpose building block for computer vision pipelines.

## What Segment Anything Does

- Segments any object in an image given point, box, or mask prompts
- Generates multiple valid masks with confidence scores for ambiguous prompts
- Runs zero-shot on new image domains without retraining
- Produces high-quality masks at interactive speeds on GPU
- Serves as a backbone for downstream tasks such as video segmentation, medical imaging, and 3D reconstruction

## Architecture Overview

SAM consists of three components: a ViT-based image encoder, a flexible prompt encoder that handles point, box, and mask prompts (free-form text prompting was explored in the paper but is not part of the released code), and a lightweight mask decoder. The image encoder runs once per image; the prompt encoder and mask decoder then run per query, enabling real-time interactive segmentation. The model outputs a predicted IoU score for each mask, which can be used to rank candidates automatically.

## Self-Hosting & Configuration

- Install via pip and download a model checkpoint (ViT-H, ViT-L, or ViT-B) from the repository
- Requires PyTorch 1.7+ and a CUDA GPU for efficient inference
- The `SamPredictor` class provides a simple API for single-image segmentation (see the first sketch after the FAQ)
- `SamAutomaticMaskGenerator` generates masks for all objects in an image without prompts (see the second sketch after the FAQ)
- Integrates with OpenCV and PIL for image I/O

## Key Features

- Zero-shot generalization to unseen object types and visual domains
- Interactive prompting with points, bounding boxes, or masks
- Automatic mask generation mode for full-scene segmentation
- Three model sizes (ViT-B, ViT-L, ViT-H) trading speed for accuracy
- Permissive Apache 2.0 license for commercial use

## Comparison with Similar Tools

- **SAM 2** — extends SAM to video with streaming memory; higher temporal consistency
- **Grounding DINO** — open-set object detection; often paired with SAM for text-prompted segmentation
- **Detectron2** — full detection/segmentation framework; requires task-specific training
- **YOLO** — optimized for real-time detection; less precise per-pixel masks
- **U-Net** — classic medical segmentation architecture; needs domain-specific labeled data

## FAQ

**Q: Can SAM run on CPU?**
A: Yes, but inference is significantly slower. A GPU is recommended for interactive use.

**Q: Does SAM understand semantic categories?**
A: No. SAM segments objects from spatial prompts, not semantic labels. Pair it with a classifier or Grounding DINO for labeled segmentation.

**Q: What image formats are supported?**
A: Any format readable by PIL or OpenCV, including JPEG, PNG, TIFF, and BMP.

**Q: Can I fine-tune SAM on custom data?**
A: The model weights can be fine-tuned with standard PyTorch training loops, though zero-shot performance is strong for most use cases.
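## Usage Sketch: Prompted Segmentation

A minimal sketch of the `SamPredictor` workflow described above. The checkpoint filename corresponds to the ViT-B weights distributed from the repository; the image path and prompt coordinates are placeholders for illustration.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load the ViT-B checkpoint (downloaded separately) and move it to the GPU;
# use "cpu" instead if no CUDA device is available (much slower).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
sam.to(device="cuda")
predictor = SamPredictor(sam)

# SamPredictor expects an HxWx3 RGB uint8 array; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)  # runs the heavy image encoder once per image

# A single foreground click (label 1 = foreground, 0 = background).
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,  # return several candidates for an ambiguous prompt
)
best_mask = masks[np.argmax(scores)]  # rank candidates by predicted IoU

# A box prompt in (x0, y0, x1, y1) format reuses the cached image embedding.
box_masks, box_scores, _ = predictor.predict(
    box=np.array([100, 100, 400, 400]),
    multimask_output=False,
)
```

Because `set_image` caches the image embedding, repeated prompts on the same image only pay the cost of the lightweight prompt encoder and mask decoder.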
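Since the package works with OpenCV and PIL arrays for image I/O, masks can be written out directly as images. A small sketch continuing from the example above; the output filenames are arbitrary:

```python
# Save the winning mask as a black-and-white PNG.
cv2.imwrite("mask.png", best_mask.astype(np.uint8) * 255)

# Cut the segmented object out of the original image (background set to black).
cutout = image.copy()
cutout[~best_mask] = 0
cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGB2BGR))
```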
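## Usage Sketch: Automatic Mask Generation

A sketch of prompt-free, full-scene segmentation with `SamAutomaticMaskGenerator`, reusing the `sam` model and `image` array loaded in the first sketch:

```python
from segment_anything import SamAutomaticMaskGenerator

mask_generator = SamAutomaticMaskGenerator(sam)

# Returns one dict per detected object, with keys including
# 'segmentation' (binary mask), 'bbox', 'area', and 'predicted_iou'.
masks = mask_generator.generate(image)
masks.sort(key=lambda m: m["predicted_iou"], reverse=True)
print(f"found {len(masks)} masks")
```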
## Sources

- https://github.com/facebookresearch/segment-anything
- https://segment-anything.com/