Segment Anything (SAM) — Zero-Shot Image Segmentation by Meta

Introduction

Segment Anything Model (SAM) is Meta AI's promptable segmentation foundation model. Given an image and a prompt such as a point, bounding box, or coarse mask, SAM produces high-quality object masks without task-specific fine-tuning. Text prompts were explored in the SAM paper, but the publicly released model does not accept them.

What SAM Does

Segments any object given point, box, or mask prompts
Generates multiple valid masks when prompts are ambiguous
Responds to prompts fast enough for interactive annotation once the image embedding has been computed on a GPU
Exports the lightweight mask decoder to ONNX for deployment in browsers and edge devices
Provides the SA-1B dataset with over 1 billion masks on 11 million images

Architecture Overview

SAM has three components: a ViT-based image encoder that produces an image embedding once per image, a flexible prompt encoder that handles points, boxes, and masks, and a lightweight mask decoder that combines the two to predict segmentation masks. Because the embedding is reused, the cost of the heavy image encoder is amortized across every prompt issued against the same image.
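
The amortization can be illustrated with a toy sketch. The names below (CachedSegmenter, fake_encoder, fake_decoder) are hypothetical stand-ins, and the encoder and decoder are placeholder functions rather than SAM's real networks. The only point is that the expensive encoder runs once per image while each prompt reuses the cached embedding.

```python
from typing import Any, Callable, Tuple

Point = Tuple[int, int]


class CachedSegmenter:
    """Toy illustration: encode an image once, then decode many prompts."""

    def __init__(self, encoder: Callable[[Any], Any],
                 decoder: Callable[[Any, Point], Any]) -> None:
        self.encoder = encoder
        self.decoder = decoder
        self.encoder_calls = 0  # counts how often the heavy step runs
        self._embedding = None

    def set_image(self, image: Any) -> None:
        # Expensive step: runs once per image and caches the result.
        self._embedding = self.encoder(image)
        self.encoder_calls += 1

    def predict(self, prompt: Point) -> Any:
        # Cheap step: reuses the cached embedding for every prompt.
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        return self.decoder(self._embedding, prompt)


def fake_encoder(image):
    # Placeholder "embedding": the sum of all pixel values.
    return sum(sum(row) for row in image)


def fake_decoder(embedding, prompt):
    # Placeholder "mask": just pairs the embedding with the prompt.
    return (embedding, prompt)


segmenter = CachedSegmenter(fake_encoder, fake_decoder)
segmenter.set_image([[1, 2], [3, 4]])
results = [segmenter.predict(p) for p in [(0, 0), (1, 1), (0, 1)]]
```

Three prompts are answered here with a single encoder call, which is the property that makes interactive prompting practical.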

Self-Hosting & Configuration

Requires Python 3.8+, PyTorch 1.7+, and torchvision 0.8+
Three checkpoint sizes available: ViT-B (375 MB), ViT-L (1.2 GB), ViT-H (2.4 GB)
ONNX export of the mask decoder enables CPU-only or browser-based prompting; the image embedding must still be computed by the image encoder
GPU with 8 GB VRAM is sufficient for real-time single-image inference
Can be used as a library or through the included demo notebooks
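
A minimal sketch of library use, assuming the segment_anything package from the official repository, NumPy, and a downloaded checkpoint are available. pick_model_type and segment_from_point are hypothetical helpers written for this example, and the sizes in CHECKPOINT_SIZES_GB are the approximate figures from the list above. The heavy imports sit inside the function so the block loads even where the package is not installed.

```python
# Approximate checkpoint sizes in GB, taken from the list above.
CHECKPOINT_SIZES_GB = {"vit_b": 0.375, "vit_l": 1.2, "vit_h": 2.4}


def pick_model_type(max_size_gb: float) -> str:
    """Return the largest model type whose checkpoint fits the size budget."""
    fitting = [(size, name) for name, size in CHECKPOINT_SIZES_GB.items()
               if size <= max_size_gb]
    if not fitting:
        raise ValueError("no checkpoint fits within the given size budget")
    return max(fitting)[1]


def segment_from_point(image_rgb, checkpoint_path: str, model_type: str,
                       x: int, y: int):
    """Segment the object under a single foreground click.

    image_rgb is assumed to be an HxWx3 uint8 array in RGB order.
    """
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # runs the image encoder once
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),  # 1 = foreground, 0 = background
        multimask_output=True,       # return several candidate masks
    )
    return masks, scores
```

With a size budget of 2 GB, pick_model_type(2.0) selects "vit_l"; the checkpoint path passed to segment_from_point is whatever location the file was downloaded to.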

Key Features

Zero-shot generalization to unseen object categories and domains
Trained on SA-1B, one of the largest segmentation datasets ever created
Interactive point-and-click interface for fast manual annotation
Multiple mask output with confidence scores for ambiguous prompts
ONNX Runtime support for running the mask decoder without PyTorch, given a precomputed image embedding
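
One way to use the confidence scores for ambiguous prompts is to keep the highest-scoring candidate. The sketch below uses plain Python lists as stand-ins for the mask arrays and scores a predictor might return. select_best_mask is a hypothetical helper, not part of SAM, and the example values are made up for illustration.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # nested 0/1 rows standing in for a mask array


def select_best_mask(masks: Sequence[Mask],
                     scores: Sequence[float]) -> Tuple[Mask, float]:
    """Return the candidate mask with the highest confidence score."""
    if not masks or len(masks) != len(scores):
        raise ValueError("masks and scores must be non-empty and equal in length")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return masks[best_index], scores[best_index]


# Three illustrative candidates for one ambiguous click.
candidates = [
    [[1, 0], [0, 0]],  # smallest region
    [[1, 1], [0, 0]],  # medium region
    [[1, 1], [1, 1]],  # largest region
]
candidate_scores = [0.71, 0.93, 0.88]

best_mask, best_score = select_best_mask(candidates, candidate_scores)
```

In an annotation tool it may be preferable to show all candidates and let the user choose, since the top score does not always match the intended object.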

Comparison with Similar Tools

SAM 2 — Meta's successor with video segmentation support; SAM focuses on single images
Detectron2 — Meta's detection framework; requires task-specific training unlike SAM
YOLO — excels at real-time detection and segmentation with fixed categories; SAM is class-agnostic and can segment objects outside any fixed label set, but it does not assign class names
U-Net — classical encoder-decoder for segmentation; needs domain-specific labels and training

FAQ

Q: Can SAM segment video? A: SAM operates on single images. For video segmentation, use SAM 2.

Q: Does SAM work without a GPU? A: Yes. The PyTorch model runs on CPU, though image encoding is much slower, and the ONNX-exported mask decoder also runs on CPU.

Q: How accurate is SAM on domain-specific data like medical imaging? A: SAM generalizes well but may need fine-tuning for specialized domains where visual patterns differ significantly from natural images.
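
Before deciding whether fine-tuning is needed, one option is to measure mask quality on a few labeled examples from the target domain. Intersection over union (IoU) is a common metric for this. The sketch below is a generic implementation on nested 0/1 lists, with mask_iou as a hypothetical helper name; it is not taken from the SAM codebase.

```python
from typing import List

Mask = List[List[int]]  # nested 0/1 rows standing in for a mask array


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks with the same shape."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
iou = mask_iou(predicted, ground_truth)
```

Consistently low IoU on representative samples would suggest that the domain differs enough from natural images for fine-tuning to be worth considering.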

Q: Is the SA-1B dataset available? A: Yes, Meta released SA-1B under a research license for academic and non-commercial use.
