Introduction
Captum (Latin for "comprehension") is a model interpretability library for PyTorch, developed by Meta. It implements a wide range of gradient-based and perturbation-based attribution algorithms that explain which input features contribute most to a model's predictions. Captum works with any PyTorch model, including CNNs, RNNs, Transformers, and multi-modal architectures.
What Captum Does
- Feature attribution: identify which input features drive a prediction
- Layer attribution: understand contributions of intermediate network layers
- Neuron attribution: analyze individual neuron activations
- Robustness analysis through input perturbation metrics
- Visualization tools for image, text, and tabular attributions
Architecture Overview
Captum organizes attribution methods into three categories: primary attribution (Integrated Gradients, DeepLift, GradientSHAP, Feature Ablation), layer attribution (Layer Conductance, Layer GradCAM), and neuron attribution (Neuron Conductance). Each method implements a common Attribution interface with an attribute() method. The library integrates with Captum Insights, a web-based visualization tool, and provides utilities for convergence testing and sensitivity analysis.
Self-Hosting & Configuration
- Install via pip: pip install captum
- Requires Python 3.6+ and PyTorch
- No GPU required: attribution runs on whichever device (CPU or GPU) holds the model and its inputs
- Captum Insights web UI: pip install captum[insights]
- Works with any PyTorch nn.Module without modification
Key Features
- Implements 15+ attribution algorithms in a unified API
- Works with any PyTorch model out of the box
- Captum Insights provides interactive web-based visualization
- Supports multi-modal models with separate attributions per input
- Convergence delta values for checking the approximation error of methods such as Integrated Gradients
Comparison with Similar Tools
- LIME — model-agnostic perturbation-based; Captum provides gradient-based methods specific to PyTorch
- SHAP — Shapley values with multiple backends; Captum integrates GradientSHAP natively for PyTorch
- tf-explain — TensorFlow-specific interpretability; Captum is the PyTorch counterpart
- Alibi — framework-agnostic with counterfactual explanations; Captum focuses on attribution methods
FAQ
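Q: Does Captum work with Hugging Face Transformers? A: Yes. Any model that subclasses torch.nn.Module works with Captum, including Hugging Face models. In practice you typically wrap the forward call so that it returns a plain tensor (for example, the logits) rather than a model output object.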
Q: Which attribution method should I start with? A: Integrated Gradients is a good default. It satisfies the sensitivity and implementation invariance axioms and works well across model types.
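To make the idea behind Integrated Gradients concrete, here is a dependency-free sketch that approximates the path integral with a midpoint Riemann sum. It is not Captum's implementation; the function f, its hand-written gradient, the input, and the zero baseline are assumptions chosen so the result can be checked by hand.

```python
def integrated_gradients(grad_f, x, baseline, n_steps=1000):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    totals = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            totals[i] += g
    # Average gradient along the path, scaled by (input - baseline).
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, totals)]


def f(p):
    # Hypothetical model: depends on p[0] and p[1], ignores p[2].
    return p[0] * p[1] + p[0] ** 2


def grad_f(p):
    # Analytic gradient of f, written by hand for this example.
    return [p[1] + 2 * p[0], p[0], 0.0]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)

print([round(a, 6) for a in attributions])   # approximately [2.0, 1.0, 0.0]
print(round(sum(attributions), 6), f(x) - f(baseline))
```

The unused third feature receives zero attribution, and the attributions sum to f(x) - f(baseline), which is the completeness property that Captum's convergence delta checks numerically.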
Q: Can Captum explain NLP models? A: Yes. Captum supports token-level attribution for text models, including visualization of token importance scores.
Q: Does attribution slow down inference? A: Attribution requires multiple forward/backward passes (depending on the method), so it is slower than a single inference pass. Use it for analysis, not production serving.