Introduction
Composer is an open-source PyTorch training library by MosaicML (now Databricks) that makes it easy to apply efficiency methods like mixed precision, gradient accumulation, and algorithm-level speedups. It provides a Trainer abstraction that handles distributed training, checkpointing, and logging out of the box.
What Composer Does
- Provides a Trainer class that wraps PyTorch training with built-in best practices (quickstart sketch after this list)
- Implements 25+ speed-up algorithms like BlurPool, CutMix, Label Smoothing, and Progressive Resizing
- Supports multi-GPU and multi-node training with FSDP and DeepSpeed backends
- Handles checkpointing, resumption, and logging to W&B, MLflow, or TensorBoard
- Includes streaming dataset loading for training on cloud-hosted data
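To make the Trainer workflow concrete, here is a minimal sketch of an end-to-end run, assuming Composer's documented `Trainer` and `ComposerClassifier` APIs; the synthetic dataset and tiny MLP are placeholders, not a recommended setup:

```python
# Minimal Composer training run. The synthetic dataset and tiny MLP are
# placeholders; any torch.nn.Module and DataLoader work the same way.
import torch
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.models import ComposerClassifier

# Toy classification data: 1024 samples, 16 features, 4 classes.
X = torch.randn(1024, 16)
y = torch.randint(0, 4, (1024,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=64)

module = torch.nn.Sequential(
    torch.nn.Linear(16, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 4),
)

trainer = Trainer(
    model=ComposerClassifier(module, num_classes=4),  # adds loss + accuracy metrics
    train_dataloader=train_loader,
    max_duration='2ep',  # Composer parses duration strings: epochs, batches, etc.
)
trainer.fit()
```

`ComposerClassifier` wraps the module with a cross-entropy loss and accuracy metrics, so a classification `nn.Module` can be dropped in unchanged.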
Architecture Overview
Composer's Trainer manages the training loop through an event-based system. Speed-up algorithms (and user callbacks) hook into events such as BATCH_START or AFTER_LOSS to modify the model, the data, or the loss mid-training. The Trainer orchestrates data loading, forward/backward passes, optimization, and checkpointing, and delegates distributed parallelism to PyTorch FSDP or DeepSpeed, abstracting away the complexity of multi-GPU coordination.
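The sketch below illustrates that event interface, assuming Composer's documented two-method `Algorithm` base class (`match` selects events, `apply` runs at them); the loss-logging behavior itself is a made-up example, not one of the built-in algorithms:

```python
# Sketch of Composer's Algorithm interface: match() picks the events to run
# at, apply() acts on the training state there. This loss-logging behavior
# is illustrative only, not a built-in algorithm.
from composer.core import Algorithm, Event

class LogLoss(Algorithm):
    def match(self, event, state):
        # Fire only once the loss for the current batch has been computed.
        return event == Event.AFTER_LOSS

    def apply(self, event, state, logger):
        # `state` exposes the live loop: batch, loss, model, timestamp, ...
        logger.log_metrics({'custom/train_loss': float(state.loss)})
```

Passing `algorithms=[LogLoss()]` to the Trainer is all that is needed to activate it.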
Self-Hosting & Configuration
- Install: `pip install mosaicml`, or with extras: `pip install 'mosaicml[all]'`
- Define training runs via YAML configs or the Python API
- Launch distributed training with the `composer` CLI: `composer -n 8 train.py` for an 8-GPU run
- Configure cloud checkpointing to S3 or GCS for fault tolerance (see the sketch after this list)
- Use Streaming datasets for efficient data loading from object storage
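As a sketch of the checkpointing setup, reusing `model` and `train_loader` from the quickstart above and a placeholder S3 bucket; `save_folder`, `save_interval`, `run_name`, and `autoresume` are documented Trainer arguments:

```python
# Sketch of fault-tolerant checkpointing to object storage; the bucket path
# is a placeholder, and `model` / `train_loader` come from the quickstart.
from composer import Trainer

trainer = Trainer(
    model=model,
    train_dataloader=train_loader,
    max_duration='10ep',
    run_name='demo-run',                                  # stable name so resumption can find checkpoints
    save_folder='s3://my-bucket/checkpoints/{run_name}',  # uploaded via Composer's object-store support
    save_interval='1ep',
    autoresume=True,                                      # on restart, load the latest checkpoint automatically
)
trainer.fit()
```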
Key Features
- Composable speed-up algorithms that stack without code changes (see the sketch after this list)
- YAML-based declarative training configuration
- Built-in FSDP support for training large language models
- Elastic fault-tolerant training with automatic checkpoint recovery
- LLM fine-tuning recipes for MPT and other foundation models
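A sketch of stacking built-in algorithms, assuming `model` and `train_loader` are a convolutional vision model and its dataloader (BlurPool and Progressive Resizing target image models); the hyperparameters shown are library defaults, not tuned values:

```python
# Stacking built-in speed-up algorithms: each hooks the event system
# independently, so they compose with no changes to the model code.
from composer import Trainer
from composer.algorithms import BlurPool, LabelSmoothing, ProgressiveResizing

trainer = Trainer(
    model=model,                        # a convolutional vision model (placeholder)
    train_dataloader=train_loader,
    max_duration='2ep',
    algorithms=[
        BlurPool(),                     # anti-aliased downsampling in conv layers
        LabelSmoothing(smoothing=0.1),  # softens one-hot targets
        ProgressiveResizing(),          # trains on smaller images early, full size later
    ],
)
trainer.fit()
```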
Comparison with Similar Tools
- PyTorch Lightning — general training framework; Composer focuses on efficiency algorithms and LLM training
- DeepSpeed — low-level distributed training library; Composer provides a higher-level Trainer interface
- Hugging Face Trainer — specialized for transformers; Composer supports any PyTorch model architecture
- Determined AI — platform with resource management; Composer is a pure training library
FAQ
Q: Can I use Composer with any PyTorch model? A: Yes. Wrap your model and data loaders in Composer's Trainer. No model architecture changes needed.
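For models where the built-in wrappers do not fit, here is a minimal sketch of subclassing `ComposerModel` via its documented `forward`/`loss` hooks, with a placeholder linear network:

```python
# Sketch of wrapping an arbitrary PyTorch model by subclassing ComposerModel:
# forward() runs the module on a batch, loss() scores the outputs.
import torch
import torch.nn.functional as F

from composer.models import ComposerModel

class MyModel(ComposerModel):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 4)  # placeholder architecture

    def forward(self, batch):
        inputs, _ = batch  # the batch arrives exactly as the dataloader yields it
        return self.net(inputs)

    def loss(self, outputs, batch):
        _, targets = batch
        return F.cross_entropy(outputs, targets)
```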
Q: What kind of speedups can I expect? A: Results depend on the model and hardware. On MosaicML's published vision benchmarks, combining algorithms like MixUp, Progressive Resizing, and mixed precision yielded roughly 2-5x wall-clock speedups with little or no accuracy loss.
Q: Does Composer support RLHF or fine-tuning workflows? A: Yes. The LLM Foundry project built on Composer provides recipes for pre-training and fine-tuning large language models.
Q: Is multi-node training supported? A: Yes. Composer uses PyTorch's distributed launcher and supports multi-node FSDP and DeepSpeed configurations.