Introduction
Oumi is an open-source platform that provides a unified interface for fine-tuning, evaluating, and deploying open-source language and vision-language models. Whether you are running on a single laptop GPU or a multi-node cloud cluster, Oumi handles the infrastructure complexity so you can focus on data and model quality.
What Oumi Does
- Fine-tunes LLMs and VLMs with SFT, DPO, RLHF, and other post-training methods
- Evaluates models against standard benchmarks and custom evaluation suites
- Scales training from a single GPU to multi-node clusters with one config change
- Supports Llama, Qwen, DeepSeek, Gemma, Mistral, and dozens of other model families
- Provides a CLI and Python API for programmatic control of training pipelines
Architecture Overview
Oumi is built around a configuration-driven architecture where YAML recipes define the full training pipeline: model, dataset, training method, and hardware. The trainer abstraction wraps Hugging Face Transformers and DeepSpeed for distributed training, handling gradient accumulation, mixed precision, and checkpoint management automatically. A plugin system allows custom datasets, metrics, and training objectives to be added without modifying core code.
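To make the recipe idea concrete, here is a minimal sketch of an SFT recipe. The field names follow the general shape of Oumi's published configs but should be checked against the version you install; the model and dataset identifiers are placeholders.

```yaml
# Minimal SFT recipe sketch. Field names are illustrative and may differ
# across Oumi versions; check the bundled recipes for the exact schema.
model:
  model_name: "meta-llama/Llama-3.1-8B-Instruct"  # any Hugging Face model ID
  torch_dtype_str: "bfloat16"

data:
  train:
    datasets:
      - dataset_name: "yahma/alpaca-cleaned"      # placeholder dataset

training:
  trainer_type: "TRL_SFT"          # supervised fine-tuning
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8
  learning_rate: 2.0e-5
  max_steps: 1000
  output_dir: "output/llama31-sft"
```

A recipe like this is typically launched with `oumi train -c recipe.yaml`; swapping the model, dataset, or trainer type is a one-line change, which is what makes experiments reproducible and shareable.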
Self-Hosting & Configuration
- Install via pip: `pip install oumi` (requires Python 3.10+)
- Configure training recipes in YAML specifying model, data, and hyperparameters
- Use built-in recipes for popular models as starting points and customize from there
- Scale to multi-GPU with `torchrun` or to multi-node with DeepSpeed ZeRO Stage 3 (see the sketch after this list)
- Deploy trained models via the built-in inference server or export them to the Hugging Face Hub
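As a rough sketch of the scaling story, the same recipe can pick up distributed settings without touching the model or data sections. The fields below are assumptions about the schema, not a verbatim Oumi config; launch commands are shown as comments.

```yaml
# Illustrative scaling additions to an existing recipe; field names are
# assumptions and may differ from your Oumi version.
#
# Single node, 8 GPUs (conceptually):
#   torchrun --nproc-per-node 8 ... -c recipe.yaml
# Multi-node runs add DeepSpeed ZeRO Stage 3, which shards optimizer
# state, gradients, and parameters across workers.
training:
  mixed_precision_dtype: "BF16"       # handled by the trainer abstraction
  gradient_accumulation_steps: 8      # effective batch = per-device * accum * world size
  enable_gradient_checkpointing: true
```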
Key Features
- One unified framework for SFT, DPO, KTO, ORPO, and RLHF training methods
- YAML recipe system makes experiments reproducible and shareable
- Built-in evaluation suite with standard LLM benchmarks (MMLU, HellaSwag, and others; see the sketch after this list)
- Automatic mixed precision, gradient checkpointing, and LoRA/QLoRA support
- First-class vision-language model support for multimodal fine-tuning
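For the evaluation suite, a hedged sketch of what a benchmark config can look like: the `lm_harness` platform name and task fields mirror the general shape of Oumi's evaluation configs, but treat them as assumptions to verify against the installed version.

```yaml
# Evaluation config sketch; platform and field names are assumptions
# based on the shape of Oumi's evaluation configs.
model:
  model_name: "output/llama31-sft"     # path or HF ID of the model to score

tasks:
  - evaluation_platform: "lm_harness"  # EleutherAI LM Evaluation Harness backend
    task_name: "mmlu"
  - evaluation_platform: "lm_harness"
    task_name: "hellaswag"
```

A config like this would typically be run with `oumi evaluate -c eval.yaml`.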
Comparison with Similar Tools
- LLaMA-Factory — Similar scope with a web UI; Oumi emphasizes CLI-first and programmatic workflows
- Axolotl — Config-driven fine-tuning; Oumi adds integrated evaluation and deployment
- Unsloth — Optimized for speed on single GPUs; Oumi scales from single GPU to multi-node clusters
- torchtune — PyTorch-native training; Oumi wraps multiple backends and adds evaluation
- PEFT — Library for parameter-efficient methods; Oumi integrates PEFT as one of many training options
FAQ
Q: Which models can I fine-tune with Oumi? A: Oumi supports most Hugging Face transformer models including Llama, Qwen, DeepSeek, Gemma, Mistral, Phi, and vision-language variants.
Q: Can I use Oumi on a single consumer GPU? A: Yes, Oumi supports QLoRA and gradient checkpointing to fine-tune large models on GPUs with limited VRAM.
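As an illustration of the low-VRAM path, a QLoRA-style recipe fragment might look like the following; the `peft` field names are assumptions modeled on common LoRA hyperparameters, not a guaranteed schema.

```yaml
# QLoRA-style fragment for limited-VRAM fine-tuning; field names are
# illustrative. 4-bit base weights plus low-rank adapters keep memory low.
training:
  use_peft: true
  enable_gradient_checkpointing: true  # trade compute for activation memory

peft:
  q_lora: true       # quantize the frozen base weights to 4-bit
  lora_r: 16         # adapter rank
  lora_alpha: 32
  lora_dropout: 0.05
```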
Q: How does Oumi compare to LLaMA-Factory? A: Both handle LLM fine-tuning. Oumi focuses on CLI-driven workflows and integrated evaluation, while LLaMA-Factory offers a web UI for interactive experimentation.
Q: Does Oumi support RLHF training? A: Yes, Oumi supports DPO, KTO, ORPO, and reward model training as part of its post-training recipe collection.
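To show how a preference-tuning recipe differs from SFT, here is a hedged DPO sketch: only the trainer type and the dataset (now preference pairs) change, while the rest keeps the SFT shape. Names are illustrative.

```yaml
# DPO recipe sketch; trainer and dataset names are illustrative.
model:
  model_name: "output/llama31-sft"   # start from the SFT checkpoint

data:
  train:
    datasets:
      - dataset_name: "trl-lib/ultrafeedback_binarized"  # placeholder preference data

training:
  trainer_type: "TRL_DPO"   # direct preference optimization
  learning_rate: 5.0e-7     # DPO typically uses a much smaller LR than SFT
```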