Introduction
Kohya sd-scripts is the community-standard toolkit for training and fine-tuning Stable Diffusion, SDXL, and Flux image generation models. It provides battle-tested scripts for LoRA, DreamBooth, and textual inversion training, used by thousands of model creators to produce custom checkpoints and adapters shared across the generative AI ecosystem.
What Kohya sd-scripts Does
- Trains LoRA, LyCORIS, and other lightweight adapters for image generation models
- Supports full DreamBooth fine-tuning for subject-specific model customization
- Provides textual inversion training for learning new concepts via embeddings
- Handles SD 1.5, SD 2.x, SDXL, and Flux model architectures
- Includes dataset preparation tools for captioning and image preprocessing
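The dataset preparation utilities above live in the repository's finetune/ directory and are run as standalone scripts. As a sketch (the script names match the repo; the paths and threshold value are illustrative placeholders):

```shell
# Auto-tag a folder of images with the bundled WD14 tagger;
# writes one .txt tag file next to each image
python finetune/tag_images_by_wd14_tagger.py /data/train/subject \
  --caption_extension=.txt --thresh=0.35

# Or generate natural-language captions with the BLIP captioner
python finetune/make_captions.py /data/train/subject \
  --caption_extension=.txt
```

Either style of caption file is picked up automatically at training time when the caption extension in the dataset config matches.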
Architecture Overview
The scripts wrap Hugging Face diffusers and accelerate libraries, adding training loops optimized for image generation. Training configuration is managed through TOML files specifying model paths, learning rates, resolution, and augmentation. The training loop supports mixed precision, gradient checkpointing, and multi-GPU via DeepSpeed or FSDP, keeping VRAM requirements manageable even for SDXL and Flux models.
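A minimal dataset configuration might look like the following sketch. The key names follow sd-scripts' documented TOML schema; the paths and values are placeholders:

```toml
# dataset_config.toml — a minimal sketch; paths and values are placeholders
[general]
enable_bucket = true          # bucket images of mixed aspect ratios
caption_extension = ".txt"    # caption file expected next to each image

[[datasets]]
resolution = 1024             # base training resolution (SDXL-scale)
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/data/train/subject"
  num_repeats = 10            # how often each image repeats per epoch
```

Multiple `[[datasets.subsets]]` blocks can be listed to mix several image folders (e.g. subject images plus regularization images) in one run.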
Self-Hosting & Configuration
- Clone the repository and install dependencies in a Python virtual environment
- Requires an NVIDIA GPU with at least 8 GB VRAM (12+ GB recommended for SDXL)
- Configure training via TOML files specifying model, dataset, and hyperparameters
- Dataset images organized in folders with optional caption text files
- Use the kohya_ss GUI by bmaltais, a separate project that wraps sd-scripts, for a visual training interface
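The setup steps above look roughly like this on Linux with an NVIDIA GPU. This is a sketch; check the repository's README for the currently pinned PyTorch/CUDA versions:

```shell
# Clone and install in an isolated virtual environment (illustrative versions)
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python -m venv venv && source venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
accelerate config   # answer the interactive prompts once per machine
```

The `accelerate config` step records GPU count and precision settings that later `accelerate launch` calls reuse.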
Key Features
- De facto standard for community LoRA and DreamBooth training
- Supports all major Stable Diffusion and Flux model families
- Advanced training techniques: noise offset, min-SNR weighting, adaptive loss
- Bucket-based resolution handling for mixed-size training datasets
- Extensive caption and tag preprocessing utilities
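Putting these pieces together, a training run is launched through accelerate. The flags below are real train_network.py options (including the noise-offset and min-SNR techniques listed above), but the paths and hyperparameter values are illustrative:

```shell
# Illustrative SDXL LoRA launch; paths and hyperparameters are placeholders
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path=/models/sdxl_base.safetensors \
  --dataset_config=dataset_config.toml \
  --output_dir=/output/lora \
  --network_module=networks.lora \
  --network_dim=32 --network_alpha=16 \
  --learning_rate=1e-4 --max_train_epochs=10 \
  --mixed_precision=bf16 --gradient_checkpointing \
  --min_snr_gamma=5 --noise_offset=0.05
```

The resulting .safetensors LoRA in the output directory loads directly into ComfyUI or AUTOMATIC1111 without conversion.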
Comparison with Similar Tools
- Axolotl — focused on LLM fine-tuning; Kohya targets image generation model training exclusively
- diffusers training scripts — official Hugging Face examples; Kohya offers more features and community-tested defaults
- EveryDream2 — alternative SD trainer; Kohya has broader model support and more active maintenance
- OneTrainer — GUI-first training tool; Kohya provides deeper script-level customization
- SimpleTuner — streamlined SDXL/Flux trainer; Kohya covers more model variants and training methods
FAQ
Q: How much VRAM do I need to train a LoRA? A: An SD 1.5 LoRA trains on 8 GB VRAM. SDXL LoRA needs 12 GB. Flux LoRA typically requires 16-24 GB depending on rank and resolution.
Q: What is the recommended dataset size for DreamBooth? A: 15-30 high-quality images of the subject with varied angles, lighting, and backgrounds. More images help but quality matters more than quantity.
Q: Can I train on AMD GPUs? A: Limited AMD support exists via ROCm, but NVIDIA CUDA remains the primary and most stable target.
Q: How do I convert the trained LoRA for use in ComfyUI or AUTOMATIC1111? A: Kohya outputs safetensors LoRA files directly compatible with both ComfyUI and AUTOMATIC1111 WebUI without conversion.