Configs · May 1, 2026 · 3 min read

Kohya sd-scripts — Training Scripts for Stable Diffusion and Flux

Comprehensive training, fine-tuning, and generation scripts for Stable Diffusion, SDXL, and Flux models. The standard toolkit for LoRA, DreamBooth, and textual inversion training.

Introduction

Kohya sd-scripts is the community-standard toolkit for training and fine-tuning Stable Diffusion, SDXL, and Flux image generation models. It provides battle-tested scripts for LoRA, DreamBooth, and textual inversion training, used by thousands of model creators to produce custom checkpoints and adapters shared across the generative AI ecosystem.

What Kohya sd-scripts Does

  • Trains LoRA, LyCORIS, and other lightweight adapters for image generation models
  • Supports full DreamBooth fine-tuning for subject-specific model customization
  • Provides textual inversion training for learning new concepts via embeddings
  • Handles SD 1.5, SD 2.x, SDXL, and Flux model architectures
  • Includes dataset preparation tools for captioning and image preprocessing

Architecture Overview

The scripts wrap Hugging Face diffusers and accelerate libraries, adding training loops optimized for image generation. Training configuration is managed through TOML files specifying model paths, learning rates, resolution, and augmentation. The training loop supports mixed precision, gradient checkpointing, and multi-GPU via DeepSpeed or FSDP, keeping VRAM requirements manageable even for SDXL and Flux models.
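
As a rough sketch of that configuration flow, the snippet below writes a minimal dataset TOML of the kind sd-scripts consumes. The path, resolution, and repeat count are placeholder values, and the exact keys should be checked against the repository's config documentation.

```bash
# Minimal dataset config sketch for sd-scripts (all values are illustrative).
cat > dataset.toml <<'EOF'
[general]
enable_bucket = true          # bucket images with mixed aspect ratios
caption_extension = ".txt"    # per-image caption files

[[datasets]]
resolution = 768              # training resolution in pixels
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/data/train_images"   # placeholder path
  num_repeats = 10                   # times each image is seen per epoch
EOF
```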

Self-Hosting & Configuration

  • Clone the repository and install dependencies in a Python virtual environment (a setup and launch sketch follows this list)
  • Requires an NVIDIA GPU with at least 8 GB VRAM (12+ GB recommended for SDXL)
  • Configure training via TOML files specifying model, dataset, and hyperparameters
  • Dataset images organized in folders with optional caption text files
  • For a visual training interface, use the separate kohya_ss GUI maintained by bmaltais, which wraps these scripts
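
A minimal setup and launch sequence might look like the sketch below. The model path, network dimensions, and step count are placeholder values; SDXL and Flux runs use their own entry points (e.g. sdxl_train_network.py), and available flags should be checked with --help for your checkout.

```bash
# Clone and install into a virtual environment (a CUDA build of PyTorch is assumed).
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
accelerate config                      # one-time accelerate setup

# Launch a LoRA run against the dataset.toml sketched earlier (values illustrative).
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --network_dim 32 --network_alpha 16 \
  --learning_rate 1e-4 --max_train_steps 2000 \
  --mixed_precision fp16 --gradient_checkpointing \
  --save_model_as safetensors \
  --output_dir ./output --output_name my_lora
```

The resulting safetensors file in ./output is the LoRA adapter that downstream UIs load.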

Key Features

  • De facto standard for community LoRA and DreamBooth training
  • Supports all major Stable Diffusion and Flux model families
  • Advanced training techniques: noise offset, min-SNR weighting, adaptive loss (flag examples follow this list)
  • Bucket-based resolution handling for mixed-size training datasets
  • Extensive caption and tag preprocessing utilities
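
The advanced techniques above are typically enabled with extra command-line flags. The example below is a hedged sketch: the values are arbitrary, and flag availability varies by script and version.

```bash
# Illustrative additions to a LoRA training command (check --help for your version).
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --noise_offset 0.1 \
  --min_snr_gamma 5 \
  --output_dir ./output
# Bucket-based resolution handling is switched on in the dataset TOML
# (enable_bucket = true) rather than on the command line.
```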

Comparison with Similar Tools

  • Axolotl — focused on LLM fine-tuning; Kohya targets image generation model training exclusively
  • diffusers training scripts — official Hugging Face examples; Kohya offers more features and community-tested defaults
  • EveryDream2 — alternative SD trainer; Kohya has broader model support and more active maintenance
  • OneTrainer — GUI-first training tool; Kohya provides deeper script-level customization
  • SimpleTuner — streamlined SDXL/Flux trainer; Kohya covers more model variants and training methods

FAQ

Q: How much VRAM do I need to train a LoRA? A: An SD 1.5 LoRA trains on 8 GB VRAM. SDXL LoRA needs 12 GB. Flux LoRA typically requires 16-24 GB depending on rank and resolution.
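
When VRAM is tight, a few widely used options reduce memory pressure; the sketch below combines them, with illustrative values and the same placeholder paths as above.

```bash
# Common memory-saving options: fp16 mixed precision, gradient checkpointing,
# an 8-bit optimizer (requires bitsandbytes), and cached latents so the VAE
# does not stay resident during training.
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --mixed_precision fp16 \
  --gradient_checkpointing \
  --optimizer_type AdamW8bit \
  --cache_latents \
  --output_dir ./output
```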

Q: What is the recommended dataset size for DreamBooth? A: 15-30 high-quality images of the subject with varied angles, lighting, and backgrounds. More images help but quality matters more than quantity.
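
For reference, when a dataset TOML is not used, the scripts can read DreamBooth-style folders whose numeric name prefix sets the per-folder repeat count; the layout below is hypothetical and the token and class names are placeholders.

```bash
# Hypothetical DreamBooth-style layout: "<repeats>_<instance token> <class>".
# An optional .txt caption file can sit next to each image with the same basename.
mkdir -p train_data/"20_sks person"
cp ~/photos/subject/*.jpg train_data/"20_sks person"/
```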

Q: Can I train on AMD GPUs? A: Limited AMD support exists via ROCm, but NVIDIA CUDA remains the primary and most stable target.

Q: How do I convert the trained LoRA for use in ComfyUI or AUTOMATIC1111? A: Kohya outputs safetensors LoRA files directly compatible with both ComfyUI and AUTOMATIC1111 WebUI without conversion.
