Configs · May 1, 2026 · 3 min read

Kohya sd-scripts — Training Scripts for Stable Diffusion and Flux

Comprehensive training, fine-tuning, and generation scripts for Stable Diffusion, SDXL, and Flux models. The standard toolkit for LoRA, DreamBooth, and textual inversion training.

Introduction

Kohya sd-scripts is the community-standard toolkit for training and fine-tuning Stable Diffusion, SDXL, and Flux image generation models. It provides battle-tested scripts for LoRA, DreamBooth, and textual inversion training, used by thousands of model creators to produce custom checkpoints and adapters shared across the generative AI ecosystem.

What Kohya sd-scripts Does

  • Trains LoRA, LyCORIS, and other lightweight adapters for image generation models
  • Supports full DreamBooth fine-tuning for subject-specific model customization
  • Provides textual inversion training for learning new concepts via embeddings
  • Handles SD 1.5, SD 2.x, SDXL, and Flux model architectures
  • Includes dataset preparation tools for captioning and image preprocessing

Architecture Overview

The scripts wrap Hugging Face diffusers and accelerate libraries, adding training loops optimized for image generation. Training configuration is managed through TOML files specifying model paths, learning rates, resolution, and augmentation. The training loop supports mixed precision, gradient checkpointing, and multi-GPU via DeepSpeed or FSDP, keeping VRAM requirements manageable even for SDXL and Flux models.
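
As a rough sketch of that configuration flow, the snippet below writes a minimal dataset TOML of the kind sd-scripts consumes. The path, resolution, and repeat count are placeholder values, and the exact keys should be checked against the repository's config documentation.

```bash
# Minimal dataset config sketch for sd-scripts (all values are illustrative).
cat > dataset.toml <<'EOF'
[general]
enable_bucket = true          # bucket images with mixed aspect ratios
caption_extension = ".txt"    # per-image caption files

[[datasets]]
resolution = 768              # training resolution in pixels
batch_size = 2

  [[datasets.subsets]]
  image_dir = "/data/train_images"   # placeholder path
  num_repeats = 10                   # times each image is seen per epoch
EOF
```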

Self-Hosting & Configuration

  • Clone the repository and install dependencies in a Python virtual environment (a setup and launch sketch follows this list)
  • Requires an NVIDIA GPU with at least 8 GB VRAM (12+ GB recommended for SDXL)
  • Configure training via TOML files specifying model, dataset, and hyperparameters
  • Dataset images organized in folders with optional caption text files
  • For a visual training interface, use the separate kohya_ss GUI maintained by bmaltais, which wraps these scripts
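
A minimal setup and launch sequence might look like the sketch below. The model path, network dimensions, and step count are placeholder values; SDXL and Flux runs use their own entry points (e.g. sdxl_train_network.py), and available flags should be checked with --help for your checkout.

```bash
# Clone and install into a virtual environment (a CUDA build of PyTorch is assumed).
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
accelerate config                      # one-time accelerate setup

# Launch a LoRA run against the dataset.toml sketched earlier (values illustrative).
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --network_dim 32 --network_alpha 16 \
  --learning_rate 1e-4 --max_train_steps 2000 \
  --mixed_precision fp16 --gradient_checkpointing \
  --save_model_as safetensors \
  --output_dir ./output --output_name my_lora
```

The resulting safetensors file in ./output is the LoRA adapter that downstream UIs load.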

Key Features

  • De facto standard for community LoRA and DreamBooth training
  • Supports all major Stable Diffusion and Flux model families
  • Advanced training techniques: noise offset, min-SNR weighting, adaptive loss (flag examples follow this list)
  • Bucket-based resolution handling for mixed-size training datasets
  • Extensive caption and tag preprocessing utilities
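
The advanced techniques above are typically enabled with extra command-line flags. The example below is a hedged sketch: the values are arbitrary, and flag availability varies by script and version.

```bash
# Illustrative additions to a LoRA training command (check --help for your version).
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --noise_offset 0.1 \
  --min_snr_gamma 5 \
  --output_dir ./output
# Bucket-based resolution handling is switched on in the dataset TOML
# (enable_bucket = true) rather than on the command line.
```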

Comparison with Similar Tools

  • Axolotl — focused on LLM fine-tuning; Kohya targets image generation model training exclusively
  • diffusers training scripts — official Hugging Face examples; Kohya offers more features and community-tested defaults
  • EveryDream2 — alternative SD trainer; Kohya has broader model support and more active maintenance
  • OneTrainer — GUI-first training tool; Kohya provides deeper script-level customization
  • SimpleTuner — streamlined SDXL/Flux trainer; Kohya covers more model variants and training methods

FAQ

Q: How much VRAM do I need to train a LoRA? A: An SD 1.5 LoRA trains on 8 GB VRAM. SDXL LoRA needs 12 GB. Flux LoRA typically requires 16-24 GB depending on rank and resolution.
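
When VRAM is tight, a few widely used options reduce memory pressure; the sketch below combines them, with illustrative values and the same placeholder paths as above.

```bash
# Common memory-saving options: fp16 mixed precision, gradient checkpointing,
# an 8-bit optimizer (requires bitsandbytes), and cached latents so the VAE
# does not stay resident during training.
accelerate launch train_network.py \
  --pretrained_model_name_or_path /models/sd15.safetensors \
  --dataset_config dataset.toml \
  --network_module networks.lora \
  --mixed_precision fp16 \
  --gradient_checkpointing \
  --optimizer_type AdamW8bit \
  --cache_latents \
  --output_dir ./output
```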

Q: What is the recommended dataset size for DreamBooth? A: 15-30 high-quality images of the subject with varied angles, lighting, and backgrounds. More images help but quality matters more than quantity.
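
For reference, when a dataset TOML is not used, the scripts can read DreamBooth-style folders whose numeric name prefix sets the per-folder repeat count; the layout below is hypothetical and the token and class names are placeholders.

```bash
# Hypothetical DreamBooth-style layout: "<repeats>_<instance token> <class>".
# An optional .txt caption file can sit next to each image with the same basename.
mkdir -p train_data/"20_sks person"
cp ~/photos/subject/*.jpg train_data/"20_sks person"/
```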

Q: Can I train on AMD GPUs? A: Limited AMD support exists via ROCm, but NVIDIA CUDA remains the primary and most stable target.

Q: How do I convert the trained LoRA for use in ComfyUI or AUTOMATIC1111? A: Kohya outputs safetensors LoRA files directly compatible with both ComfyUI and AUTOMATIC1111 WebUI without conversion.
