AI Image Generation Pack
Ten tools for the developer or artist who generates images at scale. ComfyUI graphs, AUTOMATIC1111 + Fooocus for SDXL, InvokeAI in production, Flux + ControlNet for spatial control, Kohya for training LoRAs, Diffusers as the Python core, AnimateDiff for animation, Replicate for cloud batch jobs, all installed in an order where each tool builds on the last.
What's in this pack
This is the rig a working image-gen engineer would build over a weekend — not a Civitai bookmark dump. Every pick here is open-source, actively maintained, and earns the disk space it takes. The order matters: each tool answers a question the previous one created.
If you only generate one image a week, you don't need any of this — Midjourney is fine. This pack is for the case where you need reproducible graphs, trained character LoRAs, ControlNet pose conditioning, batches of 10k images on Replicate, or image gen called from a Claude/Codex agent over MCP. That stack is open-source-only territory in 2026.
Install in this order
- ComfyUI — the workflow engine. Start here because every later tool plugs into a ComfyUI node eventually. Graph-based, JSON-serializable workflows, 1000+ custom nodes for Flux / SDXL / ControlNet / LoRA. Once you have ComfyUI, everything else is a model file in models/checkpoints/.
- AUTOMATIC1111 (SD Web UI) — the base model UI. Lowest-friction way to test a freshly downloaded SDXL / SD 1.5 checkpoint without wiring nodes. Keep it for quick sanity checks; ComfyUI is for actual production.
- InvokeAI — production-grade canvas + queue. Where AUTOMATIC1111 is a researcher's playground, InvokeAI ships a real UI with team-friendly metadata, prompt library, and queue management. Reach for it once your output volume is real.
- Fooocus — opinionated SDXL with sane defaults. The "just give me a good image" sibling. Useful for non-engineers on your team, and as a reference for what good defaults look like.
- ControlNet — spatial conditioning. Once you can generate, you'll immediately want to condition on poses, depth, edges, segmentation. ControlNet is the answer; works inside ComfyUI / A1111 / InvokeAI / Diffusers as a model addon, not a separate app.
- Diffusers (Hugging Face) — the Python core. It is the scriptable counterpart to the UIs above: InvokeAI builds on it directly, while ComfyUI and A1111 ship their own inference code for the same model families. When you need to script a 50k-image batch, call from a notebook, or compose pipelines (SDXL + IP-Adapter + ControlNet + Refiner), drop down to Diffusers. Don't start here; drop down here.
- Kohya sd-scripts — the LoRA training tool. The de-facto trainer for SD 1.5 / SDXL / Flux LoRAs. Once you've generated for two weeks you'll want a character / style LoRA — this is how the community trains them. Pair with a 24GB GPU or rent an A100 hour.
- AnimateDiff — motion module for diffusion. Plug into ComfyUI as a node, get 16-frame video clips out of your existing image models. The cheapest entry point into AI video without learning a new model family.
- Replicate — cloud batch when local isn't enough. When you need 10k images, or when the model is too big for your GPU, push to Replicate via API. Pay per second. Same models as local — bring your prompt JSON, get URLs back.
- mcp-image — MCP server for agents. The newest layer: expose image gen as a tool to Claude Code / Codex / Gemini CLI via MCP. Now your agent can "draw the diagram and embed it in the doc" instead of asking you to do it.
How they fit together
ComfyUI (workflow engine)
  │
  ├─ loads checkpoints + LoRAs + ControlNet models from disk
  │
  ├─ Kohya trains the LoRAs that ComfyUI loads
  │
  └─ AnimateDiff is a ComfyUI node, not a separate app
AUTOMATIC1111 / Fooocus — quick base-model sanity
InvokeAI — production canvas + queue (parallel to ComfyUI)
Diffusers (HF) — the same model families, scripted from Python
ControlNet — model addon, lives inside ALL the above
Replicate — same model files, but run in the cloud over HTTPS
mcp-image — exposes any of the above as an MCP tool
The core combo is ComfyUI + ControlNet + Kohya + Diffusers. With those four you can generate anything, train your own style, condition on pose/depth/edge, and drop to Python when the UI runs out of road. Everything else in this pack is a specialized adapter onto that core.
Tradeoffs you'll hit
- ComfyUI vs AUTOMATIC1111 vs InvokeAI — A1111 is for quick model tests. ComfyUI is for serious workflows you'll reuse. InvokeAI is for teams that need a real queue + metadata system. Pick all three and use them for what they're good at; don't try to make one tool do all three jobs.
- Fooocus vs ComfyUI — Fooocus has the better defaults; ComfyUI has the better ceiling. Give Fooocus to your designer; keep ComfyUI for yourself.
- Train LoRA locally vs rent A100 — a 1024-resolution SDXL LoRA on Kohya takes ~1.5-3 hours on a 4090, or 25-45 minutes on a rented A100 (about $1.50). If you train fewer than two LoRAs a week, rent; above that, buy local hardware.
- Replicate vs run-your-own — Replicate is great for spiky workloads and models too big to run locally (Flux dev at full precision needs 24GB+). For steady throughput, your own 4090 pays back in <30 days at SDXL volumes.
- MCP image gen vs direct API — wire mcp-image up only if your agents actually need image output. Otherwise it's a moving part nobody touches.
Common pitfalls
- Disk fills up at up to 30GB per checkpoint — SDXL base is ~7GB, Flux dev is ~24GB, plus LoRAs (~150MB each), ControlNet models (~1.5GB each), and VAEs. Plan for a 500GB SSD minimum if you're serious.
- CUDA / xformers version drift — every tool above wants a slightly different PyTorch + CUDA + xformers combination. Use one venv per tool and pin versions. Don't try to share a venv across ComfyUI + A1111 + InvokeAI.
- Kohya LoRA training that produces an obviously broken character — almost always a dataset issue (10 images at 768px is the floor; 30+ at 1024px is the safe zone), not a hyperparameter issue. Curate your dataset before you touch learning rate.
- ControlNet model mismatch with base — SDXL ControlNet models do NOT work on SD 1.5 base and vice versa. Mismatch = noise. Check filename suffixes (_sdxl, _sd15) before downloading.
- AnimateDiff first run produces a slideshow not motion — context length / motion scale / sampler steps need tuning together. Start with the published example workflow before improvising.
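The disk pitfall is easy to sanity-check with arithmetic. The sketch below adds up the approximate sizes quoted in the list; the function name, the dictionary keys, and the example collection are hypothetical, and VAEs, caches, and generated outputs are left out, so treat the result as a lower bound.

```python
# Approximate sizes in GB, taken from the figures quoted in the pitfalls list.
SIZES_GB = {
    "sdxl_checkpoint": 7.0,
    "flux_dev_checkpoint": 24.0,
    "lora": 0.15,
    "controlnet": 1.5,
}


def disk_budget_gb(counts: dict[str, int], sizes: dict[str, float] = SIZES_GB) -> float:
    """Sum the disk space needed for the given number of each model type."""
    unknown = set(counts) - set(sizes)
    if unknown:
        raise KeyError(f"no size estimate for: {sorted(unknown)}")
    return sum(sizes[name] * n for name, n in counts.items())


# Hypothetical collection: 5 SDXL checkpoints, 2 Flux dev, 40 LoRAs, 8 ControlNets.
example = {"sdxl_checkpoint": 5, "flux_dev_checkpoint": 2, "lora": 40, "controlnet": 8}
total = disk_budget_gb(example)
print(f"{total:.1f} GB of models, before VAEs and outputs")
```

Even this modest collection lands around 100GB, which is why a 500GB SSD is a reasonable starting point.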
Frequently asked questions
How much VRAM do I need to run this stack?
12GB is the floor for SDXL via ComfyUI / A1111. 16GB lets you train LoRAs on SDXL with Kohya. 24GB (4090) is the comfortable target — runs Flux dev locally, trains LoRAs in reasonable time, handles ControlNet + LoRA stacking. Below 12GB you're limited to SD 1.5 and quantized Flux variants; consider Replicate for the heavy lifting.
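Those tiers can be written down as a small lookup, for example in a setup script that decides what to install. This is only a restatement of the thresholds above; the function name is hypothetical, and real headroom depends on resolution, precision, and how many LoRAs and ControlNets are stacked.

```python
def stack_for_vram(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the workload tier described in the answer above."""
    if vram_gb >= 24:
        return "Flux dev locally, LoRA training, ControlNet + LoRA stacking"
    if vram_gb >= 16:
        return "SDXL generation plus SDXL LoRA training with Kohya"
    if vram_gb >= 12:
        return "SDXL generation via ComfyUI / A1111"
    return "SD 1.5 and quantized Flux variants; consider Replicate for heavier jobs"


for gb in (8, 12, 16, 24):
    print(f"{gb}GB: {stack_for_vram(gb)}")
```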
Why not just use Midjourney?
Midjourney is great for one-off creative shots. This pack is for the cases Midjourney can't do: training a LoRA on your specific character or product, ControlNet pose conditioning from an input image, 10k-image batch jobs with consistent metadata, integrating image gen into a Claude Code or Codex agent over MCP, or running 100% offline for sensitive inputs. If none of those apply, Midjourney is the right answer.
ComfyUI looks intimidating — should I start with AUTOMATIC1111?
Start with whichever you can install first. A1111 has a faster onboarding (text fields, click generate). ComfyUI has a steeper first hour but pays back the moment you want a workflow you can version-control, share, and re-run deterministically. If you're a dev, ComfyUI's JSON-serializable graphs will feel right within a day.
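To make the "version-control and re-run" point concrete, here is a minimal sketch that treats a workflow as plain JSON. It assumes a graph exported in ComfyUI's API format (node IDs mapped to a class_type and its inputs); the two-node graph and both helper names are hypothetical stand-ins, and a real export has many more nodes.

```python
import copy
import json


def with_seed(workflow: dict, seed: int) -> dict:
    """Return a copy of an API-format workflow with every KSampler seed set."""
    patched = copy.deepcopy(workflow)
    for node in patched.values():
        if node.get("class_type") == "KSampler":
            node["inputs"]["seed"] = seed
    return patched


def to_request_body(workflow: dict) -> bytes:
    """Serialize with sorted keys so the same graph always yields the same bytes."""
    return json.dumps({"prompt": workflow}, sort_keys=True).encode("utf-8")


# Tiny stand-in graph, not a complete runnable workflow.
workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 0, "steps": 20, "cfg": 7.0}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a lighthouse at dusk"}},
}

body = to_request_body(with_seed(workflow, 1234))
print(body.decode("utf-8"))
```

With a default local install, a body like this is what you would POST to the server's /prompt endpoint (typically http://127.0.0.1:8188/prompt). Because the graph and seed are just data, the file can be committed and diffed like any other source.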
Do I need both Diffusers AND ComfyUI?
Not at first. ComfyUI runs the same model families through its node graph, so interactive work doesn't need Diffusers. Add Diffusers directly only when you need to script batches, build custom pipelines (SDXL + IP-Adapter + ControlNet + Refiner in one call), or integrate image gen into a larger Python application. For interactive work, ComfyUI alone is enough.
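A scripted batch might look like the sketch below. Planning is pure Python and runs anywhere; `run_batch` imports torch and diffusers lazily and assumes a CUDA GPU with enough VRAM for SDXL. The `Job` class, both function names, the output layout, and the prompts are illustrative choices, not part of any library.

```python
from dataclasses import dataclass
from itertools import product
from pathlib import Path


@dataclass(frozen=True)
class Job:
    prompt: str
    seed: int
    out_path: Path


def plan_batch(prompts, seeds, out_dir="out"):
    """Cross every prompt with every seed and give each job a stable filename."""
    return [
        Job(prompt, seed, Path(out_dir) / f"{i:04d}_seed{seed}.png")
        for i, (prompt, seed) in enumerate(product(prompts, seeds))
    ]


def run_batch(jobs, model_id="stabilityai/stable-diffusion-xl-base-1.0"):
    """Render each job with SDXL. Requires torch, diffusers, and a CUDA GPU."""
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    ).to("cuda")
    for job in jobs:
        job.out_path.parent.mkdir(parents=True, exist_ok=True)
        # A seeded generator per job keeps each image reproducible.
        generator = torch.Generator(device="cuda").manual_seed(job.seed)
        image = pipe(prompt=job.prompt, generator=generator).images[0]
        image.save(job.out_path)


jobs = plan_batch(["a red fox in snow", "a lighthouse at dusk"], seeds=[1, 2, 3])
print(len(jobs), "jobs planned; call run_batch(jobs) on a GPU machine to render them")
```

Keeping the plan separate from the rendering means the job list can be logged, resumed, or handed to a different backend without touching the GPU code.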
Is training a LoRA hard?
Mechanically no — Kohya sd-scripts has working defaults. The hard part is your dataset: 30+ varied, high-resolution images of your subject, cleanly captioned. Mechanics is a half-day learning curve; dataset curation is the actual skill. Budget a weekend for your first LoRA and expect to throw away the first two attempts.
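Before touching hyperparameters, a quick pre-flight check on the dataset is cheap. The sketch below encodes the thresholds mentioned in the pitfalls (10 images at 768px as the floor, 30+ at 1024px as the safe zone). Reading "768px" as the shorter side of each image is an assumption, and the function name and return labels are hypothetical. It checks size only; variety and caption quality still need a human eye.

```python
def dataset_status(image_sizes):
    """Classify a LoRA dataset given a list of (width, height) tuples.

    Returns "safe", "floor", or "too small".
    """

    def count_at_least(px):
        # Assumption: an image qualifies when its shorter side reaches px.
        return sum(1 for width, height in image_sizes if min(width, height) >= px)

    if count_at_least(1024) >= 30:
        return "safe"
    if count_at_least(768) >= 10:
        return "floor"
    return "too small"


print(dataset_status([(1024, 1536)] * 32))  # comfortably sized dataset
print(dataset_status([(768, 768)] * 12))    # usable, but at the minimum
print(dataset_status([(512, 512)] * 50))    # many images, all too low-res
```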