TOKREPO · AI Arsenal

Stable

AI Image Generation Pack

Ten picks for the dev or artist generating images at scale: ComfyUI graphs, AUTOMATIC1111 + Fooocus for SDXL, InvokeAI in production, Flux + ControlNet for spatial control, Kohya for training LoRAs, Diffusers as the Python core, AnimateDiff for animation, Replicate for cloud batches, and mcp-image for agents, installed in an order where each tool builds on the last.

10 resources

About this pack

What's in this pack

This is the rig a working image-gen engineer would build over a weekend — not a Civitai bookmark dump. Every pick here is open-source, actively maintained, and earns the disk space it takes. The order matters: each tool answers a question the previous one created.

If you only generate one image a week, you don't need any of this — Midjourney is fine. This pack is for the case where you need reproducible graphs, trained character LoRAs, ControlNet pose conditioning, batches of 10k images on Replicate, or image gen called from a Claude/Codex agent over MCP. That stack is open-source-only territory in 2026.

Install in this order

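ComfyUI — the workflow engine. Start here because most of the later pieces (LoRAs, ControlNet models, AnimateDiff) end up as nodes or model files in a ComfyUI graph. Graph-based, JSON-serializable workflows, 1000+ custom nodes for Flux / SDXL / ControlNet / LoRA. Once you have ComfyUI, most of what follows is a model file dropped into the matching models/ subfolder (models/checkpoints/, models/loras/, models/controlnet/).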
AUTOMATIC1111 (SD Web UI) — the base model UI. Lowest-friction way to test a freshly downloaded SDXL / SD 1.5 checkpoint without wiring nodes. Keep it for quick sanity checks; ComfyUI is for actual production.
InvokeAI — production-grade canvas + queue. Where AUTOMATIC1111 is a researcher's playground, InvokeAI ships a real UI with team-friendly metadata, prompt library, and queue management. Reach for it once your output volume is real.
Fooocus — opinionated SDXL with sane defaults. The "just give me a good image" sibling. Useful for non-engineers on your team, and as a reference for what good defaults look like.
ControlNet — spatial conditioning. Once you can generate, you'll immediately want to condition on poses, depth, edges, segmentation. ControlNet is the answer; works inside ComfyUI / A1111 / InvokeAI / Diffusers as a model addon, not a separate app.
Diffusers (Hugging Face) — the Python core. InvokeAI builds on Diffusers directly, and Diffusers runs the same model families the other UIs use, from plain Python. When you need to script a 50k-image batch, call from a notebook, or compose pipelines (SDXL + IP-Adapter + ControlNet + Refiner), drop down to Diffusers. Don't start here; drop down here.
Kohya sd-scripts — the LoRA training tool. The de-facto trainer for SD 1.5 / SDXL / Flux LoRAs. Once you've generated for two weeks you'll want a character / style LoRA — this is how the community trains them. Pair with a 24GB GPU or rent an A100 hour.
AnimateDiff — motion module for diffusion. Plug into ComfyUI as a node, get 16-frame video clips out of your existing image models. The cheapest entry point into AI video without learning a new model family.
Replicate — cloud batch when local isn't enough. When you need 10k images, or when the model is too big for your GPU, push to Replicate via API. Pay per second. Same models as local — bring your prompt JSON, get URLs back.
mcp-image — MCP server for agents. The newest layer: expose image gen as a tool to Claude Code / Codex / Gemini CLI via MCP. Now your agent can "draw the diagram and embed it in the doc" instead of asking you to do it.
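The Replicate step above ("bring your prompt JSON, get URLs back") can be sketched in a few lines. This assumes the official `replicate` Python client is installed and `REPLICATE_API_TOKEN` is set; the model slug `black-forest-labs/flux-schnell`, the `prompt`/`seed` input names, and the prompt-file shape are illustrative assumptions, so check the model's page on Replicate for its real input schema.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Assumption: a Replicate image model whose inputs include "prompt" and "seed".
MODEL = "black-forest-labs/flux-schnell"


def load_jobs(prompt_json: str) -> List[Dict]:
    """Turn a JSON list like [{"prompt": "...", "seed": 1}, ...] into model inputs.

    Jobs without an explicit seed get their list index, so reruns are repeatable.
    """
    jobs = json.loads(prompt_json)
    return [{"prompt": job["prompt"], "seed": int(job.get("seed", i))}
            for i, job in enumerate(jobs)]


def run_one(inputs: Dict) -> List[str]:
    """Run one prediction and return its output URLs."""
    import replicate  # imported lazily so load_jobs works without the client

    output = replicate.run(MODEL, input=inputs)
    items = output if isinstance(output, list) else [output]
    # Newer clients return file objects exposing .url; older ones return plain URL strings.
    return [str(getattr(item, "url", item)) for item in items]


def run_all(prompt_json: str, workers: int = 8) -> List[List[str]]:
    """Fan the jobs out over a small thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, load_jobs(prompt_json)))
```

A thread pool is enough here because each call mostly waits on the network; pay-per-second billing means the worker count mainly trades wall-clock time against concurrent spend.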

How they fit together

ComfyUI (workflow engine)
   │
   ├─ loads checkpoints + LoRAs + ControlNet models from disk
   │
   └─ custom nodes extend the graph (Diffusers runs the same models from Python)
         │
         ├─ Kohya trains the LoRAs that ComfyUI loads
         │
         └─ AnimateDiff is a ComfyUI node, not a separate app

AUTOMATIC1111 / Fooocus — quick base-model sanity
InvokeAI — production canvas + queue (parallel to ComfyUI)

ControlNet — model addon, lives inside ALL the above

Replicate — same model files, but run in the cloud over HTTPS

mcp-image — exposes any of the above as an MCP tool

The core combo is ComfyUI + ControlNet + Kohya + Diffusers. With those four you can generate anything, train your own style, condition on pose/depth/edge, and drop to Python when the UI runs out of road. Everything else in this pack is a specialized adapter onto that core.

Tradeoffs you'll hit

ComfyUI vs AUTOMATIC1111 vs InvokeAI — A1111 is for quick model tests. ComfyUI is for serious workflows you'll reuse. InvokeAI is for teams that need a real queue + metadata system. Pick all three and use them for what they're good at; don't try to make one tool do all three jobs.
Fooocus vs ComfyUI — Fooocus has the better defaults; ComfyUI has the better ceiling. Give Fooocus to your designer; keep ComfyUI for yourself.
Train LoRA locally vs rent A100 — a 1024-resolution SDXL LoRA on Kohya takes ~1.5-3 hours on a 4090, or ~25-45 minutes on an A100 (~$1.50). If you train fewer than two LoRAs a week, rent; at two or more, buy local hardware.
Replicate vs run-your-own — Replicate is great for spiky workloads and models too big to run locally (Flux dev at full precision needs 24GB+). For steady throughput, your own 4090 pays back in <30 days at SDXL volumes.
MCP image gen vs direct API — wire mcp-image up only if your agents actually need image output. Otherwise it's a moving part nobody touches.

Common pitfalls

Disk fills up fast, with the largest checkpoints approaching 30GB — SDXL base is ~7GB, Flux dev is ~24GB, plus LoRAs (~150MB each), plus ControlNet models (~1.5GB each), plus VAEs. Plan for a 500GB SSD minimum if you're serious.
CUDA / xformers version drift — every tool above wants a slightly different PyTorch + CUDA + xformers combination. Use one venv per tool and pin versions. Don't try to share a venv across ComfyUI + A1111 + InvokeAI.
Kohya LoRA training that produces an obviously broken character — almost always a dataset issue (10 images at 768px is the floor; 30+ at 1024px is the safe zone), not a hyperparameter issue. Curate your dataset before you touch learning rate.
ControlNet model mismatch with base — SDXL ControlNet models do NOT work on an SD 1.5 base and vice versa. Mismatch = noise. Check the family tag in the filename (e.g. sdxl / xl vs sd15) and the model card before downloading.
AnimateDiff first run produces a slideshow instead of motion — context length, motion scale, and sampler steps need tuning together. Start with the published example workflow before improvising.
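The disk pitfall is simple arithmetic, so here is a back-of-the-envelope estimator using the approximate sizes quoted above. The `headroom` multiplier and the example library are assumptions for illustration, not measurements.

```python
from typing import Dict

# Approximate per-file sizes in GB, taken from the disk pitfall above.
SIZES_GB: Dict[str, float] = {
    "sdxl_checkpoint": 7.0,
    "flux_dev_checkpoint": 24.0,
    "lora": 0.15,
    "controlnet": 1.5,
}


def disk_budget_gb(counts: Dict[str, int], headroom: float = 1.5) -> float:
    """Estimate disk use for a local model library.

    `headroom` is an assumed multiplier for everything not itemized here
    (VAEs, text encoders, generated outputs, per-tool venvs).
    """
    raw = sum(SIZES_GB[kind] * n for kind, n in counts.items())
    return raw * headroom
```

For example, `disk_budget_gb({"sdxl_checkpoint": 4, "flux_dev_checkpoint": 2, "lora": 50, "controlnet": 8})` comes to roughly 143GB (95.5GB of model files times the assumed 1.5 headroom), before you have generated anything.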

INSTALL · ONE COMMAND

$ tokrepo install pack/ai-image-generation-pack

hand it to your agent, or paste it into your terminal

What's included

10 ready-to-install resources

Skill#01

ComfyUI — Node-Based AI Image Generation

The most powerful modular AI image generation GUI with a node/graph editor. Supports Stable Diffusion, Flux, SDXL, ControlNet, and 1000+ custom nodes. 107K+ stars.

by AI Open Source·407 views

$ tokrepo install comfyui-node-based-ai-image-generation-02888d06

Skill#02

Stable Diffusion Web UI by AUTOMATIC1111 — The Definitive Local AI Image Generator

AUTOMATIC1111's Stable Diffusion Web UI is the most popular interface for running Stable Diffusion locally. It supports text-to-image, image-to-image, inpainting, ControlNet, LoRA, embeddings, extensions, and the major Stable Diffusion model variants, all in a self-hosted browser UI.

by Script Depot·304 views

$ tokrepo install stable-diffusion-web-ui-automatic1111-definitive-local-ai-b0727fbf

Skill#03

InvokeAI — Professional Creative Engine for Stable Diffusion

A leading open-source creative engine for Stable Diffusion and Flux models with a polished WebUI, node-based workflows, and production-grade image generation.

by Script Depot·255 views

$ tokrepo install invokeai-professional-creative-engine-stable-diffusion-4d4c2b85

Prompt#04

Fooocus — Focus on Prompting and Generating, Not the Tooling

Fooocus is a Stable Diffusion image generator that strips away every dial and toggle. Just type a prompt and get magazine-quality results — opinionated defaults, automatic prompt engineering, and SDXL-grade output with one click.

by Script Depot·205 views

$ tokrepo install fooocus-focus-prompting-generating-not-tooling-b0b1b970

Skill#05

ControlNet — Add Spatial Control to Diffusion Models

ControlNet lets you add precise spatial conditioning such as edge maps, depth, and pose to Stable Diffusion, giving fine-grained control over AI image generation.

by AI Open Source·132 views

$ tokrepo install controlnet-add-spatial-control-diffusion-models-74fc6ef5
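A cheap guard against the base/ControlNet mismatch described in the pitfalls, as a sketch. It relies on the community habit of tagging the base family in filenames (`sd15`, `sdxl`, `xl`, `flux`); that is a convention, not a guarantee, so an unrecognized name returns `None` and you fall back to the model card.

```python
import re
from typing import Optional

# Assumption: filenames tag the base family, e.g. "control_v11p_sd15_openpose.pth"
# or "controlnet-canny-sdxl-1.0.safetensors". Checked in order; first match wins.
_FAMILY_PATTERNS = [
    ("sdxl", re.compile(r"(?:^|[^a-z0-9])(?:sdxl|xl)(?:[^a-z0-9]|$)")),
    ("sd15", re.compile(r"(?:^|[^a-z0-9])(?:sd15|sd1\.5|v1-5)(?:[^a-z0-9]|$)")),
    ("flux", re.compile(r"(?:^|[^a-z0-9])flux")),
]


def guess_family(filename: str) -> Optional[str]:
    """Guess the base-model family from a filename, or None if no tag is found."""
    name = filename.lower()
    for family, pattern in _FAMILY_PATTERNS:
        if pattern.search(name):
            return family
    return None


def check_pair(base_file: str, controlnet_file: str) -> Optional[bool]:
    """True if both files look like the same family, False on a clear mismatch,
    None when either name carries no recognizable tag."""
    base, cn = guess_family(base_file), guess_family(controlnet_file)
    if base is None or cn is None:
        return None
    return base == cn
```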

Skill#06

Diffusers — Universal Video & Image Generation Hub

Hugging Face's diffusion model library. Run CogVideoX, AnimateDiff, Stable Video Diffusion, and 50+ video/image models with a unified API. 33,200+ stars.

by Script Depot·372 views

$ tokrepo install diffusers-universal-video-image-generation-hub-4ef1950f

Skill#07

Kohya sd-scripts — Training Scripts for Stable Diffusion and Flux

Comprehensive training, fine-tuning, and generation scripts for Stable Diffusion, SDXL, and Flux models. The standard toolkit for LoRA, DreamBooth, and textual inversion training.

by AI Open Source·239 views

$ tokrepo install kohya-sd-scripts-training-scripts-stable-diffusion-flux-cd2c15cb

Skill#08

AnimateDiff — Plug-and-Play Animation for Diffusion Models

A plug-and-play motion module that turns community text-to-image Stable Diffusion models into animation generators without additional training. ICLR 2024 Spotlight paper.

by AI Open Source·220 views

$ tokrepo install animatediff-plug-play-animation-diffusion-models-04d7fee0

Skill#09

Replicate — Run AI Models via Simple API Calls

Cloud platform to run open-source AI models with a simple API. Replicate hosts Llama, Stable Diffusion, Whisper, and thousands of models — no GPU setup or Docker required.

by Replicate·313 views

$ tokrepo install replicate-run-ai-models-via-simple-api-calls-e80aca76

MCP#10

mcp-image — MCP Image Generation & Editing Server

mcp-image is an MCP server for image generation and editing with quality presets; it has 110★ (verified) and documents `npx -y mcp-image` configs for Cursor and Claude.

by MCP Hub·160 views

$ tokrepo install mcp-image-mcp-image-generation-editing-server
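The `npx -y mcp-image` launch command fits the usual `mcpServers` JSON shape that MCP clients read. Here is a sketch that merges it into an existing client config without touching other servers. The `"mcp-image"` key name and the config file location are assumptions, since each client keeps its config in a different place. Any API-key environment variables the server needs should come from its README; none are assumed here.

```python
import json
from pathlib import Path
from typing import Dict, Optional


def add_mcp_image(config: Dict, env: Optional[Dict[str, str]] = None) -> Dict:
    """Return a copy of an MCP client config with an "mcp-image" server entry added."""
    merged = json.loads(json.dumps(config))  # deep copy; the input is left untouched
    servers = merged.setdefault("mcpServers", {})
    entry: Dict = {"command": "npx", "args": ["-y", "mcp-image"]}
    if env:
        entry["env"] = dict(env)  # e.g. keys the server's README asks for
    servers["mcp-image"] = entry
    return merged


def update_config_file(path: str, env: Optional[Dict[str, str]] = None) -> None:
    """Read a JSON config (or start empty), add the entry, and write it back."""
    p = Path(path)
    config = json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}
    p.write_text(json.dumps(add_mcp_image(config, env), indent=2) + "\n", encoding="utf-8")
```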

Frequently asked questions

How much VRAM do I need to run this stack?

12GB is the floor for SDXL via ComfyUI / A1111. 16GB lets you train LoRAs on SDXL with Kohya. 24GB (4090) is the comfortable target — runs Flux dev locally, trains LoRAs in reasonable time, handles ControlNet + LoRA stacking. Below 12GB you're limited to SD 1.5 and quantized Flux variants; consider Replicate for the heavy lifting.
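If a setup script needs to pick defaults, the tiers above can be written down as a lookup. The thresholds are copied from the answer; the function name and the returned labels are illustrative, not part of any tool.

```python
from typing import List


def plan_for_vram(vram_gb: float) -> List[str]:
    """Map GPU memory to what the FAQ above says is realistic locally (rough guidance)."""
    if vram_gb < 12:
        return ["SD 1.5", "quantized Flux variants", "offload heavy jobs to Replicate"]
    plan = ["SDXL inference (ComfyUI / A1111)"]
    if vram_gb >= 16:
        plan.append("SDXL LoRA training with Kohya")
    if vram_gb >= 24:
        plan.append("Flux dev locally, ControlNet + LoRA stacking")
    return plan
```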

Why not just use Midjourney?

Midjourney is great for one-off creative shots. This pack is for the cases Midjourney can't do: training a LoRA on your specific character or product, ControlNet pose conditioning from an input image, 10k-image batch jobs with consistent metadata, integrating image gen into a Claude Code or Codex agent over MCP, or running 100% offline for sensitive inputs. If none of those apply, Midjourney is the right answer.

ComfyUI looks intimidating — should I start with AUTOMATIC1111?

Start with whichever you can install first. A1111 has a faster onboarding (text fields, click generate). ComfyUI has a steeper first hour but pays back the moment you want a workflow you can version-control, share, and re-run deterministically. If you're a dev, ComfyUI's JSON-serializable graphs will feel right within a day.
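Because a ComfyUI graph is plain JSON, re-running it from a script is a small HTTP call. This is a minimal sketch, assuming a local ComfyUI server on its default `127.0.0.1:8188`, a workflow exported in API format, and a sampler node whose id you read from your own export (the `"3"` used below is hypothetical).

```python
import json
import urllib.request

# Assumption: ComfyUI running locally on its default host and port.
COMFY_URL = "http://127.0.0.1:8188/prompt"


def with_seed(workflow: dict, node_id: str, seed: int) -> dict:
    """Return a copy of an API-format workflow with one sampler node's seed replaced."""
    wf = json.loads(json.dumps(workflow))  # deep copy via a JSON round-trip
    wf[node_id]["inputs"]["seed"] = seed
    return wf


def build_request(workflow: dict, url: str = COMFY_URL) -> urllib.request.Request:
    """Wrap the graph the way the /prompt endpoint expects: {"prompt": graph}."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def queue(workflow: dict, url: str = COMFY_URL) -> dict:
    """Submit the graph; the JSON reply includes a prompt_id you can poll later."""
    with urllib.request.urlopen(build_request(workflow, url)) as resp:
        return json.load(resp)
```

Typical use: load the exported file (hypothetically `workflow_api.json`) with `json.load`, then call `queue(with_seed(graph, "3", seed))` for each seed you want. The workflow file stays in version control and the script only changes inputs.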

Do I need both Diffusers AND ComfyUI?

Not at first. ComfyUI runs the same model families through its own backend, so the node graph already covers most of what you would otherwise script in Diffusers. Add Diffusers directly only when you need to script batches, build custom pipelines (SDXL + IP-Adapter + ControlNet + Refiner in one call), or integrate image gen into a larger Python application. For interactive work, ComfyUI alone is enough.
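When a batch outgrows the UI, the Diffusers version is short. This is a minimal sketch, assuming `torch` and `diffusers` are installed, a CUDA GPU is available, and the `stabilityai/stable-diffusion-xl-base-1.0` weights are used. The batch size, step count, and one-seed-per-image scheme are illustrative choices, not the pack's method.

```python
import os
from typing import Iterator, List, Tuple


def plan_batches(prompts: List[str], batch_size: int,
                 base_seed: int = 0) -> Iterator[Tuple[List[str], List[int]]]:
    """Yield (prompts, seeds) chunks; every image gets its own fixed seed."""
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        yield chunk, [base_seed + start + i for i in range(len(chunk))]


def render(prompts: List[str], out_dir: str = "out", batch_size: int = 4) -> None:
    # Heavy imports live here so plan_batches can be used (and tested) without a GPU.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    os.makedirs(out_dir, exist_ok=True)
    for chunk, seeds in plan_batches(prompts, batch_size):
        # One generator per prompt keeps each image reproducible on its own.
        generators = [torch.Generator(device="cuda").manual_seed(s) for s in seeds]
        images = pipe(prompt=chunk, generator=generators,
                      num_inference_steps=30).images
        for seed, image in zip(seeds, images):
            image.save(os.path.join(out_dir, f"{seed:06d}.png"))
```

Naming each file by its seed means any single image from a 50k run can be regenerated later without replaying the whole batch.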

Is training a LoRA hard?

Mechanically, no; Kohya sd-scripts has working defaults. The hard part is your dataset: 30+ varied, high-resolution images of your subject, cleanly captioned. The mechanics take about half a day to learn; dataset curation is the actual skill. Budget a weekend for your first LoRA and expect to throw away the first two attempts.
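Since the dataset is the hard part, a pre-flight check is worth running before any training. This is a sketch: the thresholds mirror the numbers in this pack (30+ images, 1024px), while checking the short side, the `.txt` caption sidecars, and the `Item` record are assumptions. Match the caption extension to whatever `--caption_extension` you pass to Kohya.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    name: str
    width: int
    height: int
    caption: Optional[str]


def check_dataset(items: List[Item], min_count: int = 30, min_side: int = 1024) -> List[str]:
    """Return human-readable problems; an empty list means the basics look fine."""
    problems = []
    if len(items) < min_count:
        problems.append(f"only {len(items)} images; aim for {min_count}+")
    for it in items:
        if min(it.width, it.height) < min_side:
            problems.append(f"{it.name}: {it.width}x{it.height} is below {min_side}px")
        if not (it.caption or "").strip():
            problems.append(f"{it.name}: missing caption")
    return problems


def scan_folder(folder: str, caption_ext: str = ".txt") -> List[Item]:
    """Collect image sizes and sidecar captions (assumed same stem, caption_ext suffix)."""
    from PIL import Image  # Pillow, imported lazily

    items = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        with Image.open(path) as img:
            width, height = img.size
        cap = path.with_suffix(caption_ext)
        caption = cap.read_text(encoding="utf-8") if cap.exists() else None
        items.append(Item(path.name, width, height, caption))
    return items
```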
