Skills · Mar 29, 2026 · 2 min read

Diffusers — Universal Video & Image Generation Hub

Hugging Face's diffusion model library. Run CogVideoX, AnimateDiff, Stable Video Diffusion, and 50+ video/image models with a unified API. 33,200+ stars.

Script Depot · Community

Agent-ready

Agent-ready installation

This asset can be installed after you choose the runtime, review the plan, and run the corresponding command.

Native · 98/100 · Policy: allow

Agent surface

Any MCP/CLI agent

Type

Skill

Installation

Single

Trust

Trust: Established

Entry

Diffusers — Universal Video & Image Generation Hub

Direct install command

npx -y tokrepo@latest install 4ef1950f-2a47-4e24-9ce2-6f648dea8bed --target codex

Run this after confirming the plan with a dry run.

TL;DR

Diffusers by Hugging Face provides a unified Python API for running 50+ diffusion models for image and video generation.

§01

What it is

Diffusers is Hugging Face's Python library for running diffusion models. It provides a unified API for over 50 models including Stable Diffusion, SDXL, CogVideoX, AnimateDiff, and Stable Video Diffusion. You can generate images, edit images, create videos, and run inpainting through the same pipeline interface.

Diffusers targets AI researchers, creative developers, and product teams building generative media features who need a consistent API across rapidly evolving model architectures.

§02

How it saves time or tokens

Diffusers hides the differences between model architectures behind a consistent pipeline API. Switching from Stable Diffusion to SDXL, or from image to video generation, usually means changing the pipeline class and model ID rather than rewriting your inference code. Pre-built pipelines handle tokenization, scheduling, VAE encoding, and output formatting. The library integrates directly with Hugging Face Hub, so a model downloads with a single from_pretrained call.
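As a rough illustration of that consistency, the sketch below keeps the loading code fixed and swaps only a registry entry. The `PIPELINES` table and the `resolve` and `load_pipeline` helpers are hypothetical names introduced here; the class names and model IDs are the ones used in the examples on this page.

```python
from typing import Any

# Hypothetical registry: task name -> (Diffusers pipeline class name, Hub model ID).
PIPELINES: dict[str, tuple[str, str]] = {
    "image": ("StableDiffusionPipeline", "stabilityai/stable-diffusion-2-1"),
    "video": ("CogVideoXPipeline", "THUDM/CogVideoX-2b"),
}


def resolve(task: str) -> tuple[str, str]:
    """Return (class name, model ID) for a task, or raise on unknown tasks."""
    try:
        return PIPELINES[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(PIPELINES)}"
        ) from None


def load_pipeline(task: str, device: str = "cuda") -> Any:
    """Load the pipeline for a task.

    Imports are deferred so the registry can be inspected on a machine
    without torch or diffusers installed.
    """
    import torch
    import diffusers

    class_name, model_id = resolve(task)
    pipeline_cls = getattr(diffusers, class_name)
    return pipeline_cls.from_pretrained(model_id, torch_dtype=torch.float16).to(device)
```

With this layout, `load_pipeline("image")` and `load_pipeline("video")` differ only in how the result is read afterwards (`.images[0]` versus `.frames[0]`).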

§03

How to use

Install Diffusers:

pip install diffusers transformers accelerate torch

Generate an image with Stable Diffusion:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16
).to('cuda')

image = pipe('A serene mountain lake at sunset').images[0]
image.save('output.png')

Generate video with CogVideoX:

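import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    'THUDM/CogVideoX-2b',
    torch_dtype=torch.float16
).to('cuda')

# frames[0] holds the frames generated for the first (and only) prompt
video = pipe('A cat playing with a ball of yarn').frames[0]
# 8 fps matches the frame rate used in the CogVideoX examples
export_to_video(video, 'output.mp4', fps=8)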

§04

Example

Image-to-image transformation:

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-2-1',
    torch_dtype=torch.float16
).to('cuda')

# Convert to RGB so PNGs with an alpha channel do not break the VAE encoder
init_image = Image.open('sketch.png').convert('RGB').resize((768, 768))
result = pipe(
    prompt='A detailed oil painting',
    image=init_image,
    strength=0.75  # 0 keeps the input unchanged, 1 ignores it almost entirely
).images[0]
result.save('painting.png')
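The `strength` argument controls how far the result departs from the input image. As a rough mental model, and assuming the pipeline keeps its current behavior of running about `int(num_inference_steps * strength)` denoising steps, the arithmetic can be sketched as follows. `effective_steps` is a hypothetical helper written for this page, not part of Diffusers.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps img2img actually runs.

    Assumption: the pipeline skips the first (1 - strength) share of the
    schedule, so only int(num_inference_steps * strength) steps execute.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


# Compare a light touch-up, the value used above, and a full regeneration.
for s in (0.25, 0.75, 1.0):
    print(f"strength={s}: {effective_steps(50, s)} of 50 steps")
```

Lower values therefore both preserve more of the sketch and finish faster, since fewer steps run.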

§05

Related on TokRepo

Video tools — AI video generation and editing resources
Design tools — AI-powered visual design and image generation

§06

Common pitfalls

Most diffusion models require a GPU with at least 8GB VRAM. Use torch.float16 and enable attention slicing for lower memory usage.
Model downloads from Hugging Face Hub can be several GB. Cache models locally to avoid repeated downloads in CI/CD or serverless environments.
Video generation models are significantly slower than image models. A single CogVideoX generation can take minutes on consumer GPUs.
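One way to act on the caching advice is to download the weights once into a directory that your CI job or container image preserves. The sketch below assumes the `huggingface_hub` package (a dependency of Diffusers) and its `snapshot_download` function; the `MODEL_CACHE_DIR` variable, the default path, and both helper names are hypothetical choices for illustration.

```python
import os
from pathlib import Path


def cache_dir_from_env(default: str = "./model-cache") -> Path:
    """Pick the cache directory from the (hypothetical) MODEL_CACHE_DIR
    environment variable, falling back to a local folder."""
    return Path(os.environ.get("MODEL_CACHE_DIR", default))


def prefetch(model_id: str) -> str:
    """Download a model snapshot into the cache and return its local path.

    The import is deferred so this module loads without huggingface_hub.
    """
    from huggingface_hub import snapshot_download

    cache_dir = cache_dir_from_env()
    cache_dir.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=model_id, cache_dir=str(cache_dir))
```

Passing the same directory as `cache_dir` to `from_pretrained` should then reuse the downloaded files instead of fetching them again.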

Frequently asked questions

Does Diffusers work on CPU?

Yes, but slowly. Image generation on CPU takes minutes instead of seconds. Video generation on CPU is impractical. Use a GPU for any interactive or production workload.
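A script that must run on both kinds of machine can choose the device at load time. This is a minimal sketch: `choose_device_and_dtype` and `load_for_this_machine` are hypothetical helpers, and falling back to float32 on CPU is an assumption made here because half precision is not reliably supported by CPU kernels.

```python
def choose_device_and_dtype(cuda_available: bool) -> tuple[str, str]:
    """Return (device, torch dtype name) for the current machine.

    float16 is kept for GPUs; on CPU the sketch assumes float32 is the
    safer choice.
    """
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def load_for_this_machine(model_id: str):
    """Load a Stable Diffusion pipeline on whatever hardware is present."""
    import torch  # deferred so the helper above works without torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = choose_device_and_dtype(torch.cuda.is_available())
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    return pipe.to(device)
```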

Which models are included?

Diffusers supports Stable Diffusion 1.5/2.1/XL, unCLIP (DALL-E 2 style) models, Kandinsky, PixArt, CogVideoX, AnimateDiff, Stable Video Diffusion, ControlNet, and more. New models are added regularly.

Can I fine-tune models with Diffusers?

Yes. Diffusers includes training scripts for LoRA, DreamBooth, and textual inversion fine-tuning. The diffusers training examples cover most common fine-tuning workflows.

How do I reduce memory usage?

Enable attention slicing with pipe.enable_attention_slicing(), use float16 precision, and enable model CPU offloading with pipe.enable_model_cpu_offload(). These techniques can reduce VRAM usage by 50% or more.
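The two method calls named above can be wrapped in a small helper. This is a sketch rather than a recommended configuration: `apply_memory_savers` is a hypothetical name, and the actual savings depend on the model and resolution.

```python
def apply_memory_savers(pipe, offload: bool = True):
    """Enable the memory options named above on an already-loaded pipeline.

    Note: with CPU offloading the pipeline should not also be moved to the
    GPU with .to('cuda'); offloading manages device placement itself.
    """
    pipe.enable_attention_slicing()
    if offload:
        pipe.enable_model_cpu_offload()  # requires the accelerate package
    return pipe
```

A typical call would be `pipe = apply_memory_savers(StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16))`, with no `.to('cuda')` in between.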

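Is commercial use allowed?

The Diffusers library is licensed under Apache 2.0, but individual model weights carry their own licenses. Stable Diffusion 1.x and 2.x weights, for example, use CreativeML OpenRAIL licenses, which allow commercial use subject to use-based restrictions; other models may be more restrictive. Check each model's license card on Hugging Face Hub.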



Source and acknowledgments

Created by Hugging Face. Licensed under Apache 2.0. GitHub: diffusers (33,200+ stars). Docs: huggingface.co/docs/diffusers

