Configs · May 21, 2026 · 3 min read

ControlNet — Add Spatial Control to Diffusion Models

ControlNet lets you add precise spatial conditioning such as edge maps, depth, and pose to Stable Diffusion, giving fine-grained control over AI image generation.

Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: ControlNet Overview
Universal CLI command:
npx tokrepo install 74fc6ef5-54cb-11f1-9bc6-00163e2b0d79

Introduction

ControlNet is a neural network architecture that adds trainable conditional control to large pretrained diffusion models. Developed by Lvmin Zhang, Anyi Rao, and Maneesh Agrawala at Stanford University, it enables precise spatial guidance for image generation using inputs such as Canny edges, depth maps, human pose, and segmentation maps.

What ControlNet Does

  • Adds spatial conditioning to Stable Diffusion without retraining the base model
  • Supports 14+ conditioning types including Canny edge, depth, normal map, and pose
  • Preserves the quality and diversity of the original diffusion model
  • Enables composition control through multi-ControlNet pipelines
  • Works with both SD 1.5 and SDXL model families

Architecture Overview

ControlNet creates a trainable copy of the encoding layers of a pretrained diffusion model and connects it to the locked original through zero-convolution layers. During training, only the copy and the zero-convolution layers are updated; the original model stays frozen, so its weights cannot be degraded. The zero-convolution layers are initialized with zero weights and biases, so the control branch contributes nothing at first: training starts from the pretrained model's exact behavior, and no harmful noise is injected into its features while the network learns to interpret the conditioning input.
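The wiring described above can be illustrated with a toy sketch in plain Python. It is not the reference implementation: features are short lists of floats, a 1x1 convolution at a single position is reduced to a matrix-vector product, and `frozen_block` and `trainable_copy` are hypothetical stand-ins for real network blocks. The point is only to show that zero-initialized connections leave the locked model's output unchanged until training moves them off zero.

```python
# Toy illustration of zero-convolution wiring (assumed simplification, not real ControlNet code).

def conv1x1(weights, bias, x):
    """Apply a 1x1 convolution at one spatial position: y = W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def zero_conv_params(channels):
    """Weights and bias that are all zero, as at the start of training."""
    return [[0.0] * channels for _ in range(channels)], [0.0] * channels

def frozen_block(x):
    """Hypothetical stand-in for a locked pretrained block."""
    return [2.0 * xi + 1.0 for xi in x]

def trainable_copy(x):
    """Hypothetical stand-in for the trainable copy (same weights as the locked block at init)."""
    return [2.0 * xi + 1.0 for xi in x]

def controlled_block(x, condition, zc_in, zc_out):
    """Locked output plus the control branch, joined through two zero convolutions."""
    injected = [xi + ci for xi, ci in zip(x, conv1x1(zc_in[0], zc_in[1], condition))]
    residual = conv1x1(zc_out[0], zc_out[1], trainable_copy(injected))
    return [f + r for f, r in zip(frozen_block(x), residual)]

x = [0.5, -1.0, 2.0]
condition = [1.0, 1.0, 0.0]
zc_in, zc_out = zero_conv_params(3), zero_conv_params(3)

baseline = frozen_block(x)
at_init = controlled_block(x, condition, zc_in, zc_out)

# A hypothetical training step that moves the output zero convolution off zero.
zc_out_trained = (
    [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)],
    [0.0] * 3,
)
after_update = controlled_block(x, condition, zc_in, zc_out_trained)

print(at_init == baseline)       # True: the control branch adds nothing at init
print(after_update == baseline)  # False: the control branch now contributes
```

Because the frozen block is never modified, any change in the output comes from the control branch alone.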

Self-Hosting & Configuration

  • Install via pip with diffusers or clone the original repo for standalone use
  • Plan for a GPU with about 8 GB VRAM for SD 1.5 inference at 512x512 resolution; half precision and CPU offloading can lower the footprint
  • Pre-trained control models available on Hugging Face for each conditioning type
  • Combine with LoRA adapters and custom Stable Diffusion checkpoints
  • Batch processing supported for generating multiple controlled images
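As a rough sketch of the Diffusers route listed above, the snippet below loads one Canny ControlNet next to an SD 1.5 checkpoint and runs a single generation. It assumes `pip install diffusers transformers accelerate torch` and a CUDA GPU. The model IDs are assumptions that may need to be swapped for whatever checkpoints are available to you, and the helper names `build_canny_pipeline` and `generate` are illustrative only.

```python
def build_canny_pipeline(
    base_model="stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed checkpoint ID
    controlnet_id="lllyasviel/sd-controlnet-canny",            # assumed ControlNet ID
):
    """Load an SD 1.5 pipeline with one Canny ControlNet attached."""
    # Imported lazily so this module can be imported without the GPU stack installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model, controlnet=controlnet, torch_dtype=torch.float16
    )
    # Moves submodules to the GPU only while they run (needs accelerate installed).
    pipe.enable_model_cpu_offload()
    return pipe

def generate(pipe, prompt, control_image, steps=20, scale=1.0):
    """Run one controlled generation and return the first image."""
    result = pipe(
        prompt,
        image=control_image,                  # the preprocessed edge map
        num_inference_steps=steps,
        controlnet_conditioning_scale=scale,  # how strongly the condition is applied
    )
    return result.images[0]

# Example usage (downloads several GB of weights on first run):
# from diffusers.utils import load_image
# edges = load_image("canny_edges.png")  # hypothetical local file
# pipe = build_canny_pipeline()
# generate(pipe, "a cozy cabin in the woods", edges).save("out.png")
```

Lowering `scale` below 1.0 loosens how closely the output follows the edge map, which is a useful first knob when results look over-constrained.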

Key Features

  • Zero-convolution architecture preserves pretrained model quality during fine-tuning
  • Multi-ControlNet allows combining multiple conditions in a single generation
  • Preprocessor suite includes Canny, HED, M-LSD, OpenPose, MiDaS depth, and more
  • Integrates natively with Hugging Face Diffusers, AUTOMATIC1111, and ComfyUI
  • Training scripts provided for creating custom ControlNet models on new conditions
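To make the Multi-ControlNet item above more concrete, here is a simplified sketch of how several conditions can be merged. It assumes each ControlNet produces a residual feature vector for the same layer, and that the residuals are scaled by a per-ControlNet strength and summed before being added to the base model's features. The function name and the flat-list representation are illustrative, not a library API.

```python
def combine_residuals(residuals_per_controlnet, scales):
    """Scale each ControlNet's residual features and sum them elementwise."""
    if not residuals_per_controlnet:
        raise ValueError("need at least one ControlNet residual")
    if len(residuals_per_controlnet) != len(scales):
        raise ValueError("need exactly one scale per ControlNet")

    length = len(residuals_per_controlnet[0])
    combined = [0.0] * length
    for residual, scale in zip(residuals_per_controlnet, scales):
        if len(residual) != length:
            raise ValueError("all residuals must have the same shape")
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined

# Two hypothetical residuals for the same layer: one from Canny, one from depth.
canny_residual = [1.0, 2.0, -1.0]
depth_residual = [0.5, 0.0, 4.0]

# Full-strength edges, half-strength depth.
combined = combine_residuals([canny_residual, depth_residual], [1.0, 0.5])
print(combined)  # [1.25, 2.0, 1.0]
```

In Diffusers, the corresponding user-facing step is passing a list of ControlNet models, a list of control images, and a list of values for `controlnet_conditioning_scale` to the pipeline.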

Comparison with Similar Tools

  • IP-Adapter — controls style and content via image prompts rather than spatial maps
  • T2I-Adapter — lighter-weight alternative with faster inference but less precise control
  • Uni-ControlNet — unifies multiple conditions into a single model but fewer community weights
  • GLIGEN — grounded generation with bounding boxes rather than pixel-level spatial maps
  • InstantID — specialized for identity-preserving face generation, narrower scope

FAQ

Q: How much VRAM does ControlNet need?
A: A single ControlNet with SD 1.5 needs about 8 GB VRAM. Multi-ControlNet or SDXL setups benefit from 12 GB or more.

Q: Can I train a custom ControlNet on my own condition type?
A: Yes, the repository includes training scripts. You need paired data of your condition input and target images, typically 50K-200K pairs for good results.

Q: Does ControlNet work with SDXL?
A: Yes, community-trained and official SDXL ControlNet models are available on Hugging Face.

Q: Can I use multiple ControlNets simultaneously?
A: Yes, Diffusers and ComfyUI both support multi-ControlNet pipelines where each ControlNet handles a different conditioning signal with adjustable strength.
