Introduction
CodeFormer is a face restoration algorithm developed at NTU Singapore that recovers high-quality facial details from heavily degraded images. It combines a VQGAN codebook with a transformer module to produce natural-looking restorations even from very low-quality, blurred, or compressed face images.
What CodeFormer Does
- Restores faces degraded by blur, noise, compression, and low resolution
- Provides a fidelity-quality tradeoff parameter (w) for user control
- Handles both cropped face inputs and full photos with face detection
- Supports old photo restoration and face color enhancement
- Performs face inpainting on occluded or damaged regions
Architecture Overview
CodeFormer uses a two-stage approach. First, a VQGAN is trained to reconstruct high-quality faces, which yields a learned discrete codebook of high-quality facial features and a decoder that turns codes back into images. Then, a transformer predicts the code sequence for a degraded input, using its understanding of global face composition instead of relying on local features alone. A controllable feature transformation module blends encoder features with the decoded codebook features using the fidelity weight w, giving users a smooth tradeoff between faithfulness to the input and generation quality.
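The two ideas in this overview, codebook lookup and the w-controlled blend, can be illustrated with a small pure-Python sketch. It is a toy under stated assumptions, not the project's code: the lookup is plain nearest-neighbor vector quantization (in CodeFormer the transformer predicts the code indices instead), the blend assumes a residual form `decoded + (alpha * decoded + beta) * w`, and `alpha` and `beta` are hypothetical constants standing in for values the real model predicts with learned layers.

```python
from typing import List, Sequence


def nearest_code(feature: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `feature` (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def blend(decoded: Sequence[float],
          alpha: Sequence[float],
          beta: Sequence[float],
          w: float) -> List[float]:
    """Assumed residual blend: decoded + (alpha * decoded + beta) * w.

    alpha/beta are placeholders for the scale and shift that a real model
    would derive from the encoder features of the degraded input.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    return [d + (a * d + b) * w for d, a, b in zip(decoded, alpha, beta)]


# Toy codebook with three 2-D entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

# A degraded feature is snapped to its nearest high-quality code.
idx = nearest_code([0.9, 1.2], codebook)
decoded = codebook[idx]

# w = 0 keeps the pure codebook output; w = 1 applies the full
# input-dependent adjustment.
out_quality = blend(decoded, alpha=[0.5, 0.5], beta=[0.1, -0.1], w=0.0)
out_faithful = blend(decoded, alpha=[0.5, 0.5], beta=[0.1, -0.1], w=1.0)
```

With w = 0 the output is exactly the decoded codebook feature, and raising w moves it toward the input-conditioned features, which matches the tradeoff described above.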
Self-Hosting & Configuration
- Requires Python 3.8+ and PyTorch 1.7+; CUDA is needed for GPU acceleration
- Pre-trained model weights download via the provided script or manual links
- The fidelity weight w (0 to 1) trades fidelity against quality: lower values produce sharper but less faithful results, while higher values stay closer to the input
- Supports batch processing of multiple images via folder input
- Integrates with Real-ESRGAN for background upscaling alongside face restoration
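As a rough illustration of how these options combine, the sketch below assembles a command line for the inference script. The script name `inference_codeformer.py` and the flags `-w`, `--input_path`, `--has_aligned`, `--bg_upsampler`, and `--face_upsample` are taken from the project's README as best recalled and should be checked against your checkout (for example with `--help`). The helper `build_inference_command`, its default w of 0.5, and the path `inputs/whole_imgs` are assumptions made for this example.

```python
import shlex
from typing import List, Optional


def build_inference_command(input_path: str,
                            fidelity_weight: float = 0.5,
                            has_aligned: bool = False,
                            bg_upsampler: Optional[str] = None,
                            face_upsample: bool = False) -> List[str]:
    """Build an argument list for the CodeFormer inference script.

    Hypothetical helper: flag names are assumed from the README and are
    not validated against the installed script.
    """
    if not 0.0 <= fidelity_weight <= 1.0:
        raise ValueError("fidelity_weight must be in [0, 1]")

    cmd = ["python", "inference_codeformer.py",
           "-w", str(fidelity_weight),
           "--input_path", input_path]
    if has_aligned:
        # Input images are already cropped and aligned faces.
        cmd.append("--has_aligned")
    if bg_upsampler is not None:
        # For example "realesrgan" to upscale the background.
        cmd += ["--bg_upsampler", bg_upsampler]
    if face_upsample:
        cmd.append("--face_upsample")
    return cmd


# Whole-photo folder, leaning toward fidelity, with background upscaling.
cmd = build_inference_command("inputs/whole_imgs", 0.7,
                              bg_upsampler="realesrgan")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; building a list instead of a single string avoids shell-quoting problems with paths that contain spaces.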
Key Features
- Discrete codebook prior captures high-quality facial feature patterns
- Transformer-based code prediction provides global face understanding
- Adjustable fidelity-quality balance via a single scalar parameter
- Remains robust under severe degradation, where methods without a strong facial prior often struggle
- Built-in face detection and alignment for full-photo restoration
Comparison with Similar Tools
- GFPGAN — GAN-based face restoration, faster but less robust on severely degraded inputs
- Real-ESRGAN — general-purpose image super-resolution without face-specific priors
- DFDNet — dictionary-based face restoration with component-level detail transfer
- VQFR — also uses vector quantization but with a different decoder architecture
- RestoreFormer — transformer-based restoration with similar goals but different codebook design
FAQ
Q: What does the fidelity weight w control? A: It sets the balance between fidelity and quality. Setting w closer to 1 produces results more faithful to the input face. Setting w closer to 0 generates sharper, higher-quality faces that may deviate slightly from the original identity.
Q: Can CodeFormer handle full photos, not just cropped faces? A: Yes, the inference script includes face detection and alignment. It restores each detected face and pastes it back into the original image.
Q: Does CodeFormer work on video? A: The model itself restores single images. Video is handled by processing frames individually and reassembling them (recent versions of the inference script accept a video file as input), so temporal consistency is not guaranteed.
Q: What image sizes work best? A: CodeFormer internally processes faces at 512x512 resolution. Input images of any size are supported via automatic face cropping and re-integration.
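To make the cropping arithmetic concrete, here is a simplified sketch of how a detected face box could be expanded into a crop region and scaled to the 512x512 working resolution. It is an illustration only: the actual pipeline aligns faces using landmarks, and the function names, the 30% margin, and the sample coordinates below are all assumptions.

```python
from typing import Tuple

FACE_SIZE = 512  # working resolution mentioned above

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def square_crop_box(face_box: Box,
                    image_size: Tuple[int, int],
                    margin: float = 0.3) -> Box:
    """Expand a face box to a square with extra margin, clamped to the image.

    Near image borders the clamped result may no longer be square.
    """
    x0, y0, x1, y1 = face_box
    width, height = image_size
    half = max(x1 - x0, y1 - y0) * (1 + margin) / 2
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    left = max(0, int(round(cx - half)))
    top = max(0, int(round(cy - half)))
    right = min(width, int(round(cx + half)))
    bottom = min(height, int(round(cy + half)))
    return left, top, right, bottom


def resize_factor(crop_box: Box) -> float:
    """Scale that maps the longer crop side to FACE_SIZE pixels."""
    left, top, right, bottom = crop_box
    return FACE_SIZE / max(right - left, bottom - top)


# A 100x120 face detected in a 640x480 photo (hypothetical numbers).
box = square_crop_box((100, 120, 200, 240), (640, 480))
scale = resize_factor(box)
```

After restoration, the inverse scale (1 / `scale`) would bring the 512x512 result back to the crop's original size before it is pasted into the photo.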