Scripts · May 21, 2026 · 3 min read

Demucs — AI-Powered Music Source Separation

Demucs is a state-of-the-art music source separation model from Meta Research that splits audio tracks into vocals, drums, bass, and other instrument stems.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next steps.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: Demucs Overview
Universal CLI command:
npx tokrepo install d9e3e25f-54cb-11f1-9bc6-00163e2b0d79

Introduction

Demucs is a music source separation library developed at Meta Research. Its latest version, Hybrid Transformer Demucs (HTDemucs), combines temporal convolutions with a transformer architecture to separate mixed audio into individual instrument stems with high fidelity, enabling applications from karaoke creation to music production and remixing.

What Demucs Does

  • Separates music into four default stems: vocals, drums, bass, and other instruments
  • Offers a two-stem mode for quick vocal/accompaniment separation
  • Processes audio files in MP3, WAV, FLAC, and other common formats
  • Supports GPU-accelerated and CPU-only processing
  • Provides an experimental 6-stem model (htdemucs_6s) that adds piano and guitar separation
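Each mode produces a different set of files. The sketch below lists the stems a run would be expected to write. It assumes Demucs's default output layout (`separated/{model}/{track}/{stem}.wav`) and the `no_<stem>` naming used for the accompaniment in two-stem mode. `expected_stem_paths` is a hypothetical helper, not part of Demucs.

```python
from pathlib import Path

# Stem names for the 4-stem and 6-stem models; the order here is illustrative.
FOUR_STEMS = ["drums", "bass", "other", "vocals"]
SIX_STEMS = FOUR_STEMS + ["guitar", "piano"]


def expected_stem_paths(track, model="htdemucs", two_stems=None,
                        out_dir="separated", ext="wav"):
    """Return the stem files a Demucs run is expected to write for `track`.

    Assumes the default layout {out_dir}/{model}/{track name}/{stem}.{ext}.
    """
    track_name = Path(track).stem  # file name without its extension
    if two_stems is not None:
        # Two-stem mode: the chosen stem plus everything else ("no_<stem>")
        stems = [two_stems, f"no_{two_stems}"]
    elif model == "htdemucs_6s":
        stems = SIX_STEMS
    else:
        stems = FOUR_STEMS
    base = Path(out_dir) / model / track_name
    return [base / f"{stem}.{ext}" for stem in stems]


for p in expected_stem_paths("song.mp3", two_stems="vocals"):
    print(p)
```

Keeping the path logic separate from the separation call makes it easy to check whether a track has already been processed before launching another run.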

Architecture Overview

HTDemucs combines a temporal convolutional U-Net with a transformer encoder in a hybrid architecture. The convolutional branch processes the waveform directly while a parallel spectral branch operates on STFT representations. A cross-attention transformer module fuses information between the two domains. The model is trained end-to-end with an L1 loss on the output waveforms, using the MUSDB18-HQ dataset and additional internal training data.
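To illustrate what "cross-attention between the two domains" means, here is a toy sketch of scaled dot-product cross-attention. Queries come from one branch, and keys and values come from the other. It is not HTDemucs code. The learned projections, multiple heads, and interleaved self-attention layers of a real transformer are omitted, and the feature values are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: queries from one domain,
    keys/values from the other (no learned projections in this toy)."""
    d = len(queries[0])
    out, weights = [], []
    for q in queries:
        # Similarity of this query frame to every frame of the other domain
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the other domain's value vectors
        out.append([sum(wj * v[i] for wj, v in zip(w, values))
                    for i in range(len(values[0]))])
    return out, weights


# Toy feature sequences: 3 time-branch frames, 2 spectral-branch frames, dim 4
time_feats = [[0.1, 0.2, 0.0, 0.5], [0.3, 0.1, 0.4, 0.0], [0.0, 0.0, 0.2, 0.2]]
spec_feats = [[0.5, 0.1, 0.1, 0.0], [0.0, 0.3, 0.2, 0.4]]

# Each branch attends to the other one
t_from_s, w_ts = cross_attention(time_feats, spec_feats, spec_feats)
s_from_t, w_st = cross_attention(spec_feats, time_feats, time_feats)
print(len(t_from_s), len(t_from_s[0]))  # 3 4
```

Each output frame keeps its own branch's sequence length but is built from the other branch's features. This is how information can flow between the waveform and spectrogram representations.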

Self-Hosting & Configuration

  • Install from PyPI with a single pip command
  • Works on CPU for basic use; CUDA GPU recommended for faster processing
  • Typical GPU processing speed is 5-10x faster than real-time on consumer hardware
  • Models are downloaded automatically on first use (approximately 80 MB per model)
  • Adjustable overlap (--overlap) and chunk size (--segment) options trade speed and memory use for separation quality
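The overlap/chunk-size trade-off comes down to simple arithmetic. Consecutive chunks start `(1 - overlap) * segment` samples apart, so more overlap means more chunks to process. The sketch below is an illustration of that idea, not Demucs's internal implementation. The 44.1 kHz rate, 7.8 s segment, and overlap values are illustrative assumptions.

```python
import math


def chunk_offsets(track_seconds, segment_seconds, overlap, sample_rate=44100):
    """Start offsets (in samples) of overlapping chunks covering a track.

    Consecutive chunks start (1 - overlap) * segment apart, so a larger
    overlap gives more chunks (slower) with more averaging at the seams.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    length = int(track_seconds * sample_rate)
    segment = int(segment_seconds * sample_rate)
    stride = max(1, int((1 - overlap) * segment))  # guard against a zero step
    return list(range(0, length, stride))


# A 4-minute track split into illustrative 7.8 s segments
for ov in (0.1, 0.25, 0.5):
    print(ov, len(chunk_offsets(240, 7.8, ov)))
```

A smaller segment lowers peak memory because each chunk is smaller, but it increases the number of chunks. That is why both settings affect speed.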

Key Features

  • Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
  • Simple CLI interface requires just one command to separate a track
  • Python API available for integration into audio processing pipelines
  • Multiple pre-trained models including the fine-tuned htdemucs_ft for best quality
  • Supports segment-based processing for long tracks with limited memory
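The Demucs README shows how to call the command-line entry point from Python by passing a list of CLI arguments to `demucs.separate.main`. Below is a minimal sketch that assumes that entry point and the documented flags (`-n`, `-o`, `--two-stems`, `-d`, `--segment`, `--overlap`, `--mp3`). The helper names `demucs_args` and `separate` are hypothetical.

```python
def demucs_args(track, model="htdemucs", two_stems=None, device=None,
                segment=None, overlap=None, mp3=False, out_dir="separated"):
    """Build a Demucs CLI argument list (hypothetical helper)."""
    args = ["-n", model, "-o", out_dir]
    if two_stems:
        args += ["--two-stems", two_stems]  # e.g. "vocals"
    if device:
        args += ["-d", device]  # e.g. "cpu" or "cuda"
    if segment is not None:
        args += ["--segment", str(segment)]  # chunk size in seconds
    if overlap is not None:
        args += ["--overlap", str(overlap)]  # fraction of overlap between chunks
    if mp3:
        args.append("--mp3")
    args.append(str(track))
    return args


def separate(track, **kwargs):
    # Imported lazily so the argument builder works without Demucs installed.
    import demucs.separate
    demucs.separate.main(demucs_args(track, **kwargs))
```

For example, `separate("track.mp3", two_stems="vocals", mp3=True)` would request a vocal/accompaniment split written as MP3. Keeping argument construction in a plain function makes batch jobs easy to inspect and log before anything runs.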

Comparison with Similar Tools

  • Spleeter: Deezer's open-source separator; faster than Demucs but lower quality
  • Open-Unmix: reference implementation for music source separation; lightweight but less accurate
  • BSRNN: band-split recurrent network with competitive quality, but harder to obtain as a ready-to-use tool
  • LALAL.AI: commercial service with good quality but no local deployment
  • UVR (Ultimate Vocal Remover): GUI tool that wraps multiple models, including Demucs

FAQ

Q: How long does separation take? A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.
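These timings can be expressed as a real-time factor: audio duration divided by processing time. The sketch applies that ratio to the 4-minute example above. `realtime_factor` is a hypothetical helper name.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """How many times faster than real time a run was (>1 means faster)."""
    return audio_seconds / processing_seconds


song = 4 * 60  # 240 s of audio
print(realtime_factor(song, 60), realtime_factor(song, 30))  # 4.0 8.0
print(realtime_factor(song, 10 * 60), realtime_factor(song, 5 * 60))  # 0.4 0.8
```

With these figures, GPU separation runs at roughly 4 to 8 times real time. CPU separation runs slower than real time.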

Q: Can I separate stems other than the default four? A: The htdemucs_6s model provides 6 stems: vocals, drums, bass, guitar, piano, and other. It is experimental, and its piano stem is noticeably weaker than the others. Custom stem configurations require retraining.

Q: Does Demucs work on podcasts or speech audio? A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

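Q: What audio quality does Demucs output? A: Demucs resamples the input to the model's sample rate (44.1 kHz for the pretrained models) and writes stems at that rate, as 16-bit WAV by default. Options such as --int24, --float32, --flac, and --mp3 change the output format. For best results, use lossless source files (WAV or FLAC) rather than heavily compressed audio.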
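To confirm the sample rate, channel count, and bit depth of a stem Demucs wrote, you can read the WAV header with Python's standard `wave` module. This is a minimal sketch. `wav_info` is a hypothetical helper, and the stem path in the comment is only an example. The demo writes a short silent file so the snippet runs on its own.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (sample_rate, channels, bits_per_sample) for a PCM WAV file.

    The stdlib wave module reads integer PCM only, so this suits 16-bit
    or 24-bit output but not 32-bit float WAV files.
    """
    with wave.open(str(path), "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth() * 8


# Self-contained demo: write 0.01 s of 16-bit stereo silence at 44.1 kHz, read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)  # 2 bytes = 16-bit samples
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 2 * 441)  # 441 frames x 2 channels
demo_info = wav_info(demo_path)
print(demo_info)  # (44100, 2, 16)

# On a real stem (example path): wav_info("separated/htdemucs/song/vocals.wav")
```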
