Scripts · May 21, 2026 · 3 min read

Demucs — AI-Powered Music Source Separation

Demucs is a state-of-the-art music source separation model from Meta Research that splits audio tracks into vocals, drums, bass, and other instrument stems.


Introduction

Demucs is a music source separation library developed at Meta Research. Its latest version, Hybrid Transformer Demucs (HTDemucs, also known as Demucs v4), combines convolutional processing of the waveform and the spectrogram with a transformer to separate mixed audio into individual instrument stems with high fidelity. Applications range from karaoke creation to music production and remixing.

What Demucs Does

  • Separates music into four default stems: vocals, drums, bass, and other instruments
  • Offers a two-stem mode for quick vocal/accompaniment separation
  • Processes audio files in MP3, WAV, FLAC, and other common formats
  • Supports GPU-accelerated and CPU-only processing
  • Provides an experimental 6-stem model (htdemucs_6s) that adds piano and guitar stems, although the piano stem is noticeably less reliable
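The stem options above determine which files a run produces. The sketch below predicts the output paths, assuming the default layout described in the Demucs README (separated/<model>/<track>/<stem>.wav, plus a no_<stem> file holding everything else in two-stem mode). The helper name and the model-to-stems table are hypothetical and only illustrate that layout.

```python
from pathlib import PurePosixPath

# Hypothetical table of stems per model, based on the stems listed in this article.
# The exact order is an assumption; a loaded model exposes its own `sources` list.
MODEL_STEMS = {
    "htdemucs": ["drums", "bass", "other", "vocals"],
    "htdemucs_ft": ["drums", "bass", "other", "vocals"],
    "htdemucs_6s": ["drums", "bass", "other", "vocals", "guitar", "piano"],
}


def expected_stem_paths(track, model="htdemucs", two_stems=None,
                        out_dir="separated", ext="wav"):
    """Predict where the demucs CLI writes stems for `track` (assumed default layout)."""
    name = PurePosixPath(track).stem  # "song.mp3" -> "song"
    if two_stems is not None:
        # Two-stem mode: the chosen stem plus "no_<stem>" for the accompaniment.
        stems = [two_stems, f"no_{two_stems}"]
    else:
        stems = MODEL_STEMS[model]
    base = PurePosixPath(out_dir) / model / name
    return [str(base / f"{stem}.{ext}") for stem in stems]


print(expected_stem_paths("song.mp3", two_stems="vocals"))
```

For a karaoke track, the file of interest under these assumptions is separated/htdemucs/song/no_vocals.wav.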

Architecture Overview

HTDemucs is a hybrid model built from two convolutional U-Nets: a temporal branch that processes the waveform directly and a spectral branch that operates on STFT representations. In the innermost layers, a cross-domain transformer encoder applies self-attention within each branch and cross-attention between them, fusing information from the two domains. The spectral branch's output is converted back to a waveform with an inverse STFT and summed with the temporal branch's output to produce each stem. The model is trained end-to-end with an L1 loss on the output waveforms, using the MUSDB18-HQ dataset and additional internal training data.
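A toy, single-head cross-attention in pure Python shows what fusing two domains means mechanically: each query vector from one branch (say, time-branch features) is scored against keys from the other branch (say, spectral features), and the softmax of those scores mixes the other branch's value vectors. This is a generic illustration with made-up inputs, not HTDemucs code; a real transformer layer adds learned projections, multiple heads, normalization and feed-forward sublayers.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from a different sequence.

    queries: n_q vectors of size d (e.g. time-branch features)
    keys:    n_k vectors of size d (e.g. spectral-branch features)
    values:  n_k vectors of size d_v, paired with the keys
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    dim_v = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key from the other domain.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the other domain's value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


fused = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
print(fused)
```

Because the weights come from query-key similarity, a query that closely matches one key pulls mostly from that key's value, which is how features from one domain can select relevant information from the other.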

Self-Hosting & Configuration

  • Install from PyPI with a single command: pip install -U demucs
  • Works on CPU for basic use; CUDA GPU recommended for faster processing
  • Typical GPU processing speed is 5-10x faster than real-time on consumer hardware
  • Models are downloaded automatically on first use (approximately 80 MB per model)
  • The --overlap and --segment options trade speed and memory use for separation quality
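A small wrapper makes these settings explicit when scripting batch runs. The sketch assumes the demucs command installed by the PyPI package and uses flags from the Demucs README (-n for the model, --two-stems, -d for the device, --segment, --overlap, -o, --mp3); the function names are hypothetical. Transformer models reject segments longer than their training length (about 7.8 s for htdemucs), so the segment value here is an assumption to check against the installed version.

```python
import subprocess


def build_demucs_cmd(track, model="htdemucs", two_stems=None, device=None,
                     segment=None, overlap=None, out_dir=None, mp3=False):
    """Assemble a demucs CLI invocation (hypothetical wrapper around README flags)."""
    cmd = ["demucs", "-n", model]
    if two_stems:
        cmd += ["--two-stems", two_stems]   # e.g. "vocals" -> vocals + no_vocals
    if device:
        cmd += ["-d", device]               # "cpu" or "cuda"
    if segment is not None:
        cmd += ["--segment", str(segment)]  # shorter chunks use less GPU memory
    if overlap is not None:
        cmd += ["--overlap", str(overlap)]  # more overlap is slower but blends chunks more smoothly
    if out_dir:
        cmd += ["-o", out_dir]
    if mp3:
        cmd.append("--mp3")
    cmd.append(track)
    return cmd


def run_demucs(track, **options):
    # Requires `pip install -U demucs`; the first run downloads model weights.
    return subprocess.run(build_demucs_cmd(track, **options), check=True)


print(build_demucs_cmd("song.mp3", two_stems="vocals", device="cpu", segment=7))
```

The project README also shows calling demucs.separate.main with the same argument list minus the leading "demucs", which avoids spawning a subprocess when Demucs is installed in the current environment.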

Key Features

  • Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
  • Simple CLI interface requires just one command to separate a track
  • Python API available for integration into audio processing pipelines
  • Multiple pre-trained models including the fine-tuned htdemucs_ft for best quality
  • Supports segment-based processing for long tracks with limited memory
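Segment-based processing keeps memory bounded by separating overlapping windows independently and blending them back together. The sketch below shows the general overlap-add idea with hypothetical helpers and a plain triangular weight that favours each window's centre; Demucs's own padding and weighting differ in detail, so treat this as an illustration of the technique rather than the library's code.

```python
def chunk_bounds(length, segment, overlap=0.25):
    """Split `length` samples into windows of `segment` samples that overlap by a fraction."""
    if segment <= 0 or not 0 <= overlap < 1:
        raise ValueError("segment must be > 0 and 0 <= overlap < 1")
    stride = max(1, int((1 - overlap) * segment))
    # The last window is clipped to the end of the signal.
    return [(start, min(start + segment, length)) for start in range(0, length, stride)]


def overlap_add(length, chunks, outputs):
    """Blend per-chunk outputs into one signal using triangular weights."""
    acc = [0.0] * length
    norm = [0.0] * length
    for (start, end), out in zip(chunks, outputs):
        n = end - start
        for i in range(n):
            # Weight is highest mid-chunk and lowest at the edges (always >= 1).
            w = min(i + 1, n - i)
            acc[start + i] += w * out[i]
            norm[start + i] += w
    return [a / z for a, z in zip(acc, norm)]


signal = [float(x) for x in range(10)]
chunks = chunk_bounds(len(signal), segment=4, overlap=0.25)
# An identity "model" stands in for separation, so blending should rebuild the input.
rebuilt = overlap_add(len(signal), chunks, [signal[s:e] for s, e in chunks])
print(chunks)
```

Weighting by distance from the window edge means each output sample relies mostly on the window where it had the most surrounding context, which hides seams between chunks.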

Comparison with Similar Tools

  • Spleeter: Deezer's open-source separator, faster than Demucs but lower in quality
  • Open-Unmix: reference implementation for music separation, lightweight but less accurate
  • BSRNN: band-split recurrent neural network with competitive quality but less accessible tooling
  • LALAL.AI: commercial online separation service with good quality but no local deployment
  • UVR (Ultimate Vocal Remover): GUI tool that wraps multiple models, including Demucs

FAQ

Q: How long does separation take? A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.
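These timings can be expressed as a real-time factor, meaning seconds of audio processed per second of wall-clock time. The arithmetic below uses only the figures in the answer above (a 240 s track, 30-60 s on GPU, 5-10 minutes on CPU); the function names are illustrative.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio separated per second of wall-clock time (>1 is faster than real time)."""
    return audio_seconds / processing_seconds


def estimate_batch_minutes(track_minutes, speed_factor):
    """Rough wall-clock minutes for a batch of tracks at a given real-time factor."""
    return sum(track_minutes) / speed_factor


# A 4-minute (240 s) track, using the GPU and CPU timing ranges above.
gpu_range = (realtime_factor(240, 60), realtime_factor(240, 30))    # (4.0, 8.0)
cpu_range = (realtime_factor(240, 600), realtime_factor(240, 300))  # (0.4, 0.8)
```

At 8x real time, for example, twelve 4-minute tracks would need roughly 48 / 8 = 6 minutes, while the CPU range implies the same batch takes longer than the music itself.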

Q: Can I separate stems other than the default four? A: The htdemucs_6s model provides 6 stems: vocals, drums, bass, guitar, piano, and other. Custom stem configurations require retraining.

Q: Does Demucs work on podcasts or speech audio? A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

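Q: What audio quality does Demucs output? A: The pretrained models work on 44.1 kHz stereo audio: inputs at other sample rates are resampled, and stems are written at 44.1 kHz, as WAV files by default (the --mp3 flag switches to MP3 output). For best results, start from lossless sources (WAV or FLAC) at 44.1 kHz.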
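For WAV sources, Python's standard wave module can read the header before separation, which makes it easy to flag files below the 44.1 kHz recommended in the answer above. The helper names and threshold are illustrative; wave only handles uncompressed PCM WAV, so MP3 or FLAC inputs need another tool such as ffprobe.

```python
import wave

RECOMMENDED_RATE = 44100  # 44.1 kHz, the rate recommended in the FAQ


def wav_info(source):
    """Read sample rate, channel count and bit depth from a WAV file path or binary file object."""
    with wave.open(source, "rb") as wf:
        return {
            "rate": wf.getframerate(),
            "channels": wf.getnchannels(),
            "bits": 8 * wf.getsampwidth(),
        }


def meets_recommended_rate(source, minimum=RECOMMENDED_RATE):
    # Files recorded below 44.1 kHz have already lost high-frequency
    # content that separation cannot restore.
    return wav_info(source)["rate"] >= minimum
```

Running meets_recommended_rate over a folder before a batch job is a cheap way to catch low-quality sources early.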
