Demucs — AI-Powered Music Source Separation

Introduction

Demucs is a music source separation library developed at Meta Research. Its latest version, Hybrid Transformer Demucs (HTDemucs), combines temporal convolutions with a transformer architecture to separate mixed audio into individual instrument stems with high fidelity, enabling applications from karaoke creation to music production and remixing.

What Demucs Does

Separates music into four default stems: vocals, drums, bass, and other instruments
Offers a two-stem mode for quick vocal/accompaniment separation
Processes audio files in MP3, WAV, FLAC, and other common formats
Supports GPU-accelerated and CPU-only processing
Provides an experimental 6-stem model (htdemucs_6s) that adds piano and guitar stems
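
As a rough illustration of the stem modes listed above, the sketch below predicts which files a run would write. It assumes the default output layout of the Demucs CLI (separated/<model>/<track name>/<stem>.wav) and the no_<stem> naming used in two-stem mode; the helper names expected_stems and expected_outputs are hypothetical and not part of Demucs.

```python
from pathlib import Path

# Stem names per mode, as listed in this document.
FOUR_STEMS = ["vocals", "drums", "bass", "other"]
SIX_STEMS = ["vocals", "drums", "bass", "guitar", "piano", "other"]


def expected_stems(model="htdemucs", two_stems=None):
    """Return the stem names a run is expected to produce (hypothetical helper)."""
    stems = SIX_STEMS if model == "htdemucs_6s" else FOUR_STEMS
    if two_stems is None:
        return list(stems)
    if two_stems not in stems:
        raise ValueError(f"{two_stems!r} is not a stem of {model}")
    # Two-stem mode keeps one stem and mixes the rest into "no_<stem>".
    return [two_stems, f"no_{two_stems}"]


def expected_outputs(track, model="htdemucs", two_stems=None,
                     out_dir="separated", ext="wav"):
    """Assumed default layout: <out_dir>/<model>/<track name>/<stem>.<ext>."""
    base = Path(out_dir) / model / Path(track).stem
    return [base / f"{stem}.{ext}" for stem in expected_stems(model, two_stems)]


for path in expected_outputs("song.mp3", two_stems="vocals"):
    print(path.as_posix())
```

Checking for these paths after a run is a cheap way to confirm that a batch job produced every stem before passing the files to the next step.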

Architecture Overview

HTDemucs combines a temporal convolutional U-Net with a transformer encoder in a hybrid architecture. The convolutional branch processes the waveform directly while a parallel spectral branch operates on STFT representations. A cross-attention transformer module fuses information between the two domains. The model is trained end-to-end with a combination of L1 loss on waveforms and multi-resolution STFT loss, using the MUSDB18-HQ dataset and additional internal training data.
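
To make the fusion step more concrete, here is a toy sketch of cross-attention between two token sequences, one standing in for the waveform branch and one for the spectral branch. It is not the HTDemucs implementation: there are no learned projections, no multiple heads, and no normalization, and the function names (cross_attention, fuse) and the tiny token values are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the values by key similarity."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return attended


def add(tokens, context):
    """Residual connection: add the attended context to the original tokens."""
    return [[a + b for a, b in zip(t, c)] for t, c in zip(tokens, context)]


def fuse(time_tokens, spec_tokens):
    """Each branch queries the other one, then keeps its own tokens as a residual."""
    time_ctx = cross_attention(time_tokens, spec_tokens, spec_tokens)
    spec_ctx = cross_attention(spec_tokens, time_tokens, time_tokens)
    return add(time_tokens, time_ctx), add(spec_tokens, spec_ctx)


# Toy inputs: 2 waveform-branch tokens and 3 spectral-branch tokens, width 2.
time_tokens = [[1.0, 0.0], [0.0, 1.0]]
spec_tokens = [[2.0, 2.0], [2.0, 2.0], [2.0, 2.0]]
fused_time, fused_spec = fuse(time_tokens, spec_tokens)
print(fused_time)
print(fused_spec)
```

The two sequences may have different lengths, which is why cross-attention suits this kind of fusion: the waveform and STFT branches do not need to share a time resolution for one to attend to the other.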

Self-Hosting & Configuration

Install from PyPI with a single command: pip install demucs
Works on CPU for basic use; a CUDA GPU is recommended for faster processing
Typical GPU processing runs 5-10x faster than real time on consumer hardware
Models are downloaded automatically on first use (approximately 80 MB per model)
Adjustable overlap (--overlap) and segment length (--segment) parameters trade speed and memory use for separation quality
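
The trade-off between overlap and segment (chunk) length can be sketched with a small helper that splits a track into overlapping spans. This is a simplified model of chunked inference, assuming a stride of (1 - overlap) x segment length; the function name chunk_spans is hypothetical, and the real pipeline also cross-fades the overlapping regions, which is omitted here.

```python
def chunk_spans(num_samples, segment, overlap=0.25):
    """Return (start, end) sample spans covering a track with overlapping chunks."""
    if segment <= 0:
        raise ValueError("segment must be positive")
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # A larger overlap means a smaller stride, so more chunks to process.
    stride = max(1, int((1 - overlap) * segment))
    return [
        (start, min(start + segment, num_samples))
        for start in range(0, num_samples, stride)
    ]


# Toy numbers: a 10-sample "track" split into 4-sample chunks.
print(chunk_spans(10, 4, overlap=0.25))

# More overlap costs more compute for the same track length.
print(len(chunk_spans(100, 10, overlap=0.0)), len(chunk_spans(100, 10, overlap=0.5)))
```

Shorter segments lower peak memory because less audio is held by the model at once, while higher overlap adds redundant computation that smooths the seams between chunks.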

Key Features

Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
Simple CLI separates a track with a single command
Python API available for integration into audio processing pipelines
Multiple pre-trained models including the fine-tuned htdemucs_ft for best quality
Supports segment-based processing for long tracks with limited memory
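
The CLI and Python entry points take the same arguments, so one way to wire Demucs into a pipeline is to build the argument list in Python. The sketch below is a minimal example under a few assumptions: the helper build_demucs_args is hypothetical, only a handful of flags are covered (-n, --two-stems, -d, --segment, --overlap, --mp3, -o), and the separate function relies on demucs.separate.main accepting an argument list, which requires the demucs package to be installed.

```python
import shlex


def build_demucs_args(track, model="htdemucs", two_stems=None, device=None,
                      segment=None, overlap=None, mp3=False, out_dir=None):
    """Build a Demucs argument list (hypothetical helper, subset of flags)."""
    args = ["-n", model]
    if two_stems:
        args += ["--two-stems", two_stems]   # e.g. "vocals"
    if device:
        args += ["-d", device]               # "cuda" or "cpu"
    if segment is not None:
        args += ["--segment", str(segment)]  # chunk length in seconds
    if overlap is not None:
        args += ["--overlap", str(overlap)]  # fraction between 0 and 1
    if mp3:
        args.append("--mp3")                 # write MP3 instead of WAV
    if out_dir:
        args += ["-o", out_dir]
    args.append(track)
    return args


def separate(args):
    """Run the separation in-process; imported lazily so the builder has no dependency."""
    import demucs.separate
    demucs.separate.main(args)


args = build_demucs_args("my song.mp3", model="htdemucs_ft", two_stems="vocals")
# Equivalent shell command, quoted safely for paths with spaces.
print("demucs " + shlex.join(args))
```

Keeping the builder separate from the call that runs the model makes the command easy to log or unit-test on machines where Demucs is not installed.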

Comparison with Similar Tools

Spleeter: Deezer's open-source separator, faster but lower quality than Demucs
Open-Unmix: reference implementation for music separation, lightweight but less accurate
BSRNN: band-split recurrent network with competitive quality but less accessible
LALAL.AI: commercial music source separation service with good quality, no local deployment
UVR (Ultimate Vocal Remover): GUI tool that wraps multiple models including Demucs

FAQ

Q: How long does separation take? A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.
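
The timings in this answer can be turned into real-time factors (track duration divided by processing time) with a few lines of arithmetic. The numbers are the approximate figures quoted above, not measurements, and the function name realtime_factor is only illustrative.

```python
def realtime_factor(track_seconds, processing_seconds):
    """How many seconds of audio are processed per second of wall-clock time."""
    return track_seconds / processing_seconds


track = 4 * 60  # a 4-minute song, in seconds

# GPU: 30-60 s of processing for 240 s of audio.
gpu_range = (realtime_factor(track, 60), realtime_factor(track, 30))
# CPU: 5-10 minutes of processing for the same track.
cpu_range = (realtime_factor(track, 600), realtime_factor(track, 300))

print(gpu_range)  # (4.0, 8.0): faster than real time
print(cpu_range)  # (0.4, 0.8): slower than real time
```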

Q: Can I separate stems other than the default four? A: The htdemucs_6s model provides 6 stems: vocals, drums, bass, guitar, piano, and other. Custom stem configurations require retraining.

Q: Does Demucs work on podcasts or speech audio? A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

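Q: What audio quality does Demucs output? A: The pretrained models operate on 44.1 kHz stereo audio: input is resampled to that rate if needed, and stems are written as 44.1 kHz WAV files by default. For best results, use high-quality source files (WAV or FLAC at 44.1 kHz or higher).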
