# Demucs — AI-Powered Music Source Separation

> Demucs is a state-of-the-art music source separation model from Meta Research that splits audio tracks into vocals, drums, bass, and other instrument stems.

## Quick Use
```bash
pip install demucs
# Separate a song into stems
demucs --two-stems=vocals song.mp3
# Or separate into 4 stems (vocals, drums, bass, other)
demucs song.mp3
# Output is saved to ./separated/htdemucs/song/
```
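
To pick up the stems from a script after the command finishes, the output layout noted above can be reconstructed in a few lines. This is a minimal sketch that assumes the default output naming (`separated/<model>/<track name>/<stem>.wav`); the helper name `expected_stem_paths` is made up for illustration and is not part of Demucs.

```python
from pathlib import Path


def expected_stem_paths(track, model="htdemucs", two_stems=None, out_dir="separated"):
    """Return the paths where the demucs CLI is expected to write stems.

    Hypothetical helper: assumes the default naming scheme and WAV output.
    """
    if two_stems is None:
        names = ["vocals", "drums", "bass", "other"]
    else:
        # Two-stem mode writes the chosen stem and its complement.
        names = [two_stems, f"no_{two_stems}"]
    folder = Path(out_dir) / model / Path(track).stem
    return [folder / f"{name}.wav" for name in names]


for path in expected_stem_paths("song.mp3", two_stems="vocals"):
    print(path)
```

If a different model, output directory, or output format is selected on the command line, the paths change accordingly.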

## Introduction
Demucs is a music source separation library developed at Meta Research. Its latest version, Hybrid Transformer Demucs (HTDemucs), combines temporal convolutions with a transformer architecture to separate mixed audio into individual instrument stems with high fidelity, enabling applications from karaoke creation to music production and remixing.

## What Demucs Does
- Separates music into four default stems: vocals, drums, bass, and other instruments
- Offers a two-stem mode for quick vocal/accompaniment separation
- Processes audio files in MP3, WAV, FLAC, and other common formats
- Supports GPU-accelerated and CPU-only processing
- Provides an experimental 6-stem model (htdemucs_6s) that adds piano and guitar stems

## Architecture Overview
HTDemucs is a hybrid U-Net with two parallel branches: a temporal branch that processes the waveform directly with convolutions, and a spectral branch that operates on its STFT representation. In the innermost layers, a cross-domain transformer encoder applies self-attention within each branch and cross-attention between them, fusing information from the two domains. The model is trained end-to-end with an L1 loss on waveforms, using the MUSDB18-HQ dataset and additional internal training data.
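
The fusion step can be illustrated with a toy cross-attention in NumPy, where tokens from one branch query tokens from the other. This is a sketch of the general technique only, not the Demucs implementation: it uses a single head, no learned projections, and arbitrary token counts and feature sizes chosen for the example.

```python
import numpy as np


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each query token becomes a weighted mix of the other branch's tokens.
    return weights @ keys_values, weights


rng = np.random.default_rng(0)
time_tokens = rng.normal(size=(8, 16))       # 8 time-domain tokens, 16 features
spectral_tokens = rng.normal(size=(12, 16))  # 12 spectral tokens, 16 features

fused, weights = cross_attention(time_tokens, spectral_tokens)
print(fused.shape, weights.shape)
```

Each row of `weights` sums to 1, so every time-domain token receives a convex combination of spectral tokens. Swapping the two arguments gives the attention in the other direction.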

## Self-Hosting & Configuration
- Install from PyPI with a single pip command
- Works on CPU for basic use; CUDA GPU recommended for faster processing
- Typical GPU processing speed is 5-10x faster than real-time on consumer hardware
- Models are downloaded automatically on first use (approximately 80 MB per model)
- Adjustable overlap and chunk size parameters trade speed for separation quality
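
The overlap and chunk size trade-off can be sketched as a generic overlap-add loop. On the command line these settings correspond to the `--overlap` and `--segment` flags; the code below is an illustration of the idea rather than Demucs's own code, it measures `segment` in samples instead of seconds, and `model` stands in for any function that maps a chunk to a chunk of the same length.

```python
import numpy as np


def process_in_chunks(signal, model, segment, overlap=0.25):
    """Apply `model` to overlapping chunks and blend the results."""
    stride = max(1, int(segment * (1 - overlap)))
    out = np.zeros(len(signal), dtype=float)
    weight_sum = np.zeros(len(signal), dtype=float)
    # Triangular weights favour the centre of each chunk over its edges.
    weight = np.minimum(np.arange(1, segment + 1), np.arange(segment, 0, -1)).astype(float)
    for start in range(0, len(signal), stride):
        chunk = signal[start:start + segment]
        w = weight[:len(chunk)]
        out[start:start + len(chunk)] += w * model(chunk)
        weight_sum[start:start + len(chunk)] += w
    # Normalise by the accumulated weights so overlapping regions average out.
    return out / weight_sum


signal = np.sin(np.linspace(0, 20 * np.pi, 10_000))
restored = process_in_chunks(signal, lambda chunk: chunk, segment=1000, overlap=0.25)
print(np.allclose(restored, signal))
```

A larger overlap means more chunks to process, which is slower but smooths the transitions between chunks. A smaller segment lowers peak memory use.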

## Key Features
- Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
- Simple CLI that separates a track with a single command
- Python API available for integration into audio processing pipelines
- Multiple pre-trained models including the fine-tuned htdemucs_ft for best quality
- Supports segment-based processing for long tracks with limited memory
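
For pipeline integration, one low-friction route is to call the CLI entry point from Python with the same arguments as on the command line. The sketch below assumes a Demucs v4 install where `demucs.separate.main` accepts a list of arguments; the helper names `build_demucs_args` and `separate` are made up for illustration.

```python
def build_demucs_args(track, model="htdemucs", two_stems=None, device=None, segment=None):
    """Build a CLI-style argument list (hypothetical helper)."""
    args = ["-n", model]
    if two_stems:
        args += ["--two-stems", two_stems]
    if device:
        args += ["-d", device]  # e.g. "cpu" or "cuda"
    if segment:
        args += ["--segment", str(segment)]
    args.append(str(track))
    return args


def separate(track, **options):
    """Run a separation in-process; requires Demucs to be installed."""
    # Imported lazily so this module loads even without Demucs installed.
    import demucs.separate

    demucs.separate.main(build_demucs_args(track, **options))


print(build_demucs_args("song.mp3", model="htdemucs_ft", two_stems="vocals"))
```

Calling `separate("song.mp3", two_stems="vocals")` would then behave like the equivalent command-line invocation, writing stems under `./separated/`.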

## Comparison with Similar Tools
- **Spleeter**: Deezer's open-source separator, faster but lower quality than Demucs
- **Open-Unmix**: reference implementation for music separation, lightweight but less accurate
- **BSRNN**: band-split recurrent network with competitive quality but less accessible
- **LALAL.AI**: commercial separation service with good quality, no local deployment
- **UVR (Ultimate Vocal Remover)**: GUI tool that wraps multiple models including Demucs

## FAQ
**Q: How long does separation take?**
A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.
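
These timings can be converted into a real-time factor (track duration divided by processing time) to compare them with other figures. The numbers below are simply the ones quoted in this answer, not new measurements.

```python
def realtime_factor(track_seconds, processing_seconds):
    """How many seconds of audio are processed per second of compute."""
    return track_seconds / processing_seconds


track = 4 * 60  # a 4-minute song, in seconds
timings = [("GPU, 30 s", 30), ("GPU, 60 s", 60), ("CPU, 5 min", 300), ("CPU, 10 min", 600)]
for label, seconds in timings:
    print(f"{label}: {realtime_factor(track, seconds):.1f}x real time")
```

The GPU range works out to 4x to 8x real time, while the CPU range is 0.4x to 0.8x, meaning CPU processing is slower than playback.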

**Q: Can I separate stems other than the default four?**
A: The htdemucs_6s model provides 6 stems: vocals, drums, bass, guitar, piano, and other. Custom stem configurations require retraining.

**Q: Does Demucs work on podcasts or speech audio?**
A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

**Q: What audio quality does Demucs output?**
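A: The pretrained models operate on 44.1 kHz stereo audio. Input at other sample rates is resampled on load, and stems are written at 44.1 kHz, as 16-bit WAV by default. For best results, use high-quality source files (WAV or FLAC at 44.1 kHz or higher).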
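
To check what a source file actually contains before separating it, the sample rate, channel count, and bit depth of a WAV file can be read with Python's standard `wave` module. This sketch covers uncompressed PCM WAV only; the helper name `wav_info` and the demo file are made up for illustration.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Read basic format information from a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
        }


# Demo: write one second of 16-bit stereo silence, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
    wav.setframerate(44100)
    wav.writeframes(b"\x00" * (44100 * 2 * 2))

print(wav_info(demo_path))
```

For MP3 or FLAC sources, a tool such as `ffprobe` or an audio library would be needed instead.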

## Sources
- https://github.com/facebookresearch/demucs
- https://arxiv.org/abs/2211.08553

---
Source: https://tokrepo.com/en/workflows/asset-d9e3e25f
Author: Script Depot