# Demucs — AI-Powered Music Source Separation

> Demucs is a state-of-the-art music source separation model from Meta Research that splits audio tracks into vocals, drums, bass, and other instrument stems.

## Quick Use

```bash
pip install demucs

# Separate a song into two stems: vocals and no_vocals (the accompaniment)
demucs --two-stems=vocals song.mp3

# Or separate into 4 stems (vocals, drums, bass, other)
demucs song.mp3

# Output is saved to ./separated/htdemucs/song/
```

## Introduction

Demucs is a music source separation library developed at Meta Research. Its latest version (v4), Hybrid Transformer Demucs (HTDemucs), combines temporal convolutions with a transformer architecture to separate mixed audio into individual instrument stems with high fidelity. This enables applications ranging from karaoke creation to music production and remixing.

## What Demucs Does

- Separates music into four default stems: vocals, drums, bass, and other instruments
- Offers a two-stem mode for quick vocal/accompaniment separation
- Processes MP3, WAV, FLAC, and other common audio formats
- Supports both GPU-accelerated and CPU-only processing
- Provides an experimental 6-stem model (`htdemucs_6s`) that adds piano and guitar stems

## Architecture Overview

HTDemucs is a hybrid U-Net with two parallel convolutional branches: a temporal branch that processes the waveform directly and a spectral branch that operates on its STFT representation. In the innermost layers, a cross-domain transformer encoder, which applies self-attention within each domain and cross-attention between them, fuses information from the two branches. The model is trained end-to-end with a combination of L1 loss on waveforms and multi-resolution STFT loss, using the MUSDB18-HQ dataset and additional internal training data.
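The waveform part of the training objective described above can be illustrated with a toy Python sketch, assuming stems are short mono lists of floats. In this sketch, a mixture is the per-sample sum of its stems, and an L1 loss averages the absolute differences between estimated and reference stem samples. The helper names `mix` and `l1_waveform_loss` are hypothetical, and the sample values are made up. The sketch covers only the L1 waveform term. It does not model the STFT loss and is not Demucs's actual training code.

```python
"""Toy illustration of a source separation objective: the mixture is the sum of
its stems, and estimates are scored with an L1 (mean absolute error) distance on
the waveforms. Hypothetical helpers for illustration; not Demucs's training code."""

from typing import Dict, List

Waveform = List[float]


def mix(stems: Dict[str, Waveform]) -> Waveform:
    """Sum the stems sample by sample to form the mixture a separator receives."""
    lengths = {len(w) for w in stems.values()}
    if len(lengths) != 1:
        raise ValueError("all stems must have the same length")
    return [sum(samples) for samples in zip(*stems.values())]


def l1_waveform_loss(estimates: Dict[str, Waveform],
                     references: Dict[str, Waveform]) -> float:
    """Mean absolute error between estimated and reference stems,
    averaged over every sample of every stem."""
    total = 0.0
    count = 0
    for name, ref in references.items():
        est = estimates[name]
        if len(est) != len(ref):
            raise ValueError(f"length mismatch for stem {name!r}")
        total += sum(abs(e - r) for e, r in zip(est, ref))
        count += len(ref)
    return total / count


if __name__ == "__main__":
    # Four short mono "stems" with made-up sample values.
    references = {
        "vocals": [0.5, -0.25, 0.0, 0.25],
        "drums": [0.25, 0.25, -0.5, 0.0],
        "bass": [0.0, 0.125, 0.125, 0.0],
        "other": [-0.125, 0.0, 0.25, 0.5],
    }
    mixture = mix(references)
    print(mixture)  # [0.625, 0.125, -0.125, 0.75]

    # A perfect separator reproduces every reference stem and scores 0.
    print(l1_waveform_loss(references, references))  # 0.0

    # A trivial "separator" that leaves the whole mixture in "other".
    silent = [0.0] * len(mixture)
    naive = {"vocals": silent, "drums": silent, "bass": silent, "other": mixture}
    print(l1_waveform_loss(naive, references))  # 0.234375
```

The naive estimate is penalized on every stem, including `other`, because that stem still contains the other sources. Training pushes the model's estimates toward the zero-loss case.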
## Self-Hosting & Configuration

- Install from PyPI with a single `pip` command
- Works on CPU for basic use; a CUDA GPU is recommended for faster processing
- On consumer GPUs, processing is typically 5-10x faster than real time
- Models are downloaded automatically on first use (approximately 80 MB per model)
- The `--overlap` and `--segment` options control how a track is split into chunks, trading speed and memory use against separation quality

## Key Features

- Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
- Simple CLI requires just one command to separate a track
- Python API available for integration into audio processing pipelines
- Multiple pre-trained models, including the fine-tuned `htdemucs_ft` for best quality
- Segment-based processing handles long tracks with limited memory

## Comparison with Similar Tools

- **Spleeter** — Deezer's open-source separator; faster than Demucs but lower quality
- **Open-Unmix** — reference implementation for music source separation; lightweight but less accurate
- **BSRNN** — band-split recurrent neural network with competitive quality but less accessible
- **LALAL.AI** — commercial online separation service with good quality; no local deployment
- **UVR (Ultimate Vocal Remover)** — GUI tool that wraps multiple models, including Demucs

## FAQ

**Q: How long does separation take?**
A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.

**Q: Can I separate stems other than the default four?**
A: The `htdemucs_6s` model provides 6 stems: vocals, drums, bass, guitar, piano, and other. Custom stem configurations require retraining.

**Q: Does Demucs work on podcasts or speech audio?**
A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

**Q: What audio quality does Demucs output?**
A: Demucs resamples the input to the model's sample rate (44.1 kHz for the pretrained models) and writes each stem as a stereo WAV file by default; `--mp3` switches to MP3 output, and `--int24` or `--float32` increase the WAV bit depth.
For best results, use high-quality source files (WAV or FLAC at 44.1 kHz or higher).

## Sources

- https://github.com/facebookresearch/demucs
- https://arxiv.org/abs/2211.08553

---

Source: https://tokrepo.com/en/workflows/asset-d9e3e25f
Author: Script Depot