Scripts · May 21, 2026 · 3 min read

Demucs — AI-Powered Music Source Separation

Demucs is a state-of-the-art music source separation model from Meta Research that splits audio tracks into vocals, drums, bass, and other instrument stems.

Universal CLI command
npx tokrepo install d9e3e25f-54cb-11f1-9bc6-00163e2b0d79

Introduction

Demucs is a music source separation library developed at Meta Research. Its latest version, Hybrid Transformer Demucs (HTDemucs), combines temporal convolutions with a transformer architecture to separate mixed audio into individual instrument stems with high fidelity, enabling applications from karaoke creation to music production and remixing.

What Demucs Does

  • Separates music into four default stems: vocals, drums, bass, and other instruments
  • Offers a two-stem mode for quick vocal/accompaniment separation
  • Processes audio files in MP3, WAV, FLAC, and other common formats
  • Supports GPU-accelerated and CPU-only processing
  • Provides an experimental 6-stem model (htdemucs_6s) that adds guitar and piano stems
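
The stem modes above map onto command-line options. The following is a minimal sketch, assuming the `demucs` package is installed and that `demucs.separate.main` accepts an argument list (as recent releases document). `build_args`, `separate`, and the file name `song.mp3` are hypothetical names used only for illustration.

```python
from typing import List, Optional


def build_args(track: str, model: str = "htdemucs",
               two_stems: Optional[str] = None,
               device: Optional[str] = None,
               out_dir: str = "separated") -> List[str]:
    """Assemble Demucs CLI arguments (hypothetical helper, not part of Demucs)."""
    args = ["-n", model, "-o", out_dir]
    if two_stems is not None:
        # --two-stems keeps one stem and sums the rest into "no_<stem>"
        args += ["--two-stems", two_stems]
    if device is not None:
        # -d selects the device, e.g. "cpu" or "cuda"
        args += ["-d", device]
    return args + [track]


def separate(track: str, **options) -> None:
    """Run the separation; Demucs is imported lazily so build_args works without it."""
    import demucs.separate
    demucs.separate.main(build_args(track, **options))


if __name__ == "__main__":
    print(build_args("song.mp3"))                                   # four default stems
    print(build_args("song.mp3", two_stems="vocals", device="cpu"))  # vocals / no_vocals
    print(build_args("song.mp3", model="htdemucs_6s"))               # six stems
```

Calling `separate("song.mp3", two_stems="vocals")` would then run the two-stem mode, provided the installed version supports argument lists.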

Architecture Overview

HTDemucs combines a temporal convolutional U-Net with a transformer encoder in a hybrid architecture. The convolutional branch processes the waveform directly while a parallel spectral branch operates on STFT representations. A cross-attention transformer module fuses information between the two domains. The model is trained end-to-end with a combination of L1 loss on waveforms and multi-resolution STFT loss, using the MUSDB18-HQ dataset and additional internal training data.
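
To make the fusion step concrete, here is a toy sketch of cross-attention between two token sequences, one standing in for the waveform branch and one for the spectral branch. It is single-head, has no learned projections, and is written in plain Python for readability. It illustrates the general technique only and is not the HTDemucs implementation; all names and values are made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention without learned projections.

    Each query token (from one domain) is replaced by a weighted average
    of the value tokens (from the other domain).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        fused.append([sum(w * v[i] for w, v in zip(weights, values))
                      for i in range(len(values[0]))])
    return fused


if __name__ == "__main__":
    time_tokens = [[1.0, 0.0], [0.0, 1.0]]               # toy waveform-branch features
    spec_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy spectral-branch features
    # Waveform tokens attend to spectral tokens; a full model would also
    # run the reverse direction and add learned projections.
    print(cross_attention(time_tokens, spec_tokens, spec_tokens))
```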

Self-Hosting & Configuration

  • Install from PyPI with a single pip command
  • Works on CPU for basic use; CUDA GPU recommended for faster processing
  • Typical GPU processing speed is 5-10x faster than real-time on consumer hardware
  • Models are downloaded automatically on first use (approximately 80 MB per model)
  • Adjustable overlap and chunk size parameters trade speed for separation quality
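
The overlap and chunk-size trade-off can be illustrated with a generic overlap-add routine. This is a sketch of the general idea, assuming triangular blending weights and a stride of `(1 - overlap) * segment`. It is not Demucs's own code, and `apply_in_chunks` and `triangle_weights` are hypothetical names. On the Demucs command line the corresponding options are `--segment` and `--overlap`.

```python
from typing import Callable, List

Signal = List[float]


def triangle_weights(segment: int) -> Signal:
    """Weights that peak at the segment centre and fall off toward the edges."""
    half = segment // 2
    ramp = list(range(1, half + 1)) + list(range(segment - half, 0, -1))
    peak = max(ramp)
    return [r / peak for r in ramp]


def apply_in_chunks(signal: Signal, model: Callable[[Signal], Signal],
                    segment: int, overlap: float = 0.25) -> Signal:
    """Run `model` on overlapping chunks and blend the outputs by weighted average."""
    stride = max(1, int((1 - overlap) * segment))
    weights = triangle_weights(segment)
    out = [0.0] * len(signal)
    weight_sum = [0.0] * len(signal)
    for offset in range(0, len(signal), stride):
        chunk = signal[offset:offset + segment]   # last chunk may be shorter
        result = model(chunk)                     # must return len(chunk) samples
        for i, value in enumerate(result):
            out[offset + i] += weights[i] * value
            weight_sum[offset + i] += weights[i]
    return [o / w for o, w in zip(out, weight_sum)]


if __name__ == "__main__":
    calls = []

    def identity(chunk: Signal) -> Signal:
        """Stand-in for a separation model; records how often it is called."""
        calls.append(len(chunk))
        return list(chunk)

    audio = [float(i) for i in range(37)]
    for overlap in (0.25, 0.5):
        calls.clear()
        apply_in_chunks(audio, identity, segment=8, overlap=overlap)
        print(f"overlap={overlap}: {len(calls)} model calls")
```

A larger overlap means more model calls for the same audio, which is where the extra processing time comes from; a smaller segment lowers peak memory per call.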

Key Features

  • Hybrid transformer-convolution architecture achieves state-of-the-art separation quality
  • Simple command-line interface requires just one command to separate a track
  • Python API available for integration into audio processing pipelines
  • Multiple pre-trained models including the fine-tuned htdemucs_ft for best quality
  • Supports segment-based processing for long tracks with limited memory
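
For pipeline integration, one possible shape of a Python call is sketched below. It assumes a Demucs 4.x install in which `demucs.pretrained.get_model` and `demucs.apply.apply_model` are available with the arguments shown. Signatures may differ between releases, so check the installed version. `name_stems` and `separate_tensor` are hypothetical helpers, and the sketch omits the input normalization that the command-line tool performs.

```python
from typing import Dict, Sequence


def name_stems(source_names: Sequence[str], stems: Sequence) -> Dict[str, object]:
    """Pair each separated stem with its source name (hypothetical helper)."""
    if len(source_names) != len(stems):
        raise ValueError("expected one stem per source name")
    return dict(zip(source_names, stems))


def separate_tensor(mix, model_name: str = "htdemucs", device: str = "cpu"):
    """Separate a [channels, samples] tensor already at the model's sample rate.

    Imports are local so this module loads even when torch/demucs are missing.
    """
    from demucs.apply import apply_model
    from demucs.pretrained import get_model

    model = get_model(model_name)   # downloads weights on first use
    model.eval()
    # apply_model takes [batch, channels, samples] and returns
    # [batch, sources, channels, samples]; split=True processes in segments.
    separated = apply_model(model, mix[None], device=device,
                            split=True, overlap=0.25)[0]
    return name_stems(model.sources, separated)
```

With a stereo tensor `mix` loaded at 44.1 kHz, `separate_tensor(mix)["vocals"]` would return the vocal stem as a `[channels, samples]` tensor.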

Comparison with Similar Tools

  • Spleeter — Deezer open-source separator, faster but lower quality than Demucs
  • Open-Unmix — reference implementation for music separation, lightweight but less accurate
  • BSRNN — band-split recurrent network with competitive quality but less accessible
  • Music Source Separation (LALAL.AI) — commercial service with good quality, no local deployment
  • UVR (Ultimate Vocal Remover) — GUI tool that wraps multiple models including Demucs

FAQ

Q: How long does separation take? A: On a modern NVIDIA GPU, Demucs processes a 4-minute song in approximately 30-60 seconds. CPU processing takes 5-10 minutes for the same track.
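
The timings in this answer can be expressed as a real-time factor (seconds of audio processed per second of wall-clock time). A small sketch of the arithmetic, using only the figures quoted above:

```python
def realtime_factor(track_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return track_seconds / processing_seconds


if __name__ == "__main__":
    track = 4 * 60  # a 4-minute song, in seconds
    for label, seconds in [("GPU, 30 s", 30), ("GPU, 60 s", 60),
                           ("CPU, 5 min", 300), ("CPU, 10 min", 600)]:
        print(f"{label}: {realtime_factor(track, seconds):.1f}x real-time")
```

The quoted GPU range works out to 4-8x real-time, and the CPU range to 0.4-0.8x, that is, slower than playback.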

Q: Can I separate stems other than the default four? A: The htdemucs_6s model provides 6 stems: vocals, drums, bass, guitar, piano, and other. Custom stem configurations require retraining.

Q: Does Demucs work on podcasts or speech audio? A: Demucs is optimized for music separation. For speech separation or noise removal, dedicated speech enhancement models may perform better.

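Q: What audio quality does Demucs output? A: The pretrained models operate on 44.1 kHz stereo audio. Input at other sample rates is resampled to 44.1 kHz before separation, and the stems are written as WAV files by default. For best results, use high-quality source files (WAV or FLAC at 44.1 kHz or higher).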
