Scripts · May 21, 2026 · 3 min read

AudioCraft — AI Audio Generation by Meta

AudioCraft is a PyTorch library from Meta Research that provides code and pre-trained models for audio generation (music and sound effects) and for neural audio compression.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and raw content so that agents can evaluate compatibility, risk, and next steps.

  • Native · 98/100 · Policy: allow
  • Agent surface: any MCP/CLI agent
  • Type: Skill
  • Installation: Single
  • Trust: Established
  • Entry: AudioCraft Overview
  • Universal CLI command: npx tokrepo install 8a0d7a57-54cb-11f1-9bc6-00163e2b0d79

Introduction

AudioCraft is a unified framework from Meta Research that brings together state-of-the-art generative audio models. It includes MusicGen for text-to-music, AudioGen for text-to-sound-effects, and EnCodec for neural audio compression, all accessible through a clean Python API.

What AudioCraft Does

  • Generates music from text descriptions via MusicGen
  • Creates sound effects and ambient audio from text prompts via AudioGen
  • Compresses audio at very low bitrates with high quality via the EnCodec neural codec
  • Supports melody-conditioned generation to produce music following a given tune
  • Provides multiple model sizes from 300M to 3.3B parameters for different compute budgets
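EnCodec's low bitrates come from residual vector quantization (RVQ): each stage quantizes the error left by the previous stages, so using fewer stages trades quality for bitrate. The toy sketch below illustrates the idea in pure Python under stated assumptions: the 2-D codebooks are hand-picked (EnCodec learns its codebooks and quantizes latent frames, not raw samples), and nearest, rvq_encode and rvq_decode are hypothetical helpers, not AudioCraft APIs.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, x: Vector) -> int:
    """Index of the codeword closest to x by squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def rvq_encode(codebooks: Sequence[Codebook], x: Vector) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual left so far."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Codebook], codes: Sequence[int]) -> Vector:
    """Reconstruct by summing the chosen codeword from each stage used."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.25], [0.25, -0.25], [-0.25, 0.25]],
]

x = [1.25, 0.75]
codes = rvq_encode(CODEBOOKS, x)
print(codes)                                 # [1, 2]
print(rvq_decode(CODEBOOKS[:1], codes[:1]))  # [1.0, 1.0]   (coarse stage only)
print(rvq_decode(CODEBOOKS, codes))          # [1.25, 0.75] (both stages)
```

Each extra stage adds one index per frame, which is how a single EnCodec model can serve several bitrates by using more or fewer codebooks.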

Architecture Overview

MusicGen and AudioGen use a single-stage autoregressive transformer that operates on discrete audio tokens produced by EnCodec. Unlike prior work that cascades several generation stages, MusicGen introduces efficient codebook interleaving patterns, such as the "delay" pattern, that let a single transformer predict one token for every codebook stream at each decoding step. EnCodec is a convolutional encoder-decoder with a residual vector quantization (RVQ) bottleneck that compresses audio at bitrates as low as 1.5 kbps while maintaining perceptual quality.
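The delay interleaving pattern described in the MusicGen paper can be sketched in a few lines: codebook k is shifted right by k steps, so at every decoding step the transformer emits one token per codebook, while the token for codebook k at frame t still comes after the token for codebook k - 1 at the same frame. This is a pure-Python illustration under assumptions: tokens are plain integers, None stands in for the special padding token a real implementation uses, and delay_pattern and undo_delay are hypothetical helper names, not AudioCraft functions.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def delay_pattern(codes: List[List[int]]) -> Grid:
    """Shift codebook k right by k steps.

    codes[k][t] is the token of codebook k at frame t. The result has one row
    per codebook and T + K - 1 columns; column s holds the K tokens the model
    predicts together at decoding step s (None = padding).
    """
    num_codebooks, num_frames = len(codes), len(codes[0])
    steps = num_frames + num_codebooks - 1
    grid: Grid = [[None] * steps for _ in range(num_codebooks)]
    for k in range(num_codebooks):
        for t in range(num_frames):
            grid[k][t + k] = codes[k][t]
    return grid


def undo_delay(grid: Grid, num_frames: int) -> Grid:
    """Invert delay_pattern: read codebook k starting at column k."""
    return [[grid[k][t + k] for t in range(num_frames)] for k in range(len(grid))]


# 4 codebooks x 3 frames; token value 10*k + t makes the layout easy to read.
codes = [[10 * k + t for t in range(3)] for k in range(4)]
grid = delay_pattern(codes)
for row in grid:
    print(row)
# [0, 1, 2, None, None, None]
# [None, 10, 11, 12, None, None]
# [None, None, 20, 21, 22, None]
# [None, None, None, 30, 31, 32]
```

With 4 codebooks at 50 frames per second (the setting reported for MusicGen), a 30-second clip needs 1,500 + 3 = 1,503 decoding steps, versus 6,000 if the four streams were flattened into one sequence.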

Self-Hosting & Configuration

  • Install from PyPI with pip or clone the repository for development
  • Requires PyTorch 2.0+; a CUDA-capable GPU is strongly recommended, since generation on CPU works but is slow
  • Small model (300M) runs on 4 GB VRAM; large model (3.3B) needs 16 GB+
  • Pre-trained weights download automatically from Hugging Face on first use
  • Gradio demo script included for a web-based generation interface
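The VRAM guidance above is easier to reason about with a back-of-envelope estimate of the memory needed for the transformer weights alone: parameter count times bytes per parameter. This is a sketch under assumptions: it counts weights only (activations, the attention cache, the EnCodec decoder and CUDA overhead all come on top), the dtypes are illustrative rather than what AudioCraft necessarily loads, and weight_gb is a hypothetical helper. The 1.5B "medium" size is MusicGen's middle checkpoint.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

# Parameter counts of the MusicGen language-model checkpoints.
MODEL_PARAMS = {"small": 300e6, "medium": 1.5e9, "large": 3.3e9}


def weight_gb(num_params: float, dtype: str = "float16") -> float:
    """Decimal gigabytes needed just to hold the weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in MODEL_PARAMS.items():
    print(f"{name:>6}: {weight_gb(params, 'float16'):5.1f} GB fp16, "
          f"{weight_gb(params, 'float32'):5.1f} GB fp32")
```

The weights alone come to about 0.6 GB (small) and 6.6 GB (large) at 16-bit precision, or 1.2 GB and 13.2 GB at 32-bit, so the larger VRAM figures in the list leave room for everything else generation needs.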

Key Features

  • Text-to-music generation with controllable duration: 30 seconds per generation window, with longer clips produced by windowed continuation
  • Melody conditioning allows music generation guided by a hummed or recorded tune
  • EnCodec neural codec achieves high-quality compression at 1.5-24 kbps
  • Single-stage transformer avoids cascaded model complexity
  • Stereo and mono generation supported across model sizes
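The 1.5-24 kbps range follows from simple arithmetic: every latent frame carries one index per active codebook, and each index costs log2(codebook size) bits. The sketch assumes the configurations reported in the EnCodec and MusicGen papers (24 kHz EnCodec: 75 frames per second, 1,024-entry codebooks, 2 to 32 codebooks; the 32 kHz EnCodec used by MusicGen: 50 frames per second, 4 codebooks of 2,048 entries); rvq_bitrate_bps is a hypothetical helper.

```python
import math


def rvq_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second = frames/s x codebooks per frame x bits per index."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)


# 24 kHz EnCodec: fewer codebooks -> lower bitrate.
for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} codebooks: {rvq_bitrate_bps(75, n, 1024) / 1000:g} kbps")

# Token stream that MusicGen models (32 kHz EnCodec).
print(f"MusicGen tokens: {rvq_bitrate_bps(50, 4, 2048) / 1000:g} kbps")
```

Two codebooks give the 1.5 kbps floor and 32 give 24 kbps; MusicGen's token stream works out to 2.2 kbps.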

Comparison with Similar Tools

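  • Stable Audio — commercial offering from Stability AI with longer outputs and closed weights for its flagship models; Stability has also released the smaller Stable Audio Open with public weights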
  • MusicLM — Google research model with strong quality but no public weights or code
  • Bark — generates speech, music, and effects but with less musical coherence than MusicGen
  • Riffusion — uses spectrograms with Stable Diffusion for music, creative but lower fidelity
  • AIVA — symbolic AI composer for sheet music, different paradigm from waveform generation

FAQ

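Q: How long can generated audio clips be? A: MusicGen generates in windows of up to 30 seconds, the clip length it was trained on. Setting a longer duration in set_generation_params triggers extended generation: MusicGen produces overlapping windows, using the end of each window as the prompt for the next and advancing by extend_stride seconds (18 by default).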
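For clips longer than 30 seconds, AudioCraft's MusicGen can extend generation itself: the duration and extend_stride arguments of set_generation_params (extend_stride defaults to 18 seconds) make it continue in overlapping windows, each prompted with the tail of the previous one. The sketch below only computes that window schedule in plain Python so the overlap is easy to see; extension_windows is a hypothetical helper that models the behaviour and does not call the library.

```python
from typing import List, Tuple


def extension_windows(duration: float, max_duration: float = 30.0,
                      extend_stride: float = 18.0) -> List[Tuple[float, float]]:
    """Return (start, end) seconds of each generation window.

    Each window is at most max_duration long and starts extend_stride seconds
    after the previous one; the overlap (max_duration - extend_stride seconds)
    is what the next window is prompted with.
    """
    if not 0 < extend_stride < max_duration:
        raise ValueError("extend_stride must be between 0 and max_duration")
    windows = []
    start = 0.0
    while True:
        end = min(duration, start + max_duration)
        windows.append((start, end))
        if end >= duration:
            return windows
        start += extend_stride


print(extension_windows(60.0))
# [(0.0, 30.0), (18.0, 48.0), (36.0, 60.0)]
```

Each window after the first contributes at most extend_stride new seconds, so a smaller stride keeps more context per window but needs more windows (and more compute) for the same total length.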

Q: Can I fine-tune MusicGen on my own music dataset? A: Yes, AudioCraft includes training code for fine-tuning MusicGen on custom audio data with text descriptions.

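Q: What audio formats are supported? A: MusicGen generates audio at 32 kHz (AudioGen at 16 kHz) and returns it as PyTorch tensors. AudioCraft's audio_write helper saves WAV by default, and output can also be saved to any format supported by torchaudio.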
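If torchaudio is not at hand, a mono 32 kHz waveform can also be written as 16-bit PCM WAV with Python's standard library alone. This sketch assumes the samples have already been converted from the generated tensor into a Python list of floats in [-1, 1] (for example with the tensor's tolist() method; that step is not shown), and float_to_wav_bytes is a hypothetical helper, not part of AudioCraft.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 32_000  # MusicGen's output sample rate


def float_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory."""
    # Clip, scale to the int16 range and pack as little-endian 16-bit integers.
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)  # mono
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)
    return buf.getvalue()


# One second of a 440 Hz tone at half amplitude: 32,000 samples.
tone = [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(SAMPLE_RATE)]
data = float_to_wav_bytes(tone)
with wave.open(io.BytesIO(data), "rb") as r:
    print(r.getframerate(), r.getnframes())  # 32000 32000
```

At 32,000 samples per second, a 30-second MusicGen clip is 960,000 samples, about 1.9 MB as 16-bit mono PCM.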

Q: Does AudioCraft support real-time streaming generation? A: The current implementation generates audio offline. Real-time streaming is not natively supported, but EnCodec can encode and decode in a streaming fashion.
