Scripts · May 21, 2026 · 3 min read

AudioCraft — AI Audio Generation by Meta

AudioCraft is a PyTorch library from Meta Research providing code and pre-trained models for audio generation including music, sound effects, and audio compression.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents assess fit, risk, and next actions.

Native · 98/100 · Policy: allow
Agent surface: any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established
Entry point: AudioCraft Overview
Universal CLI command:
npx tokrepo install 8a0d7a57-54cb-11f1-9bc6-00163e2b0d79

Introduction

AudioCraft is a unified framework from Meta Research that brings together state-of-the-art generative audio models. It includes MusicGen for text-to-music, AudioGen for text-to-sound-effects, and EnCodec for neural audio compression, all accessible through a clean Python API.

What AudioCraft Does

  • Generates music from text descriptions or melody conditioning via MusicGen
  • Creates sound effects and ambient audio from text prompts via AudioGen
  • Compresses audio at very low bitrates with high quality via the EnCodec neural codec
  • Supports melody-conditioned generation to produce music following a given tune
  • Provides multiple model sizes from 300M to 3.3B parameters for different compute budgets
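As a rough illustration of the text-to-music path listed above, the sketch below wraps the MusicGen calls shown in the AudioCraft README (`MusicGen.get_pretrained`, `set_generation_params`, `generate`, `audio_write`). The helper names `clamp_duration` and `generate_music`, the output file prefix, and the default of 8 seconds are assumptions made for this example; the library imports are deferred so the module loads even where AudioCraft is not installed.

```python
from typing import List

# Single-pass limit quoted in this article; treated here as a hard cap.
MAX_SINGLE_PASS_SECONDS = 30.0


def clamp_duration(seconds: float) -> float:
    """Keep a requested duration inside (0, 30] seconds (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return min(float(seconds), MAX_SINGLE_PASS_SECONDS)


def generate_music(prompts: List[str], seconds: float = 8.0,
                   out_prefix: str = "clip") -> List[str]:
    """Generate one clip per prompt and return the written file names."""
    # Imported lazily: these need the audiocraft package and its weights.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=clamp_duration(seconds))
    wavs = model.generate(prompts)  # one waveform tensor per prompt

    written = []
    for i, wav in enumerate(wavs):
        stem = f"{out_prefix}_{i}"
        # audio_write appends the extension (".wav" by default).
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        written.append(stem + ".wav")
    return written
```

Calling `generate_music(["lo-fi beat with warm piano"], seconds=10)` would download the small checkpoint on first use and write `clip_0.wav`; the prompt text is only an example.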

Architecture Overview

MusicGen and AudioGen each use a single-stage autoregressive transformer that operates on the discrete audio tokens produced by EnCodec. Unlike prior work that cascades several generation stages, they rely on an efficient codebook interleaving pattern: the codebook streams are offset from one another so that a single transformer can predict a token for every codebook at each decoding step. EnCodec is a convolutional encoder-decoder with a residual vector quantization (RVQ) bottleneck that compresses audio at bitrates as low as 1.5 kbps while maintaining perceptual quality.
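To make the interleaving idea concrete, here is a minimal sketch of a "delay" layout, assuming codebook k is shifted by k steps relative to the first codebook. It only computes which frame each codebook would predict at each decoding step; it is not the library's implementation, and the function name `delay_pattern` is hypothetical. The figures in the final comment (4 codebooks, 50 frames per second) are assumptions about MusicGen's tokenizer, not values stated in this article.

```python
from typing import List, Optional


def delay_pattern(num_codebooks: int, num_frames: int) -> List[List[Optional[int]]]:
    """For each decoding step, list the frame index each codebook predicts.

    None marks a padding slot where that codebook has nothing to emit yet
    (or has already finished).
    """
    steps = num_frames + num_codebooks - 1
    layout = []
    for step in range(steps):
        row = []
        for k in range(num_codebooks):
            frame = step - k  # codebook k lags the first codebook by k steps
            row.append(frame if 0 <= frame < num_frames else None)
        layout.append(row)
    return layout


def flattened_steps(num_codebooks: int, num_frames: int) -> int:
    """Steps needed if every token were generated one at a time instead."""
    return num_codebooks * num_frames


# With 4 codebooks and 1500 frames (30 s at an assumed 50 frames/s),
# the delay layout needs 1503 steps versus 6000 for full flattening.
```

The design trade-off is that codebooks within a step are predicted without seeing each other's output for the same frame, in exchange for a sequence that is only a few steps longer than the number of frames.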

Self-Hosting & Configuration

  • Install from PyPI with pip or clone the repository for development
  • Requires PyTorch 2.0+; a CUDA-capable GPU is strongly recommended, since generation on CPU is very slow
  • Small model (300M) runs on 4 GB VRAM; large model (3.3B) needs 16 GB+
  • Pre-trained weights download automatically from Hugging Face on first use
  • Gradio demo script included for a web-based generation interface
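The VRAM figures above can be turned into a small pre-flight check. The sketch below is an illustration only: the thresholds are the two numbers quoted in this list, the helper names are hypothetical, and the checkpoint names are the Hugging Face identifiers `facebook/musicgen-small` and `facebook/musicgen-large`. PyTorch is imported lazily so the check degrades gracefully where it is missing.

```python
from typing import Optional

SMALL = "facebook/musicgen-small"   # 300M parameters
LARGE = "facebook/musicgen-large"   # 3.3B parameters


def pick_checkpoint(vram_gb: float) -> Optional[str]:
    """Map available GPU memory to a checkpoint using the article's figures."""
    if vram_gb >= 16:
        return LARGE
    if vram_gb >= 4:
        return SMALL
    return None  # below the smallest figure quoted above


def detect_vram_gb() -> float:
    """Total memory of GPU 0 in decimal GB, or 0.0 without CUDA or PyTorch."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1e9


if __name__ == "__main__":
    print(pick_checkpoint(detect_vram_gb()))
```

Reported memory is often slightly below a card's nominal size, so the thresholds are best read as rough guides rather than exact cut-offs.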

Key Features

  • Text-to-music generation with controllable duration up to 30 seconds
  • Melody conditioning allows music generation guided by a hummed or recorded tune
  • EnCodec neural codec achieves high-quality compression at 1.5-24 kbps
  • Single-stage transformer avoids cascaded model complexity
  • Stereo and mono generation supported across model sizes
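The 1.5-24 kbps range quoted for EnCodec follows from simple arithmetic on a residual vector quantizer: each codebook contributes log2(codebook size) bits per frame. The sketch below assumes 75 frames per second and 1024-entry codebooks for the 24 kHz EnCodec model, and 50 frames per second with four 2048-entry codebooks for the tokenizer used by MusicGen; these values are not stated in this article and should be treated as assumptions.

```python
import math


def rvq_bitrate_kbps(frame_rate_hz: float, num_codebooks: int,
                     codebook_size: int) -> float:
    """Bitrate of an RVQ token stream in kilobits per second."""
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame / 1000.0


if __name__ == "__main__":
    # Assumed 24 kHz EnCodec settings: 2 codebooks up to 32 codebooks.
    print(rvq_bitrate_kbps(75, 2, 1024))
    print(rvq_bitrate_kbps(75, 32, 1024))
    # Assumed MusicGen tokenizer settings.
    print(rvq_bitrate_kbps(50, 4, 2048))
```

Under these assumptions, two codebooks give the 1.5 kbps lower bound and thirty-two give the 24 kbps upper bound, which is why bitrate can be adjusted simply by using fewer or more quantizer stages.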

Comparison with Similar Tools

  • Stable Audio — commercial offering from Stability AI with longer outputs; the hosted product is closed, although a separate Stable Audio Open model has published weights
  • MusicLM — Google research model with strong quality but no public weights or code
  • Bark — generates speech, music, and effects but with less musical coherence than MusicGen
  • Riffusion — uses spectrograms with Stable Diffusion for music, creative but lower fidelity
  • AIVA — symbolic AI composer for sheet music, different paradigm from waveform generation

FAQ

Q: How long can generated audio clips be? A: MusicGen generates up to 30 seconds in a single pass. Longer compositions require chunked generation, with each new chunk overlapping the end of the previous one so the music stays continuous.
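One way to plan the chunks for a longer piece is to slide a fixed window forward by a stride shorter than the window, so that each chunk shares some audio with the previous one. The sketch below only computes the chunk boundaries; the function name `plan_windows` is hypothetical, and the 30-second window and 18-second stride are illustrative defaults rather than values taken from this article.

```python
from typing import List, Tuple


def plan_windows(total_s: float, window_s: float = 30.0,
                 stride_s: float = 18.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for overlapping chunks."""
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    if not 0 < stride_s < window_s:
        raise ValueError("stride must be positive and shorter than the window")

    windows = []
    start = 0.0
    while True:
        end = min(start + window_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += stride_s  # the next chunk reuses window_s - stride_s seconds
    return windows
```

For a 60-second target with the defaults, this yields three chunks, each sharing 12 seconds with its predecessor; a smaller stride gives more shared context per chunk at the cost of more generation passes.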

Q: Can I fine-tune MusicGen on my own music dataset? A: Yes, AudioCraft includes training code for fine-tuning MusicGen on custom audio data with text descriptions.

Q: What audio formats are supported? A: The models operate on raw waveforms rather than on a particular file format: MusicGen generates at 32 kHz and AudioGen at 16 kHz. Output can be saved to any format supported by torchaudio.

Q: Does AudioCraft support real-time streaming generation? A: The current implementation generates audio offline. Real-time streaming is not natively supported but EnCodec can encode and decode in a streaming fashion.
