What is AudioCraft?

AudioCraft is a PyTorch library from Meta Research that provides code and pre-trained models for audio generation (music and sound effects) and for neural audio compression.

Is AudioCraft free to use?

Yes, with one caveat. AudioCraft is freely available on TokRepo, and its code is released under the MIT license, but the pre-trained model weights are released under CC-BY-NC 4.0, which does not allow commercial use. Check the Source & Thanks section on the asset page for the license details.

How do I install AudioCraft?

Install the stable release from PyPI with pip install -U audiocraft (the project also recommends having ffmpeg installed). On TokRepo, the "Copy for agent" button on the asset page provides ready-made installation instructions.

AudioCraft — AI Audio Generation by Meta

Introduction

AudioCraft is a unified framework from Meta Research that brings together state-of-the-art generative audio models. It includes MusicGen for text-to-music, AudioGen for text-to-sound-effects, and EnCodec for neural audio compression, all accessible through a clean Python API.

What AudioCraft Does

Generates music from text descriptions via MusicGen
Creates sound effects and ambient audio from text prompts via AudioGen
Compresses audio at very low bitrates with high quality via the EnCodec neural codec
Supports melody-conditioned generation to produce music following a given tune
Provides multiple model sizes from 300M to 3.3B parameters for different compute budgets
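EnCodec's compression rests on residual vector quantization (RVQ): the first codebook captures a coarse approximation, and each later codebook quantizes the error the earlier ones left behind, so decoding with more codebooks yields a finer reconstruction. The sketch below is a toy illustration on scalar samples with three hand-picked codebooks. The codebook values, the signal, and the function names are hypothetical; EnCodec learns vector codebooks over encoder embeddings rather than using fixed scalar ones.

```python
from typing import List, Sequence

# Toy scalar codebooks (hypothetical): each stage covers a finer range than the last.
BOOKS: List[List[float]] = [
    [-1.0, -0.5, 0.0, 0.5, 1.0],
    [-0.2, -0.1, 0.0, 0.1, 0.2],
    [-0.04, -0.02, 0.0, 0.02, 0.04],
]


def rvq_encode(x: Sequence[float], books: List[List[float]]) -> List[List[int]]:
    """Hypothetical helper: return one list of codeword indices per stage."""
    codes: List[List[int]] = []
    residual = list(x)
    for book in books:
        # Nearest codeword for each residual value.
        idx = [min(range(len(book)), key=lambda k: abs(r - book[k])) for r in residual]
        codes.append(idx)
        # The next stage only sees what this stage failed to capture.
        residual = [r - book[i] for r, i in zip(residual, idx)]
    return codes


def rvq_decode(codes: List[List[int]], books: List[List[float]]) -> List[float]:
    """Hypothetical helper: sum the chosen codewords; fewer stages give a coarser result."""
    out = [0.0] * len(codes[0])
    for book, idx in zip(books, codes):
        out = [o + book[i] for o, i in zip(out, idx)]
    return out


signal = [0.37, -0.81, 0.12]
codes = rvq_encode(signal, BOOKS)
for n_stages in (1, 2, 3):
    approx = rvq_decode(codes[:n_stages], BOOKS)
    err = max(abs(a - b) for a, b in zip(signal, approx))
    print(f"{n_stages} codebook(s): max error {err:.3f}")
```

Because the stages are ordered from coarse to fine, dropping the last codebooks lowers the bitrate while still leaving a usable, if rougher, reconstruction.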

Architecture Overview

MusicGen and AudioGen use a single-stage autoregressive transformer that operates on discrete audio tokens produced by EnCodec. Unlike earlier approaches that cascade several generation stages, MusicGen interleaves the parallel codebook streams with a small per-codebook offset, so one transformer predicts a token for every codebook at each step. EnCodec itself is a convolutional encoder-decoder with a residual vector quantization (RVQ) bottleneck that compresses audio at bitrates as low as 1.5 kbps while maintaining perceptual quality.
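The interleaving can be pictured with the "delay" pattern described in the MusicGen paper: codebook k is shifted right by k steps, so at each decoding step the transformer emits one token per codebook, and the finer codebooks of a frame are produced only after the coarser ones for that frame. The list-based sketch below uses hypothetical function names and None as a padding marker; AudioCraft implements its patterns on tensors, so treat this as an illustration of the token layout, not the library's code.

```python
from typing import List, Optional

Grid = List[List[Optional[int]]]


def apply_delay(codes: List[List[int]]) -> Grid:
    """Hypothetical helper: shift codebook k right by k steps.

    codes[k][t] is the token of codebook k at frame t. The result has
    T + K - 1 columns; None marks positions that hold no token.
    """
    k_books = len(codes)
    return [
        [None] * k + list(row) + [None] * (k_books - 1 - k)
        for k, row in enumerate(codes)
    ]


def undo_delay(delayed: Grid) -> Grid:
    """Hypothetical helper: remove the per-codebook offset to realign frames."""
    k_books = len(delayed)
    t_len = len(delayed[0]) - k_books + 1
    return [row[k : k + t_len] for k, row in enumerate(delayed)]


# Four codebooks (K = 4) over three frames (T = 3).
codes = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
delayed = apply_delay(codes)
for step in range(len(delayed[0])):
    # One decoding step: one token (or padding) per codebook.
    print(step, [row[step] for row in delayed])
```

At step 3, codebook 1 emits frame 2, codebook 2 emits frame 1 and codebook 3 emits frame 0, each one step after the next-coarser codebook handled that frame. Generating T frames therefore takes T + K - 1 steps rather than the T × K steps of flattening every codebook into one sequence.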

Self-Hosting & Configuration

Install from PyPI with pip or clone the repository for development
Requires PyTorch 2.0+; a CUDA-capable GPU is strongly recommended, since generation on CPU works but is very slow
Small model (300M) runs on 4 GB VRAM; large model (3.3B) needs 16 GB+
Pre-trained weights download automatically from Hugging Face on first use
Gradio demo script included for a web-based generation interface
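The VRAM guidance above tracks parameter count. A back-of-the-envelope estimate of weight memory (parameters × bytes per parameter) shows why: at 32-bit precision the 3.3B model's weights alone take about 13 GB, before activations, the attention cache built up during sampling, and the EnCodec decoder are counted. The precisions and the helper name below are illustrative assumptions; the sketch does not reflect how AudioCraft actually loads or runs its models.

```python
# Bytes per parameter for two common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Parameter counts quoted on this page for the smallest and largest MusicGen models.
MODEL_PARAMS = {"small (300M)": 300e6, "large (3.3B)": 3.3e9}


def weight_memory_gb(n_params: float, dtype: str = "fp32") -> float:
    """Hypothetical helper: memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in MODEL_PARAMS.items():
    fp32 = weight_memory_gb(n, "fp32")
    fp16 = weight_memory_gb(n, "fp16")
    print(f"{name}: {fp32:.1f} GB in fp32, {fp16:.1f} GB in fp16 (weights only)")
```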

Key Features

Text-to-music generation with controllable duration, up to 30 seconds per generation window
Melody conditioning allows music generation guided by a hummed or recorded tune
EnCodec neural codec achieves high-quality compression at 1.5-24 kbps
Single-stage transformer avoids cascaded model complexity
Stereo and mono generation supported across model sizes
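The 1.5-24 kbps range follows from token arithmetic: bitrate = frames per second × codebooks × bits per codebook index. The sketch assumes the configuration published for the 24 kHz EnCodec model (75 frames per second, 1,024-entry codebooks, 2 to 32 codebooks) and the 32 kHz tokenizer described in the MusicGen paper (50 frames per second, four 2,048-entry codebooks); the function name is hypothetical.

```python
import math


def rvq_bitrate_kbps(frame_rate_hz: float, n_codebooks: int, codebook_size: int) -> float:
    """Hypothetical helper: frames/s x codebooks x bits per index, in kbps."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_index / 1000


# 24 kHz EnCodec: 75 frames/s, 1,024-entry codebooks; bandwidth set by codebook count.
for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} codebooks -> {rvq_bitrate_kbps(75, n, 1024):g} kbps")

# MusicGen's 32 kHz EnCodec: 50 frames/s, 4 codebooks of 2,048 entries.
print(f"MusicGen tokenizer -> {rvq_bitrate_kbps(50, 4, 2048):g} kbps")
```

Since RVQ stages run from coarse to fine, a single trained codec can serve every bandwidth in that range by keeping fewer or more codebooks.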

Comparison with Similar Tools

Stable Audio — commercial offering from Stability AI with longer outputs but closed weights
MusicLM — Google research model with strong quality but no public weights or code
Bark — generates speech, music, and effects but with less musical coherence than MusicGen
Riffusion — fine-tunes Stable Diffusion to generate spectrogram images that are then converted to audio; creative but lower fidelity
AIVA — symbolic AI composer for sheet music, different paradigm from waveform generation

FAQ

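Q: How long can generated audio clips be? A: MusicGen is trained on 30-second excerpts, so a single generation window covers at most 30 seconds. Longer pieces are built by continuation, where each new window is conditioned on the end of the audio generated so far (the extend_stride argument of set_generation_params controls how far each window advances), or by generating chunks separately and blending them where they overlap.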
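When chunks are generated separately, overlap blending usually means a crossfade across the shared region: the end of one chunk fades out while the start of the next fades in. Below is a minimal linear-crossfade sketch on plain lists; the function name, the linear fade shape and the chunk lengths are illustrative assumptions. Blending only smooths the seam; whether the music stays coherent depends on how each chunk was conditioned.

```python
from typing import List, Sequence


def crossfade_concat(chunks: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Hypothetical helper: join mono chunks, linearly crossfading `overlap` samples per seam."""
    if not chunks:
        return []
    out = list(chunks[0])
    for nxt in chunks[1:]:
        if overlap <= 0:
            out.extend(nxt)
            continue
        if overlap > min(len(out), len(nxt)):
            raise ValueError("overlap is longer than a chunk")
        start = len(out) - overlap
        for i in range(overlap):
            # Weight of the incoming chunk rises linearly across the seam;
            # the two weights sum to 1, so identical signals pass unchanged.
            w = (i + 1) / (overlap + 1)
            out[start + i] = (1 - w) * out[start + i] + w * nxt[i]
        out.extend(nxt[overlap:])
    return out


SAMPLE_RATE = 32_000
# Two hypothetical 2-second chunks joined with a 0.5-second (16,000-sample) overlap.
a = [0.2] * (2 * SAMPLE_RATE)
b = [-0.2] * (2 * SAMPLE_RATE)
joined = crossfade_concat([a, b], overlap=SAMPLE_RATE // 2)
print(len(joined) / SAMPLE_RATE, "seconds")
```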

Q: Can I fine-tune MusicGen on my own music dataset? A: Yes, AudioCraft includes training code for fine-tuning MusicGen on custom audio data with text descriptions.

Q: What audio formats are supported? A: The models work on raw waveform tensors rather than files: MusicGen generates audio at 32 kHz and AudioGen at 16 kHz. Output can be saved to any format supported by torchaudio.
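Generated audio comes out as floating-point samples at the model's sample rate. Besides torchaudio, Python's standard-library wave module can write such samples as 16-bit PCM WAV; the sketch below does so for a mono test tone standing in for model output. The function name is hypothetical, and unlike AudioCraft's own audio_write helper it does no normalization, only hard clipping to [-1, 1].

```python
import io
import math
import struct
import wave
from typing import Sequence

SAMPLE_RATE = 32_000  # MusicGen's output sample rate


def float_to_wav_bytes(samples: Sequence[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Hypothetical helper: encode mono floats in [-1, 1] as an in-memory 16-bit WAV file."""
    # Clip to [-1, 1] and scale to the int16 range.
    ints = [round(32767 * max(-1.0, min(1.0, s))) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


# A 0.1-second 440 Hz tone stands in for generated audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]
wav_bytes = float_to_wav_bytes(tone)
print(len(wav_bytes), "bytes of WAV data")
```

To save to disk, write wav_bytes to a file opened in binary mode.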

Q: Does AudioCraft support real-time streaming generation? A: No. Generation runs offline and real-time streaming is not natively supported, although EnCodec itself can encode and decode in a streaming fashion.
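Streaming with a frame-based codec such as EnCodec means audio is consumed in fixed-size frames as it arrives, so at least one full frame must be buffered before it can be encoded. The sketch below shows only that buffering arithmetic, using the 24 kHz EnCodec model's 75 frames per second (320 samples, about 13.3 ms, per frame). The function names are hypothetical and the code does not call EnCodec.

```python
from typing import Iterable, Iterator, List


def hop_length(sample_rate: int, frame_rate: int) -> int:
    """Hypothetical helper: number of input samples in one codec frame."""
    if sample_rate % frame_rate:
        raise ValueError("sample rate must be a multiple of the frame rate")
    return sample_rate // frame_rate


def stream_frames(samples: Iterable[float], hop: int) -> Iterator[List[float]]:
    """Hypothetical helper: yield each complete frame as soon as its samples arrive.

    A trailing partial frame is not emitted; a real encoder would pad it
    or wait for more audio.
    """
    buf: List[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == hop:
            yield buf
            buf = []


hop = hop_length(24_000, 75)
print(hop, "samples per frame, about", round(1000 * hop / 24_000, 1), "ms")
frames = list(stream_frames(iter([0.0] * 1_000), hop))
print(len(frames), "complete frames from 1,000 samples")
```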


Related Assets

Bark — AI Text-to-Audio with Music & Effects

LMMS — Free Cross-Platform Digital Audio Workstation

Tone.js — Web Audio Framework for Interactive Music

CogVideo — Text and Image to Video Generation