# AudioCraft — AI Audio Generation by Meta

> AudioCraft is a PyTorch library from Meta Research providing code and pre-trained models for audio generation including music, sound effects, and audio compression.

## Install

Install the package from PyPI with `pip install audiocraft`; this is the first command in the Quick Use snippet below.

## Quick Use
```bash
pip install audiocraft
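python -c "
import torchaudio
from audiocraft.models import MusicGen

# Load the smallest pretrained checkpoint (weights download on first use)
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # clip length in seconds

# generate() returns a tensor of shape [batch, channels, samples]
wav = model.generate(['upbeat electronic dance track with synth leads'])
torchaudio.save('output.wav', wav[0].cpu(), model.sample_rate)
"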
```

## Introduction
AudioCraft is a unified framework from Meta Research that brings together state-of-the-art generative audio models. It includes MusicGen for text-to-music, AudioGen for text-to-sound-effects, and EnCodec for neural audio compression, all accessible through a clean Python API.

## What AudioCraft Does
- Generates music from text descriptions or melody conditioning via MusicGen
- Creates sound effects and ambient audio from text prompts via AudioGen
- Compresses audio at very low bitrates with high quality via the EnCodec neural codec
- Supports melody-conditioned generation to produce music following a given tune
- Provides multiple model sizes from 300M to 3.3B parameters for different compute budgets
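
As a rough sketch of the first two capabilities, the snippet below wraps AudioGen and melody-conditioned MusicGen in two functions. The checkpoint ids `facebook/audiogen-medium` and `facebook/musicgen-melody`, and the calls `generate_with_chroma` and `audio_write`, follow the project README as best I recall it; treat them as assumptions and check them against your installed version. `prompt_to_stem` is a hypothetical helper used only to name output files.

```python
import re


def prompt_to_stem(prompt: str, max_len: int = 40) -> str:
    """Hypothetical helper: turn a text prompt into a safe file stem."""
    stem = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return stem[:max_len] or "clip"


def generate_sound_effects(prompts, duration=5.0):
    """Text-to-sound-effects with AudioGen (downloads weights on first use)."""
    # Imported lazily so prompt_to_stem works without audiocraft installed.
    from audiocraft.models import AudioGen
    from audiocraft.data.audio import audio_write

    model = AudioGen.get_pretrained("facebook/audiogen-medium")
    model.set_generation_params(duration=duration)
    wavs = model.generate(prompts)  # tensor of shape [batch, channels, samples]
    for prompt, wav in zip(prompts, wavs):
        # audio_write appends the file extension to the stem.
        audio_write(prompt_to_stem(prompt), wav.cpu(), model.sample_rate,
                    strategy="loudness")


def generate_from_melody(prompt, melody_path, duration=8.0):
    """Melody-conditioned MusicGen: follow the tune stored in melody_path."""
    import torchaudio
    from audiocraft.models import MusicGen

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(duration=duration)
    melody, sr = torchaudio.load(melody_path)  # [channels, samples]
    # The melody argument is batched: [batch, channels, samples].
    return model.generate_with_chroma([prompt], melody[None], sr)
```

Calling `generate_sound_effects(["dog barking in a park"])` would write `dog_barking_in_a_park.wav` to the working directory, assuming the calls above match your AudioCraft version.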

## Architecture Overview
MusicGen and AudioGen use a single-stage autoregressive transformer that operates on tokenized audio representations from EnCodec. Unlike prior work that cascades several generation stages, MusicGen uses an efficient codebook interleaving pattern: each codebook stream is offset by a small delay, so a single transformer can predict one token for every codebook at each decoding step. EnCodec is a convolutional encoder-decoder with a residual vector quantization bottleneck that compresses audio at bitrates as low as 1.5 kbps while maintaining perceptual quality.
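
To make the interleaving idea concrete, here is a toy sketch in plain Python. It assumes the "delay" pattern described in the MusicGen paper, where codebook `k` is shifted right by `k` steps so that one decoding step emits a token for every codebook. This is an illustration, not the library's implementation; `PAD`, `apply_delay_pattern`, and `undo_delay_pattern` are hypothetical names.

```python
PAD = -1  # placeholder for positions that hold no token


def apply_delay_pattern(codes):
    """Shift row k right by k steps.

    codes[k][t] is the token of codebook k at audio frame t. The result has
    the same K rows but T + K - 1 columns (decoding steps).
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    num_steps = num_frames + num_codebooks - 1
    delayed = [[PAD] * num_steps for _ in range(num_codebooks)]
    for k, row in enumerate(codes):
        for t, token in enumerate(row):
            delayed[k][t + k] = token
    return delayed


def undo_delay_pattern(delayed):
    """Recover the frame-aligned K x T grid from the delayed grid."""
    num_codebooks = len(delayed)
    num_frames = len(delayed[0]) - num_codebooks + 1
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


codes = [
    [10, 11, 12, 13],  # codebook 0
    [20, 21, 22, 23],  # codebook 1
    [30, 31, 32, 33],  # codebook 2
]
delayed = apply_delay_pattern(codes)
for row in delayed:
    print(row)
# [10, 11, 12, 13, -1, -1]
# [-1, 20, 21, 22, 23, -1]
# [-1, -1, 30, 31, 32, 33]
```

Each column of the delayed grid is one transformer step, so generating `T` frames with `K` codebooks takes `T + K - 1` steps instead of the `T * K` steps a fully flattened sequence would need.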

## Self-Hosting & Configuration
- Install from PyPI with pip or clone the repository for development
- Requires PyTorch 2.0+; a CUDA-capable GPU is strongly recommended, since generation on CPU is very slow
- Small model (300M) runs on 4 GB VRAM; large model (3.3B) needs 16 GB+
- Pre-trained weights download automatically from Hugging Face on first use
- Gradio demo script included for a web-based generation interface
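
A minimal sketch of choosing a checkpoint from available GPU memory, using only the two VRAM figures given in the list above. The Hugging Face ids `facebook/musicgen-small` and `facebook/musicgen-large` are assumptions, and `pick_checkpoint` and `detect_vram_gb` are hypothetical helpers, not part of AudioCraft.

```python
from typing import Optional

# (checkpoint id, minimum VRAM in GB), largest first. Figures come from the
# list above; no threshold is guessed for sizes the list does not mention.
VRAM_REQUIREMENTS_GB = [
    ("facebook/musicgen-large", 16.0),
    ("facebook/musicgen-small", 4.0),
]


def pick_checkpoint(vram_gb: float) -> Optional[str]:
    """Return the largest listed checkpoint that fits, or None if none does."""
    for name, needed_gb in VRAM_REQUIREMENTS_GB:
        if vram_gb >= needed_gb:
            return name
    return None


def detect_vram_gb() -> float:
    """Total memory of the first CUDA device in GiB, or 0.0 if unavailable."""
    try:
        import torch
    except ImportError:
        return 0.0
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(0).total_memory / 1024**3
```

In a setup script, `pick_checkpoint(detect_vram_gb())` gives a name to pass to `MusicGen.get_pretrained`, or `None` when no listed model fits the detected GPU.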

## Key Features
- Text-to-music generation with controllable duration up to 30 seconds
- Melody conditioning allows music generation guided by a hummed or recorded tune
- EnCodec neural codec achieves high-quality compression at 1.5-24 kbps
- Single-stage transformer avoids cascaded model complexity
- Stereo and mono generation supported across model sizes
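
The bitrate range quoted for EnCodec follows from simple arithmetic on the residual vector quantizer: frames per second, times codebooks per frame, times bits per codebook index. The sketch below assumes the published 24 kHz EnCodec configuration (75 frames per second, 1024-entry codebooks) and, for comparison, the 32 kHz tokenizer used by MusicGen (50 frames per second, 4 codebooks of 2048 entries); verify these figures against the model cards before relying on them.

```python
import math


def rvq_bitrate_kbps(frame_rate_hz: float, num_codebooks: int,
                     codebook_size: int) -> float:
    """Bitrate of a residual vector quantizer in kilobits per second."""
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame / 1000.0


# Assumed 24 kHz EnCodec settings: 75 frames/s, 1024 entries (10 bits) each.
for num_codebooks in (2, 4, 8, 16, 32):
    kbps = rvq_bitrate_kbps(75, num_codebooks, 1024)
    print(f"{num_codebooks:2d} codebooks -> {kbps} kbps")

# Assumed MusicGen tokenizer: 50 frames/s, 4 codebooks, 2048 entries (11 bits).
musicgen_kbps = rvq_bitrate_kbps(50, 4, 2048)
```

Under these assumptions, 2 codebooks give 1.5 kbps and 32 codebooks give 24 kbps, which matches the 1.5-24 kbps range listed above.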

## Comparison with Similar Tools
- **Stable Audio** — commercial offering from Stability AI with longer outputs; the hosted models have closed weights, though a separate open-weights release (Stable Audio Open) exists
- **MusicLM** — Google research model with strong quality but no public weights or code
- **Bark** — generates speech, music, and effects but with less musical coherence than MusicGen
- **Riffusion** — uses spectrograms with Stable Diffusion for music, creative but lower fidelity
- **AIVA** — symbolic AI composer for sheet music, different paradigm from waveform generation

## FAQ
**Q: How long can generated audio clips be?**
A: MusicGen can generate clips up to 30 seconds. Longer compositions require chunked generation with overlap blending.
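
One simple way to join separately generated chunks is a linear crossfade over the region where they overlap. The sketch below works on plain lists of mono samples so it runs without any audio library; `crossfade_concat` is a hypothetical helper, not an AudioCraft function, and real pipelines would more likely apply the same idea to tensors.

```python
def crossfade_concat(first, second, overlap):
    """Join two mono sample sequences with a linear crossfade.

    The last `overlap` samples of `first` are blended with the first
    `overlap` samples of `second`; everything else is copied unchanged.
    """
    if overlap <= 0:
        return list(first) + list(second)
    if overlap > min(len(first), len(second)):
        raise ValueError("overlap is longer than one of the chunks")

    head = list(first[:-overlap])
    tail = list(second[overlap:])
    start = len(first) - overlap
    blended = []
    for i in range(overlap):
        # Weight of the second chunk rises linearly across the overlap.
        w = (i + 1) / (overlap + 1)
        blended.append((1.0 - w) * first[start + i] + w * second[i])
    return head + blended + tail


joined = crossfade_concat([0.0] * 4, [1.0] * 4, overlap=3)
print(joined)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

The joined clip is `len(first) + len(second) - overlap` samples long, so at 32 kHz a one-second overlap removes 32,000 samples from the combined length.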

**Q: Can I fine-tune MusicGen on my own music dataset?**
A: Yes, AudioCraft includes training code for fine-tuning MusicGen on custom audio data with text descriptions.

**Q: What audio formats are supported?**
A: The models operate on raw waveform tensors rather than a particular file format. MusicGen generates audio at 32 kHz (AudioGen at 16 kHz), and the output can be saved to any format supported by torchaudio.

**Q: Does AudioCraft support real-time streaming generation?**
A: The current implementation generates audio offline. Real-time streaming is not natively supported but EnCodec can encode and decode in a streaming fashion.

## Sources
- https://github.com/facebookresearch/audiocraft
- https://ai.meta.com/resources/models-and-libraries/audiocraft/

---
Source: https://tokrepo.com/en/workflows/asset-8a0d7a57
Author: Script Depot