# AudioCraft: AI Audio Generation by Meta

> AudioCraft is a PyTorch library from Meta Research providing code and pre-trained models for audio generation, including music, sound effects, and audio compression.

## Install

Save the Quick Use snippet as a script file and run it.

## Quick Use

```bash
pip install audiocraft
python -c "
import torchaudio
from audiocraft.models import MusicGen

# Load the small (300M) checkpoint; weights download on first use
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # clip length in seconds
# generate() returns a tensor shaped [batch, channels, samples]
wav = model.generate(['upbeat electronic dance track with synth leads'])
torchaudio.save('output.wav', wav[0].cpu(), model.sample_rate)
"
```

## Introduction

AudioCraft is a unified framework from Meta Research that brings together state-of-the-art generative audio models. It includes MusicGen for text-to-music, AudioGen for text-to-sound-effects, and EnCodec for neural audio compression, all accessible through a clean Python API.

## What AudioCraft Does

- Generates music from text descriptions or melody conditioning via MusicGen
- Creates sound effects and ambient audio from text prompts via AudioGen
- Compresses audio at very low bitrates with high quality via the EnCodec neural codec
- Supports melody-conditioned generation to produce music following a given tune
- Provides multiple model sizes from 300M to 3.3B parameters for different compute budgets

## Architecture Overview

MusicGen and AudioGen use a single-stage autoregressive transformer that operates on tokenized audio representations from EnCodec. Unlike prior work that uses multiple stages of generation, AudioCraft introduces an efficient codebook interleaving pattern that allows a single transformer to predict one token for every codebook stream at each decoding step.

EnCodec is a convolutional encoder-decoder with a residual vector quantization bottleneck that compresses audio at bitrates as low as 1.5 kbps while maintaining perceptual quality.
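
The codebook interleaving idea in the architecture overview can be illustrated with a toy example. The sketch below assumes a "delay"-style pattern, in which codebook `k` is shifted right by `k` steps so that one decoding step covers one token per codebook. The function names `apply_delay_pattern` and `undo_delay_pattern` are hypothetical and are not part of the AudioCraft API; this is plain Python on small integer lists, not the library's implementation.

```python
from typing import List, Optional

# Placeholder for positions where a codebook has no token yet (assumption:
# a real model would use a dedicated special token instead of None).
PAD: Optional[int] = None


def apply_delay_pattern(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps.

    codes[k][t] is the token of codebook k at audio frame t.
    Column s of the result holds the tokens predicted together at step s.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    delayed = [[PAD] * total_steps for _ in range(num_codebooks)]
    for k, stream in enumerate(codes):
        for t, token in enumerate(stream):
            delayed[k][t + k] = token
    return delayed


def undo_delay_pattern(
    delayed: List[List[Optional[int]]], num_frames: int
) -> List[List[Optional[int]]]:
    """Remove the per-codebook shift to recover frame-aligned tokens."""
    return [row[k:k + num_frames] for k, row in enumerate(delayed)]


# Three codebooks, three frames of made-up token ids.
codes = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
delayed = apply_delay_pattern(codes)
restored = undo_delay_pattern(delayed, num_frames=3)
```

With `K` codebooks and `T` frames, this layout needs `T + K - 1` decoding steps rather than the `T * K` steps a fully flattened sequence would need, which is the trade-off that lets a single transformer stand in for a multi-stage cascade.
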
## Self-Hosting & Configuration

- Install from PyPI with pip or clone the repository for development
- Requires PyTorch 2.0+ and a CUDA-capable GPU for generation
- Small model (300M) runs on 4 GB VRAM; large model (3.3B) needs 16 GB+
- Pre-trained weights download automatically from Hugging Face on first use
- Gradio demo script included for a web-based generation interface

## Key Features

- Text-to-music generation with controllable duration up to 30 seconds
- Melody conditioning allows music generation guided by a hummed or recorded tune
- EnCodec neural codec achieves high-quality compression at 1.5-24 kbps
- Single-stage transformer avoids cascaded model complexity
- Stereo and mono generation supported across model sizes

## Comparison with Similar Tools

- **Stable Audio**: commercial offering from Stability AI with longer outputs but closed weights
- **MusicLM**: Google research model with strong quality but no public weights or code
- **Bark**: generates speech, music, and effects but with less musical coherence than MusicGen
- **Riffusion**: applies Stable Diffusion to spectrograms to produce music; creative but lower fidelity
- **AIVA**: symbolic AI composer for sheet music, a different paradigm from waveform generation

## FAQ

**Q: How long can generated audio clips be?**
A: MusicGen can generate clips up to 30 seconds. Longer compositions require chunked generation with overlap blending.

**Q: Can I fine-tune MusicGen on my own music dataset?**
A: Yes, AudioCraft includes training code for fine-tuning MusicGen on custom audio data with text descriptions.

**Q: What audio formats are supported?**
A: MusicGen works internally with 32 kHz audio (AudioGen uses 16 kHz). Output can be saved to any format supported by torchaudio.

**Q: Does AudioCraft support real-time streaming generation?**
A: The current implementation generates audio offline. Real-time streaming is not natively supported, but EnCodec can encode and decode in a streaming fashion.
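
The FAQ answer on clip length mentions chunked generation with overlap blending but does not say how the blending is done. As a minimal sketch, assuming a linear crossfade over a fixed number of overlapping samples, the join step could look like the following. The function `crossfade_concat` is a hypothetical helper, not an AudioCraft function, and it works on plain lists of mono samples rather than tensors.

```python
from typing import List


def crossfade_concat(chunks: List[List[float]], overlap: int) -> List[float]:
    """Join mono audio chunks, linearly crossfading at each boundary.

    The last `overlap` samples of the running output are mixed with the
    first `overlap` samples of the next chunk (assumed to cover the same
    stretch of time), then the rest of the chunk is appended.
    """
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        if overlap <= 0:
            out.extend(chunk)
            continue
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(chunk))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the new chunk
            out[start + i] = (1.0 - w) * out[start + i] + w * chunk[i]
        out.extend(chunk[n:])
    return out


# Two made-up chunks: silence followed by a constant signal.
blended = crossfade_concat([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], overlap=1)
```

Each boundary shortens the total length by the overlap, so two 3-sample chunks with a 1-sample overlap produce 5 samples. A crossfade only smooths the seam; keeping the music coherent across chunks also depends on conditioning each new chunk on the end of the previous one.
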
## Sources

- https://github.com/facebookresearch/audiocraft
- https://ai.meta.com/resources/models-and-libraries/audiocraft/

---

Source: https://tokrepo.com/en/workflows/asset-8a0d7a57
Author: Script Depot