# Voicebox: Open-Source AI Voice Studio

> An open-source AI voice studio for voice cloning, text-to-speech, dictation, and audio creation, running locally with GPU acceleration on macOS and Linux.

## Install

Save the commands under Quick Use as a script file and run it.

## Quick Use

```bash
git clone https://github.com/jamiepine/voicebox.git
cd voicebox
npm install
npm run dev
```

## Introduction

Voicebox is an open-source AI voice studio that provides voice cloning, text-to-speech synthesis, and dictation in a polished, desktop-quality interface. It runs locally with GPU acceleration and supports multiple TTS backends, giving creators full control over voice generation without cloud dependencies.

## What Voicebox Does

- Clones voices from short audio samples for personalized TTS
- Synthesizes speech from text with adjustable speed, pitch, and emotion
- Provides a dictation mode for voice-to-text transcription
- Supports multiple model backends, including Qwen3-TTS for speech synthesis and Whisper for transcription
- Runs entirely locally with CUDA or MLX acceleration

## Architecture Overview

Voicebox is a TypeScript application with an Electron or web-based frontend and a local Python inference backend. The frontend provides an audio-workstation-style interface for managing voice profiles, editing text, and monitoring generation. The backend handles model loading, inference, and audio post-processing, communicates with the frontend over a WebSocket connection, and supports hot-swapping between TTS engines.

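
The message flow between frontend and backend can be pictured with a small sketch. This is not Voicebox's actual protocol: the message types (`switch_engine`, `synthesize`), the `Backend` class, and the `fake_engine` helper are all hypothetical names chosen for illustration, and the WebSocket transport is left out so the example runs with the standard library alone. It only shows how a backend might keep several engines registered and swap the active one without restarting.

```python
import json
from typing import Callable, Dict

# Hypothetical engine type: takes text, returns raw audio bytes.
Engine = Callable[[str], bytes]


def fake_engine(tag: str) -> Engine:
    """Build a stand-in engine that returns tagged bytes instead of real audio."""
    def synthesize(text: str) -> bytes:
        return f"{tag}:{text}".encode("utf-8")
    return synthesize


class Backend:
    """Holds the registered engines and tracks which one is active (illustrative only)."""

    def __init__(self, engines: Dict[str, Engine], default: str) -> None:
        if default not in engines:
            raise ValueError(f"unknown default engine: {default}")
        self.engines = engines
        self.active = default

    def handle(self, raw: str) -> str:
        """Process one JSON message, as it might arrive over a socket, and return a JSON reply."""
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            return json.dumps({"ok": False, "error": "invalid JSON"})
        if not isinstance(msg, dict):
            return json.dumps({"ok": False, "error": "message must be an object"})

        kind = msg.get("type")
        if kind == "switch_engine":
            name = msg.get("engine")
            if not isinstance(name, str) or name not in self.engines:
                return json.dumps({"ok": False, "error": "unknown engine"})
            self.active = name  # hot-swap: later requests use the new engine
            return json.dumps({"ok": True, "engine": name})
        if kind == "synthesize":
            audio = self.engines[self.active](str(msg.get("text", "")))
            return json.dumps({"ok": True, "engine": self.active, "bytes": len(audio)})
        return json.dumps({"ok": False, "error": "unknown message type"})


if __name__ == "__main__":
    backend = Backend(
        {"engine_a": fake_engine("a"), "engine_b": fake_engine("b")},
        default="engine_a",
    )
    print(backend.handle('{"type": "synthesize", "text": "hello"}'))
    print(backend.handle('{"type": "switch_engine", "engine": "engine_b"}'))
    print(backend.handle('{"type": "synthesize", "text": "hello"}'))
```

In a real deployment the `handle` method would sit behind a WebSocket server and the engines would wrap actual model inference. Keeping the dispatch logic separate from the transport makes it easy to test without a running socket.
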
## Self-Hosting & Configuration

- Clone the repository and install Node.js and Python dependencies
- Install the CUDA toolkit for NVIDIA GPUs or use MLX on Apple Silicon
- Download voice model checkpoints via the built-in model manager
- Configure default voice profiles and output format in settings
- Optionally run headless as an API server for integration with other tools

## Key Features

- Voice cloning from audio samples as short as 10 seconds
- Multiple TTS backends with one-click switching
- Real-time waveform preview and audio editing
- Batch text-to-speech for processing scripts and documents
- Local-first architecture with no data leaving your machine

## Comparison with Similar Tools

- **ElevenLabs**: cloud-based voice API; Voicebox is fully local and open-source
- **Bark**: generates speech with effects; Voicebox provides a full studio interface
- **Kokoro**: lightweight TTS model; Voicebox wraps multiple backends in a rich UI
- **F5-TTS**: flow-matching synthesis; Voicebox integrates it as one of several engines

## FAQ

**Q: What GPU is required?**
A: An NVIDIA GPU with 6+ GB of VRAM or an Apple Silicon Mac with MLX support is recommended.

**Q: How long does voice cloning take?**
A: Cloning a voice profile from a 10-second sample typically completes in under a minute.

**Q: Can I use cloned voices commercially?**
A: The software is open-source, but you are responsible for ensuring you have consent and legal rights for any voice you clone.

**Q: Does it support real-time synthesis?**
A: Yes, streaming synthesis is available for interactive applications.

## Sources

- https://github.com/jamiepine/voicebox

---

Source: https://tokrepo.com/en/workflows/asset-74dca6e7
Author: Script Depot