Key Features
- 80+ languages: Broad multilingual coverage across global language families
- 4B Dual-AR model: Natural, realistic speech with sub-word prosody control
- 15,000+ emotion tags: Fine-grained emotional control via natural language
- Voice cloning: Clone any voice from 10-30 seconds of reference audio
- Multi-speaker dialogue: Generate conversations with multiple speakers
- Real-time inference: 0.195 RTF (real-time factor) on an H200 GPU; WebUI and API server included
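
To put the RTF figure in concrete terms: real-time factor is conventionally defined as synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below works through that arithmetic for the 0.195 figure quoted above. The function names are illustrative and not part of Fish Speech, and actual throughput will depend on hardware, batch size, and text length.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds needed to synthesize audio_seconds of speech."""
    return audio_seconds * rtf


rtf = 0.195  # figure quoted above for an H200 GPU

# One minute of speech at this RTF.
print(f"{synthesis_time(60.0, rtf):.1f} s to generate 60 s of audio")  # 11.7 s
# Speed-up relative to real-time playback.
print(f"{1 / rtf:.1f}x faster than real time")  # 5.1x
```

In other words, at this RTF a 60-second clip would take roughly 12 seconds to synthesize.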
FAQ
Q: What is Fish Speech?
A: Fish Speech is an open-source text-to-speech (TTS) system with 29K+ stars that supports 80+ languages. It uses a 4B-parameter Dual-AR model and offers voice cloning, emotional control via 15,000+ tags, and real-time inference.
Q: How do I install Fish Speech?
A: Run pip install fish-speech, or pull the Docker image with docker pull fishaudio/fish-speech. A GPU is required for inference.
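
Because inference requires a GPU, it can help to check the machine before installing. The following is a minimal sketch, assuming an NVIDIA GPU and treating the presence of the nvidia-smi tool on PATH as a rough signal that a driver is installed. It is a heuristic only, and the helper name is hypothetical rather than part of Fish Speech.

```python
import shutil


def nvidia_smi_available() -> bool:
    """Return True if the nvidia-smi executable can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if nvidia_smi_available():
    print("nvidia-smi found: an NVIDIA driver appears to be installed")
else:
    print("nvidia-smi not found: GPU inference may not be possible on this machine")
```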