Key Features
- Flow matching: Diffusion transformer with ConvNeXt V2 for natural speech
- Multi-speaker: Multiple voices and speaking styles
- Voice chat: Interactive voice conversation powered by Qwen2.5-3B
- Gradio UI: Web interface for inference and fine-tuning
- CLI inference: Command-line tool with custom configs
- Ultra-fast: 0.0394 RTF (real-time factor) on an L20 GPU with TensorRT-LLM
- Docker support: Containerized deployment ready
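
To put the RTF figure in context, here is a minimal sketch that assumes the usual definition of real-time factor: synthesis time divided by the duration of the generated audio. The function names are illustrative and are not part of F5-TTS.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Estimated wall-clock time to synthesize audio_seconds at a given RTF."""
    return rtf * audio_seconds


# Figure quoted in the feature list (L20 GPU with TensorRT-LLM).
RTF_L20 = 0.0394

if __name__ == "__main__":
    print(f"{synthesis_time(RTF_L20, 10.0):.3f} s to synthesize 10 s of audio")
    print(f"{1 / RTF_L20:.1f}x faster than real time")
```

Under that definition, an RTF below 1 means speech is generated faster than it plays back. Actual throughput depends on hardware, batch size, and configuration.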
FAQ
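Q: What is F5-TTS?
A: F5-TTS is a text-to-speech system built on a diffusion transformer trained with flow matching, with 14.3K+ stars. It supports multiple speakers, voice chat, and a Gradio UI, and reaches about 0.04 RTF on an L20 GPU. The code is MIT-licensed; the models are released under CC-BY-NC.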
Q: How do I install F5-TTS?
A: Run pip install f5-tts. Then use f5-tts_infer-cli for command-line inference or f5-tts_infer-gradio for the web UI.
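
After installing, one way to confirm that both commands are reachable is to look them up on PATH. The sketch below uses only the Python standard library; the helper name find_commands is hypothetical, and the command names are the ones quoted in the answer above.

```python
import shutil

# Entry-point names as quoted in the FAQ answer.
F5_TTS_COMMANDS = ("f5-tts_infer-cli", "f5-tts_infer-gradio")


def find_commands(names=F5_TTS_COMMANDS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If a command is reported as not found, the package was probably installed into a different Python environment than the one currently active.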