Scripts · April 1, 2026 · 1 min read

F5-TTS — Flow Matching Text-to-Speech

F5-TTS is a diffusion transformer TTS system that uses flow matching. 14.3K+ GitHub stars. Multi-speaker synthesis, voice chat, Gradio UI, CLI inference, and about 0.04 RTF on an L20 GPU with TensorRT-LLM. MIT-licensed code.

TokRepo Picks · Community
Quick Start

Try it first, then decide whether to dig deeper


# Install
pip install f5-tts

# CLI inference
f5-tts_infer-cli --model F5TTS_v1_Base --ref_audio ref.wav --ref_text "Reference text" --gen_text "Text to generate"

# Or launch Gradio web UI
f5-tts_infer-gradio

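# Voice chat with Qwen2.5-3B-Instruct (flag as listed here; run f5-tts_infer-gradio --help to confirm it exists in your version)
f5-tts_infer-gradio --voicechat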
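
For scripted or batch use, the CLI call above can be wrapped from Python. The following is a minimal sketch and not part of F5-TTS: `build_infer_command` and `run_inference` are hypothetical helper names, only the flags shown in the quick start are used, and it assumes `f5-tts_infer-cli` is on the PATH after installation.

```python
import shutil
import subprocess

CLI_NAME = "f5-tts_infer-cli"  # executable name as shown in the quick start


def build_infer_command(ref_audio, ref_text, gen_text, model="F5TTS_v1_Base"):
    """Return the argument list for one CLI inference call.

    Hypothetical helper: it only assembles the flags shown in the quick start.
    """
    return [
        CLI_NAME,
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


def run_inference(ref_audio, ref_text, gen_text, model="F5TTS_v1_Base"):
    """Run the CLI if it is installed; raise a clear error otherwise."""
    if shutil.which(CLI_NAME) is None:
        raise FileNotFoundError(
            f"{CLI_NAME} not found on PATH; run 'pip install f5-tts' first"
        )
    cmd = build_infer_command(ref_audio, ref_text, gen_text, model)
    # Passing a list (not a shell string) avoids quoting problems in the text arguments.
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: show the command without executing it.
    print(build_infer_command("ref.wav", "Reference text", "Text to generate"))
```

Keeping command construction separate from execution makes it easy to log or inspect the exact call before any GPU time is spent.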

Introduction

F5-TTS is a text-to-speech system built on a diffusion transformer with ConvNeXt V2 and flow matching, optimized for fast training and inference. The project has 14,300+ GitHub stars. It provides multi-speaker and multi-style speech synthesis, voice chat powered by Qwen2.5-3B-Instruct, a Gradio web interface for inference and fine-tuning, and a CLI for inference. With Triton/TensorRT-LLM optimization, it reaches a real-time factor (RTF) of 0.0394 on an L20 GPU. The code is MIT licensed; the pre-trained models are released under CC-BY-NC.
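
Real-time factor is conventionally the ratio of synthesis time to the duration of the generated audio, so values below 1 mean faster than real time. The sketch below shows what the quoted 0.0394 figure implies under that standard definition; the helper names are illustrative and the numbers are simple arithmetic, not a benchmark.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds, rtf):
    """Estimated compute time for a clip of the given length at a given RTF."""
    return audio_seconds * rtf


REPORTED_RTF = 0.0394  # figure quoted for an L20 GPU with TensorRT-LLM

# Under this definition, 10 s of speech would take about 0.394 s to generate,
# which is roughly 25x faster than real time (1 / 0.0394 is about 25.4).
estimate = synthesis_time(10.0, REPORTED_RTF)
speedup = 1.0 / REPORTED_RTF
```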

Best for: Researchers and developers needing high-quality multi-speaker TTS with voice cloning
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Optimized: 0.04 RTF on L20 GPU with TensorRT-LLM


Key Features

  • Flow matching: Diffusion transformer with ConvNeXt V2 for natural speech
  • Multi-speaker: Multiple voices and speaking styles
  • Voice chat: Interactive voice conversation powered by Qwen2.5-3B
  • Gradio UI: Web interface for inference and fine-tuning
  • CLI inference: Command-line tool with custom configs
  • Ultra-fast: 0.0394 RTF on L20 GPU with TensorRT-LLM
  • Docker support: Containerized deployment ready
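
To give a feel for the flow matching idea named in the first feature, here is a toy sketch in plain Python. It is not F5-TTS code: it assumes the common straight-line path between a noise sample and a target, and it replaces the learned transformer with the closed-form velocity for a single known target, so every name in it is illustrative.

```python
import random


def target_velocity(x, x1, t):
    """Velocity of the straight-line path x_t = (1 - t) * x0 + t * x1.

    For a single known target x1 this equals (x1 - x) / (1 - t). In a real
    system a neural network is trained to predict this quantity instead.
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


def euler_sample(x0, x1, steps=16):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = target_velocity(x, x1, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4)]  # starting point x0
target = [0.5, -1.0, 2.0, 0.0]                      # stand-in for speech features
sample = euler_sample(noise, target)
```

Because the path here is a straight line with an exact velocity, the Euler steps land on the target. With a learned velocity field the result is only approximate, and the number of steps generally trades speed against quality.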

FAQ

Q: What is F5-TTS?
A: F5-TTS is a diffusion transformer TTS system that uses flow matching, with 14.3K+ GitHub stars. It offers multi-speaker synthesis, voice chat, a Gradio UI, and 0.04 RTF on an L20 GPU. MIT code, CC-BY-NC models.

Q: How do I install F5-TTS?
A: Run pip install f5-tts. Use f5-tts_infer-cli for the command line or f5-tts_infer-gradio for the web UI.



Source and Acknowledgments

Created by SWivid. Code: MIT; models: CC-BY-NC. Repository: SWivid/F5-TTS (14,300+ GitHub stars)

Related Assets