CosyVoice is an open-source text-to-speech system built on large language models by Alibaba's FunAudioLLM team. It supports 9 languages and 18+ Chinese dialects with zero-shot voice cloning, streaming synthesis, and fine-grained prosody control.
CosyVoice — Multilingual Voice Generation with LLM-Based TTS
CosyVoice is an open-source text-to-speech system built on large language models by Alibaba's FunAudioLLM team. It supports 9 languages and 18+ Chinese dialects with zero-shot voice cloning, streaming synthesis, and fine-grained prosody control.
This asset can be read and installed directly by agents
TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.
npx tokrepo install 7141df5f-537e-11f1-9bc6-00163e2b0d79Discussion
Related Assets
Zonos — Multilingual TTS with Voice Cloning
Zonos is an open-weight TTS model trained on 200K+ hours of speech. 7.2K+ stars. Voice cloning, 5 languages, emotion control. Apache 2.0.
Fish Speech — Multilingual TTS for 80+ Languages
Fish Speech is a state-of-the-art open-source TTS system supporting 80+ languages. 29K+ GitHub stars. 4B dual-AR model, voice cloning, emotional control with 15K+ tags, real-time inference.
GPT-SoVITS — Few-Shot Voice Cloning and Text-to-Speech
An open-source TTS system that can clone any voice from just one minute of audio data, combining GPT-style language modeling with VITS synthesis for natural speech generation.
Vapi — Voice AI Agent Platform with STT, LLM & TTS
Vapi glues STT, LLM, TTS, turn-taking into one voice agent API. Build phone agents in minutes. Twilio + Deepgram + ElevenLabs + GPT-4o stack.