Kokoro — Lightweight 82M TTS in 9 Languages
Kokoro is an 82M parameter text-to-speech model delivering quality comparable to larger models. 6.2K+ GitHub stars. Supports English, Spanish, French, Japanese, Chinese, and more. Apache 2.0.
What it is
Kokoro is a lightweight text-to-speech model with 82 million parameters. Despite its small size, it produces speech quality comparable to much larger models. It supports 9 language variants: American and British English, Spanish, French, Japanese, Mandarin Chinese, Hindi, Italian, and Brazilian Portuguese.
Kokoro is designed for developers building voice-enabled applications who need fast, local TTS without relying on cloud APIs. Its small footprint makes it suitable for edge deployment, CI/CD pipelines that generate audio, and applications where latency matters.
How it saves time or tokens
Cloud TTS APIs charge per character and introduce network latency. Kokoro runs locally on CPU or GPU, eliminating both the cost and the round-trip delay. A single pip install gets you from zero to generating speech in under a minute. The 82M parameter count means the model loads fast and runs on machines without dedicated GPU hardware.
For AI agent pipelines that need voice output, Kokoro avoids the per-character charge of sending text to a cloud TTS API and the wait for audio bytes to stream back.
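The per-character pricing mentioned above can be turned into a rough estimate. This is a minimal sketch: the function name, the workload, and the rate of 16.0 USD per million characters are all hypothetical placeholders, not the pricing of any real service.

```python
def cloud_tts_cost_usd(num_chars, usd_per_million_chars):
    """Estimate the cloud TTS bill for a number of characters.

    usd_per_million_chars is a placeholder rate; look up the real price
    of the service you are comparing against.
    """
    if num_chars < 0 or usd_per_million_chars < 0:
        raise ValueError('inputs must be non-negative')
    return num_chars / 1_000_000 * usd_per_million_chars


# Hypothetical workload: 2,000 replies of 500 characters each.
monthly_chars = 2_000 * 500
print(cloud_tts_cost_usd(monthly_chars, usd_per_million_chars=16.0))  # 16.0
```

Running the model locally replaces this recurring charge with the compute cost of your own machine.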
How to use
- Install Kokoro via pip (soundfile is only needed to write WAV files, as in the example below):
pip install kokoro soundfile
- Generate speech with a few lines of Python:
from kokoro import KPipeline

pipe = KPipeline(lang_code='a')  # 'a' = American English
generator = pipe('Hello, this is Kokoro text to speech.', voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    # gs is the text of the segment, ps its phonemes,
    # and audio the samples as a numpy array at 24 kHz
    pass
- Save the output as a WAV file or stream it to your application.
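For the last step, here is a minimal sketch that joins all segments into one 16-bit mono WAV file using only the Python standard library, as an alternative to soundfile. The helper name write_wav is hypothetical, and it assumes each segment is a sequence of float samples in the range -1.0 to 1.0.

```python
import struct
import wave

SAMPLE_RATE = 24000  # Kokoro's output rate, as stated in this article


def write_wav(path, segments, sample_rate=SAMPLE_RATE):
    """Concatenate float segments into one 16-bit mono WAV file.

    Assumes samples lie in [-1.0, 1.0]; values outside are clipped.
    Returns the number of samples written.
    """
    frames = bytearray()
    total = 0
    for segment in segments:
        for sample in segment:
            clipped = max(-1.0, min(1.0, float(sample)))
            frames += struct.pack('<h', int(clipped * 32767))
            total += 1
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit PCM
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return total
```

With the generator from the step above, a call could look like write_wav('speech.wav', (audio for _, _, audio in generator)). The per-sample loop is slow for long audio; soundfile or scipy is the better choice when either is available.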
Example
from kokoro import KPipeline
import soundfile as sf

pipe = KPipeline(lang_code='a')  # 'a' = American English
text = 'Kokoro runs locally with no API key required.'
generator = pipe(text, voice='af_heart', speed=1.0)

# Each iteration yields one segment of the input text
for i, (gs, ps, audio) in enumerate(generator):
    sf.write(f'output_{i}.wav', audio, 24000)  # 24000 = sample rate in Hz
    print(f'Saved segment {i}: {gs}')
This script writes one WAV file per generated segment at a 24 kHz sample rate. The voice parameter selects one of the available voice presets, and speed controls the speaking rate.
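To sanity-check a saved segment, its duration follows from the sample count and the sample rate. This is a small sketch with a hypothetical helper name; it also shows why the rate passed when saving matters.

```python
SAMPLE_RATE = 24000  # Kokoro's output rate in Hz


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Return the playback length of num_samples at the given rate."""
    if sample_rate <= 0:
        raise ValueError('sample_rate must be positive')
    return num_samples / sample_rate


samples = 48000
print(duration_seconds(samples))         # 2.0 seconds at the correct rate
print(duration_seconds(samples, 48000))  # 1.0 second if mislabeled as 48 kHz
```

Labeling the same samples with twice the rate halves the duration, so the speech plays back twice as fast and at a higher pitch.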
Common pitfalls
- Language codes are single letters (e.g., 'a' for American English, 'j' for Japanese). Full locale strings like 'en-US' may not be accepted, depending on the installed version, and unrecognized codes raise an error. Check the documentation for the correct single-letter codes.
- Audio output is raw numpy arrays at 24kHz. You need soundfile or scipy to save them as WAV. Saving with a sample rate other than 24000 makes the audio play back at the wrong speed and pitch.
- Kokoro downloads model weights on first use. The initial run takes longer due to the download. Subsequent runs load from cache.
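To guard against the language-code pitfall, an application can normalize locale strings before building the pipeline. This is a sketch only: the helper is hypothetical, and of the codes in the table only 'a' and 'j' are confirmed by this article; the others are assumptions to check against the Kokoro documentation.

```python
# Locale-to-code table. 'a' and 'j' come from this article; the other
# single-letter codes are assumptions to verify in the Kokoro docs.
LANG_CODES = {
    'en-us': 'a',
    'en-gb': 'b',
    'es': 'e',
    'fr': 'f',
    'hi': 'h',
    'it': 'i',
    'ja': 'j',
    'pt-br': 'p',
    'zh': 'z',
}


def to_kokoro_lang_code(locale):
    """Map a locale string such as 'en-US' to a single-letter code."""
    key = locale.strip().lower().replace('_', '-')
    if key in LANG_CODES.values():
        return key  # already a single-letter code
    if key not in LANG_CODES:
        raise ValueError(f'no known Kokoro language code for {locale!r}')
    return LANG_CODES[key]


print(to_kokoro_lang_code('en-US'))  # a
print(to_kokoro_lang_code('ja'))     # j
```

The result can then be passed as lang_code when constructing KPipeline, so an unsupported locale fails early with a clear message.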
Frequently Asked Questions
What languages does Kokoro support?
Kokoro supports 9 languages: American English, British English, Spanish, French, Japanese, Chinese (Mandarin), Hindi, Italian, and Portuguese (Brazilian). Each language has its own language code and set of available voices.
Can Kokoro run without a GPU?
Yes. Kokoro's 82M parameter size is small enough to run efficiently on CPU. GPU acceleration is supported but not required. CPU inference is fast enough for real-time speech generation in most applications.
How does the speech quality compare to cloud TTS services?
Kokoro produces natural-sounding speech that reviewers have compared favorably to cloud services like Google Cloud TTS and Amazon Polly. The quality is particularly strong for English and Japanese. Some voices sound more natural than others, so testing different voice presets is recommended.
Is Kokoro free for commercial use?
Yes. Kokoro is released under the Apache 2.0 license, which permits commercial use, modification, and distribution. There are no per-character or per-request fees since the model runs locally.
Can I add custom voices?
Kokoro ships with a set of pre-trained voice presets. Adding a custom voice would require additional training compute and audio samples of the target voice. Check the project repository for the current state of any voice training or cloning documentation.
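The real-time claim in the CPU answer above can be checked on your own hardware by measuring a real-time factor: synthesis time divided by the duration of the audio produced. This is a sketch under assumptions: synthesize is a hypothetical zero-argument callable that returns the generated samples, for example a small wrapper around the pipeline call.

```python
import time


def real_time_factor(synthesize, sample_rate=24000):
    """Return synthesis time divided by audio duration.

    synthesize is a hypothetical zero-argument callable returning a
    sequence of samples. A result below 1.0 means the audio was
    generated faster than it takes to play back.
    """
    start = time.perf_counter()
    samples = synthesize()
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError('synthesize returned no audio')
    return elapsed / audio_seconds
```

Measure after a warm-up run, since the first call includes model loading and, on first use, the weight download.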
Source & Thanks
Created by Hexgrad. Licensed under Apache 2.0. hexgrad/kokoro — 6,200+ GitHub stars