Kokoro — Lightweight 82M TTS in 9 Languages
Kokoro is an 82M parameter text-to-speech model delivering quality comparable to larger models. 6.2K+ GitHub stars. Supports English, Spanish, French, Japanese, Chinese, and more. Apache 2.0.
Ready-to-run agent install
This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.
npx -y tokrepo@latest install 44809dfb-1735-4aae-af74-f21f4b805d0f --target codex
Run after dry-run confirms the install plan.
What it is
Kokoro is a lightweight text-to-speech model with 82 million parameters. Despite its small size, it produces speech quality comparable to much larger models. It supports 9 language codes covering American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese.
Kokoro is designed for developers building voice-enabled applications who need fast, local TTS without relying on cloud APIs. Its small footprint makes it suitable for edge deployment, CI/CD pipelines that generate audio, and applications where latency matters.
How it saves time or tokens
Cloud TTS APIs charge per character and introduce network latency. Kokoro runs locally on CPU or GPU, eliminating both the per-character cost and the network round trip. Installation is a single pip command, and a few lines of Python are enough to start generating speech once the model weights have downloaded on first use. At 82M parameters, the model loads quickly and runs on machines without a dedicated GPU.
For AI agent pipelines that need voice output, Kokoro avoids paying a cloud TTS API per character and waiting for audio bytes to stream back over the network.
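To check the latency claim on your own hardware, a common metric is the real-time factor (RTF): compute time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below assumes the pipeline yields (graphemes, phonemes, audio) tuples at 24 kHz, as in the usage example; real_time_factor and measure_rtf are hypothetical helper names, not part of Kokoro. Run the measurement twice and keep the second number, since the first call may include downloading weights.

```python
import time

SAMPLE_RATE = 24_000  # Kokoro's output sample rate (24 kHz)


def real_time_factor(elapsed_seconds: float, num_samples: int,
                     sample_rate: int = SAMPLE_RATE) -> float:
    """Return compute time divided by audio duration (< 1.0 = faster than real time)."""
    if num_samples <= 0:
        raise ValueError("no audio samples were produced")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


def measure_rtf(pipe, text: str, voice: str = "af_heart") -> float:
    """Time one full synthesis pass through a KPipeline-style callable (hypothetical helper)."""
    start = time.perf_counter()
    total_samples = 0
    for _gs, _ps, audio in pipe(text, voice=voice):
        total_samples += len(audio)  # number of samples in this segment
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, total_samples)


# Example usage (requires `pip install kokoro`):
# from kokoro import KPipeline
# pipe = KPipeline(lang_code='a')
# measure_rtf(pipe, 'Warm-up run.')  # first call may include downloading weights
# print(f"RTF: {measure_rtf(pipe, 'Kokoro runs locally.'):.2f}")
```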
How to use
- Install Kokoro via pip (soundfile is used by the example below to write WAV files):
pip install kokoro soundfile
- Generate speech with a few lines of Python:
from kokoro import KPipeline
pipe = KPipeline(lang_code='a')  # 'a' = American English
generator = pipe('Hello, this is Kokoro text to speech.', voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    # gs: text segment, ps: phonemes, audio: waveform sampled at 24 kHz
    print(i, gs, ps)
- Save the output as a WAV file or stream it to your application.
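For the streaming case, one approach is to convert each float segment to 16-bit PCM bytes as soon as it is generated, so playback or network transmission can start before the whole text is synthesized. This sketch assumes segment samples are floats in [-1.0, 1.0] (values outside are clipped) and accepts anything with a tolist() method, such as a NumPy array or PyTorch tensor, or a plain list; float_to_pcm16 and pcm16_chunks are hypothetical helpers.

```python
import array
import sys
from typing import Iterator


def float_to_pcm16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    # NumPy arrays and PyTorch tensors both expose tolist(); plain lists pass through
    values = samples.tolist() if hasattr(samples, "tolist") else list(samples)
    pcm = array.array("h", (int(max(-1.0, min(1.0, v)) * 32767) for v in values))
    if sys.byteorder == "big":
        pcm.byteswap()  # most audio sinks and the WAV format expect little-endian
    return pcm.tobytes()


def pcm16_chunks(generator) -> Iterator[bytes]:
    """Yield one PCM chunk per (graphemes, phonemes, audio) segment as it arrives."""
    for _gs, _ps, audio in generator:
        yield float_to_pcm16(audio)
```

A caller would then loop with `for chunk in pcm16_chunks(pipe(text, voice='af_heart')): sink.write(chunk)`, where `sink` is a placeholder for whatever audio device or socket your application writes to.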
Example
from kokoro import KPipeline
import soundfile as sf
pipe = KPipeline(lang_code='a')
text = 'Kokoro runs locally with no API key required.'
generator = pipe(text, voice='af_heart', speed=1.0)
for i, (gs, ps, audio) in enumerate(generator):
    # Write each segment to its own 24 kHz WAV file
    sf.write(f'output_{i}.wav', audio, 24000)
    print(f'Saved segment {i}: {gs}')
This script writes one WAV file per text segment, each at a 24 kHz sample rate. The voice parameter selects one of the available voice presets, and speed adjusts the speaking rate (1.0 is the default).
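Because the generator yields one audio segment per chunk of text, the example above produces several files. If you want a single file instead, a minimal approach is to concatenate the segments, optionally with a short pause between them, and write once. This sketch uses only the standard-library wave module and assumes float samples in [-1.0, 1.0] at 24 kHz; write_single_wav and its pause_seconds parameter are hypothetical, not part of Kokoro.

```python
import array
import wave
from typing import Iterable

SAMPLE_RATE = 24_000  # Kokoro's output sample rate


def write_single_wav(path: str, segments: Iterable, pause_seconds: float = 0.2,
                     sample_rate: int = SAMPLE_RATE) -> int:
    """Join float segments into one mono 16-bit WAV file; return the frame count."""
    gap = [0] * int(pause_seconds * sample_rate)  # silence inserted between segments
    pcm = array.array("h")
    for index, segment in enumerate(segments):
        # NumPy arrays and PyTorch tensors expose tolist(); plain lists pass through
        values = segment.tolist() if hasattr(segment, "tolist") else list(segment)
        if index > 0:
            pcm.extend(gap)
        # Clip to [-1.0, 1.0] before scaling to the int16 range
        pcm.extend(int(max(-1.0, min(1.0, v)) * 32767) for v in values)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())  # wave handles byte order for the WAV file
    return len(pcm)


# Example usage with the pipeline from the example above:
# segments = [audio for _gs, _ps, audio in pipe(text, voice='af_heart', speed=1.0)]
# write_single_wav('output.wav', segments)
```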
Related on TokRepo
- AI voice tools -- Explore other text-to-speech and voice synthesis tools
- Local LLM runners -- Run AI models privately on your own hardware
Common pitfalls
- Language codes are single letters (e.g., 'a' for American English, 'j' for Japanese). Passing a code the pipeline does not recognize raises an error, so check the documentation for the supported single-letter codes rather than guessing at locale strings.
- Audio output is a raw float waveform at 24 kHz, not an encoded file. You need soundfile or scipy to save it as WAV, and the sample rate you pass must be 24000: any other value makes playback too fast or too slow.
- Kokoro downloads model weights on first use. The initial run takes longer due to the download. Subsequent runs load from cache.
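Since an unrecognized language code only surfaces as an error from the pipeline, it can help to validate user-supplied codes up front. The mapping below reflects the single-letter codes listed in the Kokoro README at the time of writing (verify it against the current documentation); resolve_lang_code is a hypothetical helper, not part of Kokoro's API.

```python
# Single-letter codes as listed in the Kokoro README (verify against the current docs).
LANG_CODES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Mandarin Chinese",
}


def resolve_lang_code(code: str) -> str:
    """Map a user-supplied code or language name to a Kokoro single-letter code."""
    normalized = code.strip().lower()
    if normalized in LANG_CODES:
        return normalized
    # Convenience: also accept a full language name such as "Japanese"
    for letter, name in LANG_CODES.items():
        if name.lower() == normalized:
            return letter
    supported = ", ".join(f"'{k}' ({v})" for k, v in LANG_CODES.items())
    raise ValueError(f"unsupported Kokoro language code {code!r}; use one of: {supported}")


# Example usage (user_input is a placeholder for your own input value):
# from kokoro import KPipeline
# pipe = KPipeline(lang_code=resolve_lang_code(user_input))
```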
Frequently Asked Questions
Which languages does Kokoro support?
Kokoro supports 9 language codes: American English, British English, Spanish, French, Japanese, Mandarin Chinese, Hindi, Italian, and Brazilian Portuguese. Each has its own language code and set of available voices.
Can Kokoro run on CPU?
Yes. At 82M parameters, Kokoro is small enough to run efficiently on CPU. GPU acceleration is supported but not required, and CPU inference is typically fast enough for real-time speech generation, though throughput depends on your hardware.
How does Kokoro's quality compare to cloud TTS services?
Kokoro produces natural-sounding speech that some reviewers have compared favorably to cloud services like Google Cloud TTS and Amazon Polly. Quality is reportedly strongest for English and Japanese. Some voices sound more natural than others, so testing different voice presets is recommended.
Can Kokoro be used commercially?
Yes. Kokoro is released under the Apache 2.0 license, which permits commercial use, modification, and distribution. There are no per-character or per-request fees since the model runs locally.
Can I add custom voices?
Kokoro ships with a set of pre-trained voice presets. Adding custom voices requires training or fine-tuning on your own voice data, which needs additional compute and audio samples; check the project repository for current guidance on voice-creation workflows.
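Since voice quality varies between presets, a quick way to compare them is to render the same sentence with every voice for one language. Kokoro's preset names appear to follow a language-letter, gender-letter, underscore, name pattern (for example, af_heart is an American English female voice); the sketch below relies on that convention, and its voice list is illustrative, so check the project's voice listing for the names actually available. voices_for_lang and render_voice_samples are hypothetical helpers.

```python
from typing import Dict, Iterable, List

# Illustrative preset names; consult the project's voice list for the real set.
EXAMPLE_VOICES = ["af_heart", "af_bella", "am_adam", "bf_emma", "bm_george", "jf_alpha"]


def voices_for_lang(voices: Iterable[str], lang_code: str) -> List[str]:
    """Return presets whose first letter matches the single-letter language code."""
    return sorted(v for v in voices if v[:1] == lang_code)


def render_voice_samples(pipe, text: str, voices: Iterable[str]) -> Dict[str, list]:
    """Synthesize `text` once per voice and return {voice: list of audio segments}."""
    results = {}
    for voice in voices:
        results[voice] = [audio for _gs, _ps, audio in pipe(text, voice=voice)]
    return results


# Example usage (requires `pip install kokoro`):
# from kokoro import KPipeline
# pipe = KPipeline(lang_code='a')
# samples = render_voice_samples(pipe, 'Testing one two three.',
#                                voices_for_lang(EXAMPLE_VOICES, 'a'))
```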
Citations (3)
- Kokoro GitHub: 82M parameter TTS model supporting 9 languages
- Kokoro GitHub License: Apache 2.0 license
- Kokoro Hugging Face: Kokoro achieves quality comparable to larger models
Source & Thanks
Created by Hexgrad. Licensed under Apache 2.0. hexgrad/kokoro — 6,200+ GitHub stars
Related Assets
Resilience4j — Lightweight Fault Tolerance Library for Java
Resilience4j is a lightweight fault tolerance library for Java applications, providing circuit breaker, rate limiter, retry, bulkhead, and time limiter patterns with a functional programming model.
Sonic — Fast Lightweight Search Backend in Rust
Sonic is a schema-less search backend written in Rust that acts as a lightweight alternative to Elasticsearch for text search. It ingests text, builds an inverted index, and responds to search queries in microseconds while using minimal memory.
Javalin — Simple Lightweight Web Framework for Java and Kotlin
Javalin is a lightweight web framework for Java and Kotlin built on top of Jetty, designed for simplicity with a small learning curve and first-class support for both languages.
Feathers — Lightweight Real-Time API Framework for Node.js
Feathers is a lightweight web framework for building real-time applications and REST APIs with Node.js. It provides a service-oriented architecture that works with Express, Koa, or its own HTTP transport.