Scripts · Mar 31, 2026 · 2 min read

Kokoro — Lightweight 82M TTS in 9 Languages

Kokoro is an 82M-parameter text-to-speech model that delivers quality comparable to much larger models. It has 6.2K+ GitHub stars, supports English, Spanish, French, Japanese, Chinese, and more, and is released under Apache 2.0.

TL;DR
Kokoro delivers high-quality text-to-speech across 9 languages with only 82M parameters and Apache 2.0 license.
§01

What it is

Kokoro is a lightweight text-to-speech model with 82 million parameters. Despite its small size, it produces speech quality comparable to models with billions of parameters. It supports 9 languages: American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese.

Kokoro is designed for developers building voice-enabled applications who need fast, local TTS without relying on cloud APIs. Its small footprint makes it suitable for edge deployment, CI/CD pipelines that generate audio, and applications where latency matters.

§02

How it saves time or tokens

Cloud TTS APIs charge per character and introduce network latency. Kokoro runs locally on CPU or GPU, eliminating both the cost and the round-trip delay. A single pip install gets you from zero to generating speech in under a minute. The 82M parameter count means the model loads fast and runs on machines without dedicated GPU hardware.

For AI agent pipelines that need voice output, Kokoro avoids the token cost of sending text to a cloud TTS API and waiting for audio bytes to stream back.
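To make the cost difference concrete, here is a rough back-of-the-envelope sketch. The per-character cloud price is an illustrative assumption, not a quote from any specific provider:

```python
# Rough cost sketch: cloud TTS billed per character vs. local Kokoro.
# The $16 per 1M characters figure is an illustrative assumption.
CLOUD_PRICE_PER_MILLION_CHARS = 16.00

def cloud_tts_cost(num_chars: int) -> float:
    """Estimated cloud TTS bill in dollars for a given character count."""
    return num_chars / 1_000_000 * CLOUD_PRICE_PER_MILLION_CHARS

# A pipeline that narrates 10,000 articles of ~5,000 characters each:
chars = 10_000 * 5_000
print(f"Cloud TTS: ${cloud_tts_cost(chars):,.2f}")  # per-character billing adds up
print("Local Kokoro: $0.00 in API fees")            # only local compute cost
```

At that assumed rate, 50 million characters costs $800 in API fees, while the local model's marginal cost is just CPU time.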

§03

How to use

  1. Install Kokoro via pip:
pip install kokoro
  2. Generate speech with a few lines of Python:
from kokoro import KPipeline

pipe = KPipeline(lang_code='a')  # 'a' = American English
generator = pipe('Hello, this is Kokoro text to speech.', voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    # gs = graphemes (text), ps = phonemes, audio = numpy array at 24kHz
    pass
  3. Save the output as a WAV file or stream it to your application.
§04

Example

from kokoro import KPipeline
import soundfile as sf

pipe = KPipeline(lang_code='a')

text = 'Kokoro runs locally with no API key required.'
generator = pipe(text, voice='af_heart', speed=1.0)

for i, (gs, ps, audio) in enumerate(generator):
    sf.write(f'output_{i}.wav', audio, 24000)
    print(f'Saved segment {i}: {gs}')

This script writes one WAV file per generated segment at a 24kHz sample rate. The voice parameter selects from the available voice presets, and speed controls the speaking rate.
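If you want a single file instead of one WAV per segment, you can concatenate the segments before writing. A minimal sketch using only numpy and the stdlib wave module, assuming each segment is a float32 array with samples in [-1, 1] at 24kHz, as Kokoro produces:

```python
import wave
import numpy as np

SAMPLE_RATE = 24000  # Kokoro outputs 24kHz audio

def write_wav(path: str, segments: list[np.ndarray]) -> None:
    """Concatenate float32 audio segments and write one 16-bit PCM WAV."""
    audio = np.concatenate(segments)
    # Scale float samples in [-1, 1] to int16 PCM.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)            # Kokoro voices are mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(SAMPLE_RATE)
        f.writeframes(pcm.tobytes())

# Synthetic segments standing in for Kokoro output:
segments = [np.zeros(24000, dtype=np.float32),   # 1 s of silence
            np.zeros(12000, dtype=np.float32)]   # 0.5 s of silence
write_wav('combined.wav', segments)
```

With soundfile installed you could instead collect the arrays and call sf.write once on the concatenated result; the stdlib version above just avoids the extra dependency.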

§05

Common pitfalls

  • Language codes are single letters (e.g., 'a' for American English, 'j' for Japanese). Using full locale strings like 'en-US' will raise an error. Check the documentation for the correct single-letter codes.
  • Audio output is raw numpy arrays at 24kHz. You need soundfile or scipy to save them as WAV. Forgetting to specify the sample rate when saving produces garbled audio.
  • Kokoro downloads model weights on first use. The initial run takes longer due to the download. Subsequent runs load from cache.
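The sample-rate pitfall is easy to see arithmetically: the same buffer of samples played back under the wrong rate header changes both duration and pitch. A small sketch:

```python
# Kokoro emits 24kHz audio; writing a WAV header with a different
# rate makes players stretch or squeeze the same samples.
num_samples = 48000  # 2 seconds of Kokoro audio at 24kHz

correct = num_samples / 24000  # duration with the right header
wrong = num_samples / 44100    # duration if mislabeled as 44.1kHz

print(f"at 24000 Hz: {correct:.2f} s")  # 2.00 s, correct pitch
print(f"at 44100 Hz: {wrong:.2f} s")    # 1.09 s, sped up and pitched up
```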

Frequently Asked Questions

What languages does Kokoro support?

Kokoro supports 9 languages: American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. Each language has its own language code and set of available voices.

Can Kokoro run on CPU only?

Yes. Kokoro's 82M parameter size is small enough to run efficiently on CPU. GPU acceleration is supported but not required. CPU inference is fast enough for real-time speech generation in most applications.

What is the audio quality compared to cloud TTS services?

Kokoro produces natural-sounding speech that reviewers have compared favorably to cloud services like Google Cloud TTS and Amazon Polly. The quality is particularly strong for English and Japanese. Some voices sound more natural than others, so testing different voice presets is recommended.

Is Kokoro free for commercial use?

Yes. Kokoro is released under the Apache 2.0 license, which permits commercial use, modification, and distribution. There are no per-character or per-request fees since the model runs locally.

How do I add custom voices to Kokoro?

Kokoro ships with a set of pre-trained voice presets. Adding custom voices requires fine-tuning the model on your own voice data. The project provides documentation on voice cloning workflows, though this requires additional training compute and audio samples.


Source & Thanks

Created by Hexgrad and licensed under Apache 2.0. Repository: hexgrad/kokoro, 6,200+ GitHub stars.
