Skills · March 31, 2026 · 1 min read

Kokoro — Lightweight 82M TTS in 9 Languages

Kokoro is an 82M parameter text-to-speech model delivering quality comparable to larger models. 6.2K+ GitHub stars. Supports English, Spanish, French, Japanese, Chinese, and more. Apache 2.0.

Script Depot · Community

Agent ready

Installable directly by an agent. The agent first selects the current runtime, reviews the install plan, then runs the matching command.

Native · 98/100 · Policy: allow
Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Established

Direct install command

npx -y tokrepo@latest install 44809dfb-1735-4aae-af74-f21f4b805d0f --target codex

Do a dry run first to confirm the install plan, then run this command.

TL;DR

Kokoro delivers high-quality text-to-speech across 9 languages with only 82M parameters, under the Apache 2.0 license.

§01

What it is

Kokoro is a lightweight text-to-speech model with 82 million parameters. Despite its small size, it produces speech quality comparable to considerably larger models. It supports 9 languages: American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese.

Kokoro is designed for developers building voice-enabled applications who need fast, local TTS without relying on cloud APIs. Its small footprint makes it suitable for edge deployment, CI/CD pipelines that generate audio, and applications where latency matters.

§02

How it saves time or tokens

Cloud TTS APIs charge per character and introduce network latency. Kokoro runs locally on CPU or GPU, eliminating both the cost and the round-trip delay. A single pip install gets you from zero to generating speech in under a minute. The 82M parameter count means the model loads fast and runs on machines without dedicated GPU hardware.
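
To make the cost argument concrete, the arithmetic can be sketched in a few lines. The per-character rate and the workload below are hypothetical placeholders, not quotes from any provider; substitute the figures from your own cloud TTS bill.

```python
def cloud_tts_cost_usd(num_chars: int, usd_per_million_chars: float) -> float:
    """Estimate what a per-character cloud TTS API would charge for num_chars."""
    if num_chars < 0 or usd_per_million_chars < 0:
        raise ValueError("inputs must be non-negative")
    return num_chars / 1_000_000 * usd_per_million_chars


# Hypothetical workload: 500 articles of roughly 6,000 characters each.
total_chars = 500 * 6_000
# Hypothetical rate in USD per million characters; check your provider's pricing.
ASSUMED_RATE = 16.0

cost = cloud_tts_cost_usd(total_chars, ASSUMED_RATE)
print(f"{total_chars:,} chars -> ${cost:.2f} at the assumed rate")
```

A local model replaces this recurring charge with the fixed cost of the machine it runs on, so the comparison matters most for high-volume workloads.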

For AI agent pipelines that need voice output, Kokoro avoids both the per-character cost of a cloud TTS API and the wait for audio bytes to stream back.

§03

How to use

Install Kokoro via pip:

pip install kokoro

Generate speech with a few lines of Python:

from kokoro import KPipeline

pipe = KPipeline(lang_code='a')  # 'a' = American English
generator = pipe('Hello, this is Kokoro text to speech.', voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    # gs: the text of this segment, ps: its phoneme string,
    # audio: the waveform samples at 24 kHz
    print(i, gs, ps, len(audio))

Save the output as a WAV file or stream it to your application.
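
If you would rather not add a dependency just to save audio, the segments can be joined into one file with Python's standard library. This is a minimal sketch, assuming each segment is a sequence of float samples in the range -1.0 to 1.0 at 24 kHz; `write_wav` is a hypothetical helper, and the sine tone is a stand-in for real Kokoro output.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, segments, sample_rate=24000):
    """Concatenate float sample segments (-1.0..1.0) into one 16-bit mono WAV."""
    total = 0
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav_file.setframerate(sample_rate)
        for segment in segments:
            # Clamp each sample, then scale it to the signed 16-bit range.
            pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in segment]
            wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
            total += len(pcm)
    return total


# Stand-in for Kokoro output: two half-second 440 Hz tones instead of speech.
tone = [0.3 * math.sin(2 * math.pi * 440 * n / 24000) for n in range(12000)]
out_path = os.path.join(tempfile.gettempdir(), "kokoro_combined.wav")
frames = write_wav(out_path, [tone, tone])
print(f"wrote {frames} frames ({frames / 24000:.1f} s) to {out_path}")
```

With Kokoro, the `segments` argument would be something like `(audio for _, _, audio in generator)`. The per-sample loop keeps the sketch dependency-free; for long outputs a vectorized conversion would be faster.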

§04

Example

from kokoro import KPipeline
import soundfile as sf

pipe = KPipeline(lang_code='a')

text = 'Kokoro runs locally with no API key required.'
generator = pipe(text, voice='af_heart', speed=1.0)

for i, (gs, ps, audio) in enumerate(generator):
    sf.write(f'output_{i}.wav', audio, 24000)
    print(f'Saved segment {i}: {gs}')

This script writes one WAV file per generated segment at a 24 kHz sample rate. The voice parameter selects one of the available voice presets, and speed controls the speaking rate.

§05

Related on TokRepo

AI voice tools -- Explore other text-to-speech and voice synthesis tools
Local LLM runners -- Run AI models privately on your own hardware

§06

Common pitfalls

Language codes are single letters (e.g., 'a' for American English, 'j' for Japanese). Using full locale strings like 'en-US' will raise an error. Check the documentation for the correct single-letter codes.
Audio output is a raw array of samples at 24 kHz. You need a library such as soundfile or scipy to save it as WAV. Passing the wrong sample rate when saving produces audio that plays back at the wrong speed and pitch.
Kokoro downloads model weights on first use. The initial run takes longer due to the download. Subsequent runs load from cache.
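
For the language-code pitfall, one option is to normalize locale strings before they reach the pipeline. The sketch below is a hypothetical helper, not part of Kokoro. The table reflects the single-letter codes published for Kokoro v1.0 as best understood here, so treat it as an assumption and verify it against the documentation for your installed version.

```python
# Assumed single-letter codes for Kokoro v1.0; verify before relying on them.
LOCALE_TO_KOKORO = {
    "en-us": "a",  # American English
    "en-gb": "b",  # British English
    "es": "e",     # Spanish
    "fr": "f",     # French
    "hi": "h",     # Hindi
    "it": "i",     # Italian
    "ja": "j",     # Japanese
    "pt-br": "p",  # Brazilian Portuguese
    "zh": "z",     # Mandarin Chinese
}


def to_kokoro_lang_code(locale: str) -> str:
    """Map a locale such as 'en-US' or 'ja_JP' to a single-letter code."""
    normalized = locale.strip().lower().replace("_", "-")
    if normalized in LOCALE_TO_KOKORO.values():
        return normalized  # already a single-letter code
    if normalized in LOCALE_TO_KOKORO:
        return LOCALE_TO_KOKORO[normalized]
    # Fall back to the bare language part, e.g. 'ja-jp' -> 'ja'.
    language = normalized.split("-")[0]
    if language in LOCALE_TO_KOKORO:
        return LOCALE_TO_KOKORO[language]
    raise ValueError(f"no Kokoro language code known for {locale!r}")


for example in ("en-US", "ja_JP", "zh-CN", "a"):
    print(example, "->", to_kokoro_lang_code(example))
```

The result can then be passed as `KPipeline(lang_code=...)`. A bare `'en'` is rejected on purpose, because it does not say whether American or British English is wanted.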

FAQ

What languages does Kokoro support?

Kokoro supports 9 languages: American English, British English, Spanish, French, Hindi, Italian, Japanese, Chinese (Mandarin), and Portuguese (Brazilian). Each language has its own language code and set of available voices.

Can Kokoro run on CPU only?

Yes. Kokoro's 82M parameter size is small enough to run efficiently on CPU. GPU acceleration is supported but not required. CPU inference is fast enough for real-time speech generation in most applications.
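
Whether CPU inference is fast enough on your hardware can be checked by measuring the real-time factor (RTF): synthesis time divided by the duration of the audio produced. The sketch below assumes a `synthesize` callable that returns the samples for a piece of text; `fake_synthesize` is a hypothetical stand-in so the example runs without the model installed.

```python
import time


def real_time_factor(synthesize, text, sample_rate=24000):
    """Return (rtf, audio_seconds); an rtf below 1.0 is faster than real time."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesize returned no audio")
    return elapsed / audio_seconds, audio_seconds


def fake_synthesize(text):
    """Hypothetical stand-in for a TTS call: 0.06 s of silence per character."""
    return [0.0] * (len(text) * 1440)


rtf, seconds = real_time_factor(fake_synthesize, "Hello from a CPU-only machine.")
print(f"{seconds:.2f} s of audio, RTF = {rtf:.4f}")
```

To measure Kokoro itself, `synthesize` would wrap a pipeline call and join the audio of all returned segments. Run it more than once, since the first call also includes model loading.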

What is the audio quality compared to cloud TTS services?

Kokoro produces natural-sounding speech that reviewers have compared favorably to cloud services like Google Cloud TTS and Amazon Polly. The quality is particularly strong for English and Japanese. Some voices sound more natural than others, so testing different voice presets is recommended.

Is Kokoro free for commercial use?

Yes. Kokoro is released under the Apache 2.0 license, which permits commercial use, modification, and distribution. There are no per-character or per-request fees since the model runs locally.

How do I add custom voices to Kokoro?

Kokoro ships with a set of pre-trained voice presets. Adding custom voices requires fine-tuning the model on your own voice data. The project provides documentation on voice cloning workflows, though this requires additional training compute and audio samples.



Source and credits

Created by Hexgrad. Licensed under Apache 2.0. hexgrad/kokoro — 6,200+ GitHub stars
