Configs · April 7, 2026 · 1 min read

Moshi — Real-Time AI Voice Conversation Engine

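Open-source real-time voice AI by Kyutai. Full-duplex speech conversation with about 200 ms latency, emotion recognition, and on-device processing. Per the upstream repository, the Python code is MIT-licensed, the Rust backend is Apache 2.0, and the model weights are CC-BY 4.0.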

What is Moshi?

Moshi is an open-source speech-to-speech model and real-time voice engine from Kyutai. It listens and speaks at the same time (full duplex), responds in about 200 ms, picks up on emotion and tone, and can run on local hardware.

In one sentence: an open-source real-time voice AI with full-duplex conversation at roughly 200 ms latency, support for interruption and emotion recognition, and local execution (8k+ GitHub stars).

For: Developers building voice-first AI applications.

Core Features

1. Full-Duplex Conversation

Moshi listens and speaks simultaneously, so either side can interrupt or talk over the other, as in natural conversation.
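
As a rough illustration of what full duplex means in code, the sketch below runs an uplink (microphone to model) and a downlink (model to speaker) as concurrent tasks, so input keeps flowing while output is being produced. Everything here is hypothetical: the function names, the string "frames", and the toy rule that the model goes silent when the user speaks are assumptions for illustration, not Moshi's actual API or behavior.

```python
import asyncio

async def uplink(user_frames, to_model):
    """Send every user frame to the model, even while the model is talking."""
    for frame in user_frames:
        await to_model.put(frame)
    await to_model.put(None)  # end-of-stream marker

async def toy_model(to_model, to_speaker):
    """Hypothetical stand-in for the model: one output frame per input frame.

    It yields the floor (emits silence) whenever the user frame contains speech.
    """
    while (frame := await to_model.get()) is not None:
        await to_speaker.put("silence" if frame == "speech" else "reply")
    await to_speaker.put(None)

async def downlink(to_speaker, played):
    """Play model frames as they arrive, independently of the uplink."""
    while (frame := await to_speaker.get()) is not None:
        played.append(frame)

async def converse(user_frames):
    to_model, to_speaker = asyncio.Queue(), asyncio.Queue()
    played = []
    # All three tasks run concurrently: neither direction waits for a "turn".
    await asyncio.gather(
        uplink(user_frames, to_model),
        toy_model(to_model, to_speaker),
        downlink(to_speaker, played),
    )
    return played

def run_demo():
    # The user interrupts in frames 3 and 4; the toy model stops talking there.
    return asyncio.run(converse(["quiet", "quiet", "speech", "speech", "quiet"]))

if __name__ == "__main__":
    print(run_demo())
```

A half-duplex design would instead wait for the user to finish before producing any output, which is what makes interruption impossible there.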

2. 200ms Latency

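End-to-end response latency of about 200 ms on local hardware, with no cloud round trip. Kyutai reports a theoretical latency of 160 ms and roughly 200 ms in practice on an NVIDIA L4 GPU.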
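
The 200 ms figure can be related to the audio frame size with some simple arithmetic. The sketch below assumes the numbers Kyutai has published for its Mimi codec (24 kHz audio, 12.5 frames per second) and a one-frame acoustic delay; treat these constants as assumptions and check them against the upstream documentation. The function names are made up for this example.

```python
SAMPLE_RATE_HZ = 24_000   # assumed codec input sample rate
FRAME_RATE_HZ = 12.5      # assumed codec frame rate (frames per second)

def frame_duration_ms(frame_rate_hz: float) -> float:
    """Duration of one codec frame in milliseconds."""
    return 1000.0 / frame_rate_hz

def samples_per_frame(sample_rate_hz: int, frame_rate_hz: float) -> int:
    """Number of audio samples covered by one codec frame."""
    return int(round(sample_rate_hz / frame_rate_hz))

def theoretical_latency_ms(frame_rate_hz: float, delay_frames: int = 1) -> float:
    """One frame must be buffered before it can be encoded, plus an assumed
    acoustic delay of `delay_frames` frames inside the model."""
    return frame_duration_ms(frame_rate_hz) * (1 + delay_frames)

if __name__ == "__main__":
    print(frame_duration_ms(FRAME_RATE_HZ))                  # 80.0
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_RATE_HZ))  # 1920
    print(theoretical_latency_ms(FRAME_RATE_HZ))             # 160.0
```

Under these assumptions, the gap between the 160 ms floor and the roughly 200 ms observed in practice is the budget left for model compute and audio I/O.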

3. Emotion and Tone

Understands and generates whispers, laughter, hesitation, and more.

4. Local Deployment

Multi-platform support: NVIDIA GPUs (PyTorch or Rust backend), Apple silicon via MLX, and a browser-based web client.
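
A small helper like the following could choose a backend for the current machine. The selection logic and the `pick_backend` name are hypothetical. The launch commands are quoted as best recalled from the kyutai-labs/moshi README (PyPI packages `moshi` and `moshi_mlx`); verify them against the repository before relying on them.

```python
import platform
from typing import Optional

# Launch commands as recalled from the upstream README; treat as assumptions.
LAUNCH_COMMANDS = {
    "pytorch": "python -m moshi.server",      # after: pip install moshi
    "mlx": "python -m moshi_mlx.local -q 4",  # after: pip install moshi_mlx
}

def pick_backend(system: str, machine: str, has_cuda: bool) -> Optional[str]:
    """Return a backend key for LAUNCH_COMMANDS, or None if nothing fits.

    Hypothetical policy: prefer the PyTorch backend when a CUDA GPU is
    present, fall back to MLX on Apple silicon, otherwise give up.
    """
    if has_cuda:
        return "pytorch"
    if system == "Darwin" and machine == "arm64":
        return "mlx"
    return None

if __name__ == "__main__":
    # has_cuda is hard-coded here; detecting a GPU is out of scope for this sketch.
    backend = pick_backend(platform.system(), platform.machine(), has_cuda=False)
    print(LAUNCH_COMMANDS.get(backend, "no supported local backend detected"))
```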

FAQ

Q: How does it compare to OpenAI's voice mode?
A: Moshi is open source and runs locally; OpenAI's voice mode is cloud-based and closed source. Latency is comparable.

Q: Does it support Chinese?
A: Currently English-first; multi-language support is in development.

Source and Thanks

kyutai-labs/moshi: 8k+ stars; code under MIT (Python) and Apache 2.0 (Rust backend), model weights under CC-BY 4.0