Skills · Apr 7, 2026 · 2 min read

Moshi — Real-Time AI Voice Conversation Engine

Open-source real-time voice AI by Kyutai. Full-duplex speech conversation with 200ms latency, emotion recognition, and on-device processing. Apache 2.0 licensed.

AI Open Source · Community

Agent-ready

Agent installation ready

This asset can be installed once you have chosen a runtime, checked the plan, and run the matching command.

Native · 98/100 · Policy: allow

Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Established

Entry point

Moshi — Real-Time AI Voice Conversation Engine

Direct install command

npx -y tokrepo@latest install 6172db11-6b8c-431b-8f66-f4b7af585534 --target codex

Run this only after confirming the plan in a dry run.

TL;DR

Moshi delivers full-duplex voice conversation with 200ms latency, emotion recognition, and on-device processing under Apache 2.0.

§01

What it is

Moshi is an open-source real-time voice AI system built by Kyutai. It supports full-duplex speech conversation, meaning both parties can speak and listen simultaneously, with approximately 200ms latency. It includes emotion recognition and runs on-device without requiring cloud APIs.
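To make the full-duplex idea concrete, here is a minimal Python sketch using asyncio. It is not Moshi's implementation: the `mic` and `speaker` functions, the frame labels, and the timings are all hypothetical stand-ins. The only point is that in the full-duplex case the send and receive loops run concurrently, while in the half-duplex case one finishes before the other starts.

```python
import asyncio

async def mic(log, frames=3):
    # Hypothetical uplink: pretend to capture and send one audio frame per tick.
    for i in range(frames):
        log.append(f"sent user frame {i}")
        await asyncio.sleep(0.01)

async def speaker(log, frames=3):
    # Hypothetical downlink: pretend to receive and play one model frame per tick.
    for i in range(frames):
        log.append(f"played model frame {i}")
        await asyncio.sleep(0.01)

async def full_duplex():
    log = []
    # Both directions run at once; neither waits for the other to finish.
    await asyncio.gather(mic(log), speaker(log))
    return log

async def half_duplex():
    log = []
    # Turn-taking: playback only starts once the user side has finished.
    await mic(log)
    await speaker(log)
    return log

duplex_log = asyncio.run(full_duplex())
turn_log = asyncio.run(half_duplex())
```

Printing `duplex_log` shows playback starting while user frames are still being sent, whereas `turn_log` lists every user frame before the first model frame.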

Moshi targets developers building voice interfaces, conversational assistants, and accessibility tools who need low-latency, privacy-preserving voice interaction.

§02

How it saves time or tokens

Moshi processes speech on-device, eliminating round-trip latency to cloud speech APIs. The full-duplex architecture removes turn-taking overhead typical of voice assistants. Estimated token usage for this workflow is around 3,900 tokens.

§03

How to use

Install and start the server:

pip install moshi
python -m moshi.server

Open http://localhost:8998 in your browser.
Start speaking. Moshi responds in real time with full-duplex audio.
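The server can take a while to come up, so it may help to wait until the port answers before opening the browser. The sketch below is one way to do that with the Python standard library. The function name `wait_for_port` and the timeout values are illustrative choices; only the host and port (localhost:8998) come from the steps above.

```python
import socket
import time

def wait_for_port(host="localhost", port=8998, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False after timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A successful connection only shows that something is listening on the port, not that the model has finished loading, so the first page load may still be slow.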

§04

Example

# Install Moshi
pip install moshi

# Start the voice server
python -m moshi.server

# Open browser to http://localhost:8998
# Speak naturally — Moshi responds with ~200ms latency

The browser interface handles audio capture and playback. No additional microphone setup is needed beyond browser permissions.
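If you would rather drive the same steps from a script, the following sketch wraps the shell example in Python. The helper names `server_command` and `launch` are hypothetical, and the sketch assumes `moshi` is already installed in the interpreter that runs it. Nothing here is part of Moshi's own API; it only starts the documented command as a child process and opens the documented URL.

```python
import subprocess
import sys
import webbrowser

MOSHI_URL = "http://localhost:8998"  # address given in the steps above

def server_command(python=sys.executable):
    # Same command as the shell example, expressed as an argument list.
    return [python, "-m", "moshi.server"]

def launch(open_browser=True):
    # Start the server as a child process; the caller is responsible for stopping it.
    proc = subprocess.Popen(server_command())
    if open_browser:
        # The page may need a refresh if the server is still starting up.
        webbrowser.open(MOSHI_URL)
    return proc
```

Calling `launch()` returns the `subprocess.Popen` handle, so `proc.terminate()` stops the server when you are done.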

§05

Related on TokRepo

AI Tools for Voice — More voice AI tools and engines
Local LLM Providers — Run AI models locally alongside Moshi

Key considerations

When evaluating Moshi for your workflow, consider the following factors:

Assess whether your team has the technical prerequisites to adopt this tool effectively.
Evaluate the maintenance burden against the productivity gains.
Check community activity and documentation quality to ensure long-term viability.
Weigh integration with your existing toolchain above feature count alone.
Start with a small pilot project before rolling out across the organization.
Monitor resource usage during the initial adoption phase to identify bottlenecks early.
Document your configuration decisions so team members can onboard independently.

§06

Common pitfalls

On-device processing requires adequate GPU or CPU resources; low-end machines may experience higher latency.
Browser audio permissions must be granted for the microphone to work.
Emotion recognition accuracy varies by language and accent; primary testing has been on English speech.

Frequently asked questions

What does full-duplex mean in this context?

Full-duplex means both the user and Moshi can speak at the same time without interrupting each other. Traditional voice assistants use half-duplex where you must wait for the assistant to finish before speaking.

Does Moshi require a GPU?

A GPU accelerates inference and helps maintain the 200ms latency target. CPU-only inference works but with higher latency. For production use, a CUDA-capable GPU is recommended.
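Before starting the server, a quick check can tell you which of those two paths your machine is likely to take. This is a minimal sketch that assumes the PyTorch backend; `pick_device` is a hypothetical helper, not a Moshi function. It falls back to "cpu" when PyTorch is not installed at all.

```python
import importlib.util

def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed, so there is no GPU path to probe
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
```

A result of "cpu" does not prevent Moshi from running, but per the answer above you should expect higher latency.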

Can Moshi run completely offline?

Yes. Moshi processes speech on-device and does not require internet connectivity after installation. This makes it suitable for privacy-sensitive deployments.
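One way to verify offline behavior is to start the server with network lookups for model files disabled. The sketch below assumes the model weights are fetched through the `huggingface_hub` library and are already in its local cache from an earlier run; that is an assumption about the setup, not something this page states. `HF_HUB_OFFLINE` is an environment variable read by `huggingface_hub`, and `offline_env` is a hypothetical helper.

```python
import os

def offline_env(base=None):
    """Copy an environment mapping and add the Hugging Face offline switch."""
    env = dict(os.environ if base is None else base)
    # HF_HUB_OFFLINE=1 tells huggingface_hub to use its local cache only.
    env["HF_HUB_OFFLINE"] = "1"
    return env
```

You could then pass the result to the server process, for example `subprocess.run([sys.executable, "-m", "moshi.server"], env=offline_env())`. If startup fails in this mode, the weights are probably not cached yet.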

What languages does Moshi support?

Moshi's primary language support is English. Additional language support depends on the training data and model weights available. Check the repository for the latest supported languages.

Is Moshi suitable for production applications?

Moshi is released under Apache 2.0, so commercial use is permitted. Check the repository for the license terms that apply to the model weights as well as the code. Whether Moshi is production-ready depends on your latency and accuracy requirements, so test it against your specific use case before deploying.

Source and credits

Created by Kyutai. Licensed under Apache 2.0.

kyutai-labs/moshi — 8k+ stars
