# Moshi: Real-Time AI Voice Conversation Engine

> Open-source real-time voice AI by Kyutai. Full-duplex speech conversation with about 200 ms latency, emotion recognition, and on-device processing. Apache 2.0 licensed.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
# Install the Moshi Python package
pip install moshi
# Start the local server
python -m moshi.server
```

Open `http://localhost:8998` and start talking to Moshi in real time.

## What is Moshi?

Moshi is an open-source real-time voice AI engine by Kyutai. It enables full-duplex speech conversations with about 200 ms latency, meaning you can interrupt, overlap, and have natural back-and-forth dialog with an AI. It runs on-device with no cloud dependency.

**Answer-Ready**: Moshi is an open-source real-time voice AI engine by Kyutai with full-duplex speech conversation at about 200 ms latency. It supports interruptions, emotion recognition, and on-device processing. Apache 2.0 licensed with 8k+ GitHub stars.

**Best for**: Developers building voice-first AI applications.

**Works with**: Local GPU (NVIDIA), Apple MLX, web browser.

**Setup time**: Under 5 minutes.

## Core Features

### 1. Full-Duplex Conversation

Unlike turn-based voice assistants, Moshi handles overlapping speech:

- You can interrupt mid-sentence
- Moshi responds while you're still talking
- Conversation flows naturally, like a call with another person

### 2. Ultra-Low Latency

Approximate end-to-end latency breakdown:

```
Speech recognition:  ~50ms
Language model:      ~100ms
Speech synthesis:    ~50ms
Total:               ~200ms
```

The stage labels are indicative. Because Moshi is a joint speech-text model (see Architecture), they correspond roughly to audio encoding, language modeling, and audio decoding rather than to separate ASR and TTS systems.

### 3. Architecture

Moshi is a joint speech-text model, with no separate ASR + LLM + TTS pipeline:

```
Audio input → Mimi Encoder → Helium LM → Mimi Decoder → Audio output
                                 ↕
                           Text reasoning
```

- **Mimi**: Neural audio codec (12.5 Hz frame rate, 1.1 kbps bitrate)
- **Helium**: 7B-parameter language model that serves as the backbone of the joint speech-text model

### 4. Emotion & Tone

Moshi understands and generates:

- Whispers, laughter, hesitation
- Emotional tone (excited, calm, serious)
- Multiple speaking styles

### 5. Deployment Options

| Platform | How |
|----------|-----|
| Python server | `python -m moshi.server` |
| Rust server | High-performance production deployment |
| Web client | Browser-based demo |
| MLX | Apple Silicon optimized |

## Hardware Requirements

| Hardware | Model Size | Latency |
|----------|-----------|---------|
| NVIDIA A100 | 7B | ~160ms |
| NVIDIA RTX 4090 | 7B | ~200ms |
| Apple M2 Ultra | 7B (MLX) | ~300ms |

## FAQ

**Q: How does it compare to OpenAI's voice mode?**
A: Moshi is open-source and runs locally. OpenAI's voice mode is cloud-only and proprietary. Latency is comparable.

**Q: Can I fine-tune it?**
A: Yes, both the Mimi codec and Helium LM can be fine-tuned for custom voice personas and domains.

**Q: Does it support multiple languages?**
A: It is currently optimized for English. Multilingual support is in development.
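
The codec and latency figures quoted in this document can be cross-checked with simple arithmetic. The Python sketch below is illustrative only. It takes Mimi's 12.5 Hz frame rate and 1.1 kbps bitrate and the approximate 50 + 100 + 50 ms latency breakdown as given, and the names `CodecSpec`, `MIMI`, `LATENCY_BUDGET_MS`, and `total_latency_ms` are hypothetical helpers, not part of the `moshi` package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecSpec:
    """Codec figures as quoted in this document (hypothetical helper type)."""

    frame_rate_hz: float  # codec frames per second
    bitrate_bps: float    # bits per second

    @property
    def frame_duration_ms(self) -> float:
        # Duration of audio covered by one codec frame.
        return 1000.0 / self.frame_rate_hz

    @property
    def bits_per_frame(self) -> float:
        # Bits available to describe each frame at the quoted bitrate.
        return self.bitrate_bps / self.frame_rate_hz

    def bytes_for(self, seconds: float) -> float:
        # Size of the encoded stream for a given duration of audio.
        return self.bitrate_bps * seconds / 8.0


# Values taken from the Architecture section: 12.5 Hz, 1.1 kbps.
MIMI = CodecSpec(frame_rate_hz=12.5, bitrate_bps=1100.0)

# Approximate per-stage latencies from the latency breakdown (milliseconds).
LATENCY_BUDGET_MS = {
    "speech recognition": 50,
    "language model": 100,
    "speech synthesis": 50,
}


def total_latency_ms(budget: dict) -> int:
    """Sum the per-stage latencies into an end-to-end figure."""
    return sum(budget.values())


if __name__ == "__main__":
    total = total_latency_ms(LATENCY_BUDGET_MS)
    print(f"frame duration: {MIMI.frame_duration_ms:.0f} ms")
    print(f"bits per frame: {MIMI.bits_per_frame:.0f}")
    print(f"one minute of audio: {MIMI.bytes_for(60):.0f} bytes")
    print(f"end-to-end latency: {total} ms")
    print(f"latency in codec frames: {total / MIMI.frame_duration_ms:.1f}")
```

With these inputs, one codec frame covers 80 ms of audio and carries 88 bits, a minute of encoded audio takes 8250 bytes, and the 200 ms budget spans 2.5 codec frames.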
## Source & Thanks

> Created by [Kyutai](https://github.com/kyutai-labs). Licensed under Apache 2.0.
>
> [kyutai-labs/moshi](https://github.com/kyutai-labs/moshi) (8k+ stars)

---

Source: https://tokrepo.com/en/workflows/moshi-real-time-ai-voice-conversation-engine-6172db11
Author: AI Open Source