# SenseVoice: Multilingual Speech Understanding Model

> SenseVoice is an open-source speech foundation model by Alibaba's FunAudioLLM team that performs automatic speech recognition, language identification, speech emotion recognition, and audio event detection in a single model. It supports 50+ languages and runs significantly faster than Whisper.

## Quick Use

```bash
pip install funasr
python3 -c "
from funasr import AutoModel

# The model is downloaded on first use; language='auto' enables language detection
model = AutoModel(model='iic/SenseVoiceSmall')
result = model.generate(input='audio.wav', language='auto')
print(result)
"
```

## Introduction

SenseVoice goes beyond speech-to-text by combining ASR with speech emotion recognition, spoken language identification, and audio event detection in a single forward pass. Trained on over 400,000 hours of data, it achieves high accuracy across 50+ languages and runs inference far faster than Whisper.

## What SenseVoice Does

- Transcribes speech in 50+ languages with high accuracy
- Detects the spoken language automatically from audio input
- Recognizes speaker emotions (happy, sad, angry, neutral, etc.) from voice
- Identifies non-speech audio events like applause, laughter, music, and crying
- Provides all four capabilities simultaneously in a single inference call

## Architecture Overview

SenseVoice uses an encoder-only Transformer architecture with multi-task prediction heads. The shared audio encoder processes mel-spectrogram features through a stack of Conformer blocks. Task-specific output heads branch from the shared representation to produce ASR tokens, language labels, emotion labels, and audio event labels. The SenseVoice-Small variant has a parameter count comparable to Whisper-Small but achieves significantly lower latency because it decodes non-autoregressively: output tokens are predicted in parallel rather than one at a time.

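To illustrate why non-autoregressive decoding is cheap, here is a minimal sketch in plain Python. It assumes a CTC-style output head, which is an assumption on our part: the description above only says that decoding is non-autoregressive. The names `ctc_greedy_decode`, `BLANK`, and the toy vocabulary are hypothetical and not part of FunASR. The point is that each frame's label is chosen independently, so no step has to wait for a previously generated token.

```python
from typing import List, Sequence

BLANK = 0  # hypothetical index of the blank symbol (assumed CTC-style head)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the best label per frame, then collapse repeats and drop blanks."""
    # Step 1: score every frame independently of the others. This is the
    # non-autoregressive part: no frame depends on an earlier output token.
    best_per_frame = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]

    # Step 2: merge consecutive duplicates and remove blank symbols.
    decoded: List[int] = []
    previous = None
    for label in best_per_frame:
        if label != previous and label != BLANK:
            decoded.append(label)
        previous = label
    return decoded


# Toy input: 6 frames over a 3-symbol vocabulary (0 = blank, 1 = "a", 2 = "b").
toy_scores = [
    [0.1, 0.8, 0.1],    # "a"
    [0.2, 0.7, 0.1],    # "a" again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # "a" (a new token, because a blank came in between)
    [0.1, 0.2, 0.7],    # "b"
    [0.8, 0.1, 0.1],    # blank
]
vocab = {1: "a", 2: "b"}
print("".join(vocab[i] for i in ctc_greedy_decode(toy_scores)))  # aab
```

An autoregressive decoder such as Whisper's would instead run one decoder step per output token, each conditioned on the tokens produced so far, which is where most of the extra latency comes from.
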
## Self-Hosting & Configuration

- Install via FunASR: `pip install funasr` (Python 3.8+)
- Models download automatically from ModelScope or Hugging Face on first use
- Available in two sizes: SenseVoice-Small (fast, lightweight) and SenseVoice-Large (higher accuracy)
- Set `language='auto'` for automatic language detection, or specify a language code
- Deploy in production using FunASR's gRPC/WebSocket server for concurrent requests

## Key Features

- Unified model handles ASR, language ID, emotion, and audio events without separate pipelines
- Inference is 5x faster than Whisper-Small and 15x faster than Whisper-Large
- Supports rich transcription with emotion and event tags embedded in the output
- Works well on noisy audio and multi-speaker scenarios
- Fine-tunable on domain-specific data using FunASR training scripts

## Comparison with Similar Tools

- **Whisper (OpenAI)**: strong multilingual ASR, but autoregressive and slower; SenseVoice adds emotion and event detection
- **Faster Whisper**: accelerated Whisper inference; SenseVoice is natively faster due to its non-autoregressive architecture
- **FunASR Paraformer**: non-autoregressive ASR; SenseVoice adds multi-task understanding beyond transcription
- **wav2vec 2.0**: self-supervised speech representation; SenseVoice is a complete end-to-end recognition system
- **WhisperX**: adds word-level timestamps to Whisper; SenseVoice provides emotion and event detection instead

## FAQ

**Q: How does SenseVoice compare to Whisper in accuracy?**
A: SenseVoice matches or exceeds Whisper on standard benchmarks for supported languages, while running significantly faster.

**Q: Can I use SenseVoice for real-time applications?**
A: Yes. SenseVoice-Small is fast enough for real-time transcription, and FunASR's server supports streaming WebSocket connections.

**Q: What format does the emotion output take?**
A: Emotion labels are returned as tags (e.g., `<|HAPPY|>`, `<|SAD|>`) alongside the transcription text.

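As a sketch of how tagged output like this could be consumed, the snippet below separates the tags from the plain transcription. It assumes tags are written as `<|NAME|>`; the helper `split_tags` and the sample string are hypothetical, so check the actual output of your model version before relying on it.

```python
import re
from typing import List, Tuple

# Assumed tag syntax: <|NAME|>, e.g. <|HAPPY|>. Verify against real model output.
TAG_PATTERN = re.compile(r"<\|([^|<>]+)\|>")


def split_tags(raw: str) -> Tuple[List[str], str]:
    """Return (tag names in order of appearance, text with all tags removed)."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    return tags, text


sample = "<|en|><|HAPPY|><|Speech|>Thanks, that was great."  # hypothetical output
tags, text = split_tags(sample)
print(tags)  # ['en', 'HAPPY', 'Speech']
print(text)  # Thanks, that was great.
```

Keeping the tag names as a list, rather than stripping them outright, lets downstream code route on language, emotion, or event labels without re-parsing the string.
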
**Q: Is commercial use permitted?**
A: SenseVoice models are released under permissive licenses. Check the specific model card on ModelScope or Hugging Face for license details.

## Sources

- https://github.com/FunAudioLLM/SenseVoice
- https://fun-audio-llm.github.io/

---

Source: https://tokrepo.com/en/workflows/asset-fe36c7c0

Author: AI Open Source