# FunASR — End-to-End Speech Recognition Toolkit

> FunASR is an open-source speech recognition toolkit by Alibaba DAMO Academy supporting ASR, voice activity detection, punctuation restoration, and text normalization. It ships pretrained models for 50+ languages and provides production-ready server deployment with streaming support.

## Quick Use
```bash
pip install funasr
python3 -c "
from funasr import AutoModel
model = AutoModel(model='iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch')
result = model.generate(input='audio.wav')
print(result)
"
```

## Introduction
FunASR provides a complete pipeline for automatic speech recognition, from audio input to formatted text output. It bundles pretrained models (Paraformer, SenseVoice, Whisper-compatible) with Python APIs and a server that can be deployed over gRPC or WebSocket.

## What FunASR Does
- Performs speech-to-text transcription for 50+ languages with pretrained models
- Detects voice activity to segment audio into speech and silence regions
- Restores punctuation and performs inverse text normalization on transcriptions
- Supports both offline (batch) and online (streaming) recognition modes
- Provides a runtime server for production deployment with GPU acceleration
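
The capabilities listed above can be combined in a single `AutoModel` call. The following is a minimal sketch, not an official recipe. It assumes that the short model aliases `paraformer-zh`, `fsmn-vad`, and `ct-punc` resolve in your installed FunASR version and that `generate` returns a list of dicts carrying a `text` field. The helper names `build_pipeline_kwargs`, `extract_text`, and `transcribe` are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List


def build_pipeline_kwargs(use_vad: bool = True, use_punc: bool = True) -> Dict[str, Any]:
    """Assemble keyword arguments for funasr.AutoModel (model aliases are assumptions)."""
    kwargs: Dict[str, Any] = {"model": "paraformer-zh"}
    if use_vad:
        kwargs["vad_model"] = "fsmn-vad"  # segments long audio into speech regions
    if use_punc:
        kwargs["punc_model"] = "ct-punc"  # restores punctuation in the transcript
    return kwargs


def extract_text(result: List[Dict[str, Any]]) -> str:
    """Join the 'text' field of each result entry; entries without text are skipped."""
    return " ".join(item["text"] for item in result if item.get("text"))


def transcribe(path: str) -> str:
    """Run ASR + VAD + punctuation on one audio file (requires `pip install funasr`)."""
    from funasr import AutoModel  # imported lazily so the helpers work without FunASR

    model = AutoModel(**build_pipeline_kwargs())
    return extract_text(model.generate(input=path))
```

Calling `transcribe("audio.wav")` downloads the three models on first use. Passing `use_vad=False` to `build_pipeline_kwargs` is a reasonable choice for short clips that do not need segmentation.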

## Architecture Overview
FunASR's core is built on PyTorch and wraps multiple ASR architectures (Paraformer, Conformer, Transformer, Whisper) behind a unified `AutoModel` interface. The Paraformer model is non-autoregressive: a predictor module estimates the number of output tokens, which lets the decoder produce all of them in a single parallel pass. The runtime server is written in C++, loads ONNX-exported models with ONNX Runtime for low-latency inference, and accepts streaming audio over WebSocket or gRPC.
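
To make the predictor idea concrete, here is a toy sketch of a continuous integrate-and-fire (CIF) style predictor, the mechanism the Paraformer paper builds on. It is not FunASR's implementation: the per-frame weights `alphas` are hand-written rather than predicted by a network, and the function names are hypothetical. It only shows how summing the weights gives a token count and where token boundaries fall.

```python
from typing import List


def estimate_token_count(alphas: List[float]) -> int:
    """Predicted number of tokens: the rounded sum of per-frame weights."""
    return int(round(sum(alphas)))


def integrate_and_fire(alphas: List[float], threshold: float = 1.0) -> List[int]:
    """Return the frame indices where the accumulated weight reaches the threshold.

    Assumes every weight is below the threshold, so at most one token
    fires per frame.
    """
    fired = []
    acc = 0.0
    for i, alpha in enumerate(alphas):
        acc += alpha
        if acc >= threshold:
            fired.append(i)
            acc -= threshold  # carry the remainder over to the next token
    return fired


alphas = [0.5, 0.5, 0.25, 0.75, 0.5, 0.5]
print(estimate_token_count(alphas))  # 3
print(integrate_and_fire(alphas))    # [1, 3, 5]
```

Because the token count is known before decoding starts, the decoder can fill every output position at once instead of waiting for the previous token.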

## Self-Hosting & Configuration
- Install via pip: `pip install funasr` (Python 3.8+)
- Models download automatically from ModelScope or Hugging Face on first use
- Deploy the production server using the Docker image: `funasr-runtime-sdk-gpu`
- Configure the server via command-line flags for model paths, ports, and thread count
- Stream audio to the server over WebSocket for real-time transcription
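
The streaming step above amounts to sending a start message, a series of small binary audio chunks, and an end message. The sketch below prepares that sequence without opening a connection. The JSON field names (`mode`, `wav_name`, `is_speaking`) are assumptions modelled on the FunASR runtime's WebSocket protocol and should be checked against the runtime documentation for your server version. The 60 ms chunk length and the function names are illustrative choices.

```python
import json
from typing import Iterator, List, Union


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 60,
                    bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive chunks of raw mono PCM, each covering chunk_ms of audio."""
    step = sample_rate * chunk_ms // 1000 * bytes_per_sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def build_messages(pcm: bytes, wav_name: str = "demo") -> List[Union[str, bytes]]:
    """Plan the message sequence: JSON start frame, binary audio, JSON end frame."""
    # Field names below are assumptions; verify them against your server's protocol.
    start = json.dumps({"mode": "online", "wav_name": wav_name, "is_speaking": True})
    end = json.dumps({"is_speaking": False})
    return [start, *iter_pcm_chunks(pcm), end]


one_second_of_silence = bytes(32000)  # 16 kHz, 16-bit mono
messages = build_messages(one_second_of_silence)
print(len(messages))  # 19: one start frame, 17 audio chunks, one end frame
```

A real client would send each entry of `messages` in order over a WebSocket connection and read partial results as they arrive.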

## Key Features
- Paraformer achieves fast non-autoregressive decoding with high accuracy on Chinese and English
- Streaming mode delivers partial results with low latency for live captioning
- Supports hotword boosting to improve recognition of domain-specific terms
- Includes speaker diarization to distinguish who is speaking
- Production C++ runtime with ONNX optimization for enterprise deployment

## Comparison with Similar Tools
- **Whisper (OpenAI)** — strong multilingual ASR; FunASR offers faster non-autoregressive models and a production server
- **whisper.cpp** — C++ Whisper inference; FunASR provides a broader toolkit with VAD, punctuation, and diarization
- **Faster Whisper** — CTranslate2-based speedup; FunASR's Paraformer is natively non-autoregressive for even lower latency
- **Vosk** — offline speech recognition; FunASR supports both streaming and batch with a wider model zoo
- **DeepSpeech** — Mozilla's end-to-end ASR (archived); FunASR is actively maintained with newer architectures

## FAQ
**Q: Which languages are supported?**
A: FunASR ships models covering 50+ languages, with particular strength in Chinese (including 7 dialects and 26 accents), English, Japanese, and Korean.

**Q: Can I fine-tune models on my own data?**
A: Yes. FunASR provides training scripts and recipes for fine-tuning any supported model on custom datasets.

**Q: What is the recommended deployment for production?**
A: Use the Docker-based runtime server with GPU support. It handles concurrent WebSocket connections and delivers optimized throughput via ONNX Runtime.

**Q: How does Paraformer compare to Whisper in speed?**
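A: Paraformer is non-autoregressive: it predicts all output tokens in a single parallel pass, whereas Whisper's autoregressive decoder generates one token per step. The speed gap therefore grows with the number of tokens, so it is most visible on long audio segments.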

## Sources
- https://github.com/modelscope/FunASR
- https://www.funasr.com/

---
Source: https://tokrepo.com/en/workflows/asset-9e95d508
Author: AI Open Source