# AssemblyAI Universal-2: Streaming STT for Voice Agents

> AssemblyAI Universal-2 is a production STT model with <500ms streaming latency, 99 languages, diarization, smart formatting, and an OpenAI-compatible audio endpoint.

## Quick Use

1. `pip install assemblyai`
2. `aai.settings.api_key = ASSEMBLYAI_KEY`
3. `aai.Transcriber().transcribe(file_or_url)` for batch, `RealtimeTranscriber` for streaming

---

## Intro

Universal-2 is AssemblyAI's latest production STT model. It offers sub-500ms streaming latency, 99 languages, automatic speaker diarization, smart formatting (currency, dates, addresses, profanity filter), and an OpenAI-compatible `audio.transcriptions` endpoint for drop-in migration.

- Best for: voice agents on calls, meeting transcription, accessibility captions, multilingual support flows.
- Works with: Python, Node, and Go SDKs; REST; streaming WebSocket; OpenAI-compatible API.
- Setup time: 5 minutes.

---

### Batch transcription (file)

```python
import os

import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(
    "meeting.mp3",
    config=aai.TranscriptionConfig(
        speaker_labels=True,       # diarization
        language_detection=True,   # no need to pre-specify the language
        punctuate=True,
        format_text=True,
        speech_model=aai.SpeechModel.universal,  # Universal-2
    ),
)

# Utterances are populated because speaker_labels is enabled.
for u in transcript.utterances:
    print(f"Speaker {u.speaker}: {u.text}")
```

### Real-time streaming (WebSocket)

```python
import os

import assemblyai as aai

aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]


def on_data(transcript: aai.RealtimeTranscript):
    # Final transcripts are stable; partials may still be revised.
    if isinstance(transcript, aai.RealtimeFinalTranscript):
        print(f"FINAL: {transcript.text}")
    else:
        print(f"partial: {transcript.text}")


transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=lambda e: print(f"err: {e}"),
)

transcriber.connect()
# mic_audio_iterator() is a placeholder: supply any iterator that yields audio bytes.
transcriber.stream(mic_audio_iterator())
transcriber.close()
```

### OpenAI-compatible (drop-in migration)

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.assemblyai.com/v1",
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
)

# Open the audio in binary mode; the context manager closes it after the upload.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="universal-2",
        file=audio_file,
        response_format="verbose_json",
    )

print(transcript.text)
```

### Feature flags worth knowing

| Flag | What it does |
|---|---|
| `speaker_labels` | Diarize 2-10 speakers automatically |
| `auto_chapters` | Generate chapter summaries roughly every 5 minutes |
| `entity_detection` | Tag entities such as person, organization, location, card number, and phone number |
| `redact_pii` | Replace detected PII in the transcript with a placeholder (set `redact_pii_policies` to choose which types) |
| `sentiment_analysis` | Per-sentence sentiment scores |
| `summarization` | Auto-generate a transcript summary |
| `language_detection` | Detect the spoken language, so it need not be specified up front |

---

### FAQ

**Q: Universal-2 vs Whisper-large-v3?**

A: Universal-2 has better diarization, smart formatting, and per-language tuning, which makes it the stronger choice for production English and Spanish calls. Whisper-large-v3 has broader low-resource language coverage and is open-weight. For voice agents and call centers, Universal-2 typically wins on word error rate and formatting.

**Q: How accurate is the speaker diarization?**

A: On clean two-speaker call audio, about 95%. Accuracy drops to roughly 85-90% with 4+ speakers, overlapping speech, or heavy background noise. For high-stakes diarization such as legal transcripts, add human-in-the-loop review at the boundaries between speaker clusters.

**Q: Pricing?**

A: Streaming is $0.47/hr. Batch async is $0.37/hr (Universal-2 default). Add-ons are billed per feature on top of that (speaker labels +$0.13/hr, summarization +$0.13/hr, etc.). New accounts get a free $50 trial credit. See assemblyai.com/pricing for current rates.

---

## Source & Thanks

> Built by [AssemblyAI](https://github.com/AssemblyAI). API docs at [assemblyai.com/docs](https://assemblyai.com/docs).
>
> [AssemblyAI/assemblyai-python-sdk](https://github.com/AssemblyAI/assemblyai-python-sdk): official SDK

---

Source: https://tokrepo.com/en/workflows/assemblyai-universal-2-streaming-stt-for-voice-agents

Author: AssemblyAI
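
The streaming example calls `mic_audio_iterator()` without defining it. As a stand-in for trying the flow without a live microphone, the sketch below yields fixed-duration chunks from a WAV file using only the standard library. The names `wav_chunk_iterator` and `silence_wav` are hypothetical helpers, not part of the AssemblyAI SDK, and the sketch assumes the streaming endpoint expects mono 16-bit PCM at the same rate passed as `sample_rate`.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union


def wav_chunk_iterator(
    source: Union[str, BinaryIO], chunk_ms: int = 100
) -> Iterator[bytes]:
    """Yield raw PCM chunks of about `chunk_ms` milliseconds from a WAV file."""
    with wave.open(source, "rb") as wav:
        # Assumption: the receiver wants mono 16-bit PCM; reject anything else.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:  # readframes returns b"" at end of file
                break
            yield chunk


def silence_wav(seconds: float, sample_rate: int = 16_000) -> io.BytesIO:
    """Build an in-memory WAV of silence for a dry run without a recording."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer
```

With a real recording, `transcriber.stream(wav_chunk_iterator("call.wav"))` would replace the placeholder call, where `call.wav` is a hypothetical 16 kHz mono file. Chunks of around 100 ms keep each message small; the WAV's frame rate should match the `sample_rate` given to `RealtimeTranscriber`.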