# Deepgram Nova-3 — Production STT with 60ms Partial Latency

> Deepgram Nova-3 streams partials in 60ms, finals <300ms. 36 languages, smart formatting, multilingual single-pass. Default for call centers.

## Install

Copy the content below into your project:

## Quick Use

1. `pip install deepgram-sdk` and get DEEPGRAM_API_KEY at console.deepgram.com
2. Streaming: `dg.listen.asyncwebsocket.v('1')` + `LiveOptions(model='nova-3')`
3. Batch: `dg.listen.prerecorded.v('1').transcribe_file({'buffer':...}, PrerecordedOptions(model='nova-3'))`

---

## Intro

Nova-3 is Deepgram's latest production STT — 60ms partial-result latency, sub-300ms final results, 36 languages, automatic punctuation, smart formatting, profanity filtering, and custom vocabulary. It is the de facto default for English call-center transcription and voice agents.

Best for: phone agents, meeting recorders, live captioning, and voice-controlled apps where latency dominates UX.

Works with: Deepgram Python/JS/Go/Rust SDKs, REST, WebSocket streaming, and an OpenAI-compatible audio endpoint.

Setup time: 5 minutes.

---

### Streaming STT (Python)

```python
import asyncio
import os

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_mic():
    connection = dg.listen.asyncwebsocket.v("1")

    async def on_message(_, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if not sentence:
            return
        if result.is_final:
            print(f"FINAL: {sentence}")
        else:
            print(f"interim: {sentence}", end="\r")

    connection.on(LiveTranscriptionEvents.Transcript, on_message)

    options = LiveOptions(
        model="nova-3",
        language="multi",  # or "en", "es", "fr", etc.
        smart_format=True,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )
    await connection.start(options)

    # feed audio bytes from the mic (mic_audio_iterator is your audio source)
    async for audio_chunk in mic_audio_iterator():
        await connection.send(audio_chunk)

    await connection.finish()

asyncio.run(transcribe_mic())
```

### Batch transcription (file)

```python
from deepgram import PrerecordedOptions

with open("call.mp3", "rb") as f:
    response = dg.listen.prerecorded.v("1").transcribe_file(
        {"buffer": f.read()},
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True,
            punctuate=True,
            paragraphs=True,
            summarize="v2",
            detect_topics=True,
        ),
    )

print(response.results.channels[0].alternatives[0].transcript)
```

### OpenAI-compatible endpoint

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepgram.com/v1",
    api_key=os.environ["DEEPGRAM_API_KEY"],
)

with open("audio.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="nova-3",
        file=audio,
    )
```

### Latency vs others (p50, streaming partial)

| Provider | Partial latency |
|---|---|
| **Deepgram Nova-3** | **~60ms** |
| AssemblyAI Universal-2 | ~150-300ms |
| Groq Whisper Turbo | ~200ms |
| OpenAI Whisper-1 | ~600ms (batch only) |

### Pricing (May 2026)

- Streaming Nova-3: $0.0058/min
- Batch Nova-3: $0.0043/min
- $200 free credit on signup

---

### FAQ

**Q: Deepgram Nova-3 vs Whisper on Groq vs AssemblyAI?**
A: Deepgram has the lowest partial latency by 90ms+ — it wins for English voice agents and call centers. Whisper-on-Groq has broader low-resource language coverage. AssemblyAI has better diarization and built-in LeMUR for running LLMs over transcripts. Pick by primary task.

**Q: Custom vocabulary for product names?**
A: Yes — pass `keywords=['TokRepo', 'GEOScore', 'KeepRule']` in `LiveOptions`. Deepgram boosts these tokens during decoding so brand names transcribe correctly. Limit to ~100 keywords for best results.

**Q: Phone call accuracy on 8kHz audio?**
A: Excellent — Nova-3 was trained heavily on telephony audio.
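For callers using a raw WebSocket rather than the SDK, the same telephony parameters go into the connection URL's query string. A minimal sketch, assuming Deepgram's documented `wss://api.deepgram.com/v1/listen` streaming endpoint and its standard query-parameter names (`encoding`, `sample_rate`, `multichannel`); the parameter values are the telephony settings discussed in this answer:

```python
from urllib.parse import urlencode

def deepgram_stream_url(**params):
    """Build a Deepgram streaming WebSocket URL from query parameters."""
    base = "wss://api.deepgram.com/v1/listen"
    return f"{base}?{urlencode(params)}"

url = deepgram_stream_url(
    model="nova-3",
    encoding="mulaw",       # Twilio Media Streams send 8 kHz mu-law
    sample_rate=8000,
    channels=2,             # stereo: caller/callee on separate channels
    multichannel="true",    # transcribe each channel independently
    smart_format="true",
    interim_results="true",
)
print(url)
```

Open the resulting URL with any WebSocket client (sending the API key in the `Authorization: Token …` header) and stream raw mu-law bytes over it.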
Set `encoding='mulaw', sample_rate=8000` for Twilio Media Streams. With stereo per-channel audio (caller and callee on separate channels), per-channel transcription reaches ~99% speaker attribution.

---

## Source & Thanks

> Built by [Deepgram](https://github.com/deepgram). API docs at [developers.deepgram.com](https://developers.deepgram.com).
>
> [deepgram/deepgram-python-sdk](https://github.com/deepgram/deepgram-python-sdk) — official SDK

---

Source: https://tokrepo.com/en/workflows/deepgram-nova-3-production-stt-with-60ms-partial-latency
Author: Deepgram
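As a back-of-envelope check on the pricing listed above ($0.0058/min streaming, $0.0043/min batch), metered cost is linear in minutes transcribed; the 50,000-minute workload here is an invented example, not a figure from Deepgram:

```python
# Per-minute Nova-3 rates quoted in the Pricing section (May 2026).
STREAMING_PER_MIN = 0.0058
BATCH_PER_MIN = 0.0043

def monthly_cost(minutes: float, rate_per_min: float) -> float:
    """Linear metered cost: minutes transcribed x per-minute rate."""
    return round(minutes * rate_per_min, 2)

# Example: a small call center streaming 50,000 live minutes per month.
print(monthly_cost(50_000, STREAMING_PER_MIN))  # 290.0
print(monthly_cost(50_000, BATCH_PER_MIN))      # 215.0
```

At that volume the $200 signup credit covers less than one month of streaming.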