# Deepgram Nova-3 — Production STT with 60ms Partial Latency

> Deepgram Nova-3 streams partials in 60ms, finals <300ms. 36 languages, smart formatting, multilingual single-pass. Default for call centers.

## Install

Copy the content below into your project:

## Quick Use

1. `pip install deepgram-sdk` and get DEEPGRAM_API_KEY at console.deepgram.com
2. Streaming: `dg.listen.asyncwebsocket.v('1')` + `LiveOptions(model='nova-3')`
3. Batch: `dg.listen.prerecorded.v('1').transcribe_file({'buffer':...}, PrerecordedOptions(model='nova-3'))`

---

## Intro

Nova-3 is Deepgram's latest production STT — 60ms partial-result latency, sub-300ms final results, 36 languages, automatic punctuation, smart formatting, profanity filtering, and custom vocabulary. It is the de facto default for English call-center transcription and voice agents.

Best for: phone agents, meeting recorders, live captioning, and voice-controlled apps where latency dominates UX.

Works with: Deepgram Python/JS/Go/Rust SDKs, REST, WebSocket streaming, and an OpenAI-compatible audio endpoint.

Setup time: 5 minutes.

---

### Streaming STT (Python)

```python
import asyncio
import os

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_mic():
    connection = dg.listen.asyncwebsocket.v("1")

    async def on_message(_, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if not sentence:
            return
        if result.is_final:
            print(f"FINAL: {sentence}")
        else:
            print(f"interim: {sentence}", end="\r")

    connection.on(LiveTranscriptionEvents.Transcript, on_message)

    options = LiveOptions(
        model="nova-3",
        language="multi",  # or "en", "es", "fr", etc.
        smart_format=True,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )
    await connection.start(options)

    # feed audio bytes from the mic (mic_audio_iterator is your audio source)
    async for audio_chunk in mic_audio_iterator():
        await connection.send(audio_chunk)

    await connection.finish()

asyncio.run(transcribe_mic())
```

### Batch transcription (file)

```python
from deepgram import PrerecordedOptions

with open("call.mp3", "rb") as f:
    response = dg.listen.prerecorded.v("1").transcribe_file(
        {"buffer": f.read()},
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True,
            punctuate=True,
            paragraphs=True,
            summarize="v2",
            detect_topics=True,
        ),
    )

print(response.results.channels[0].alternatives[0].transcript)
```

### OpenAI-compatible endpoint

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepgram.com/v1",
    api_key=os.environ["DEEPGRAM_API_KEY"],
)

with open("audio.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="nova-3",
        file=audio,
    )
```

### Latency vs others (p50, streaming partial)

| Provider | Partial latency |
|---|---|
| **Deepgram Nova-3** | **~60ms** |
| AssemblyAI Universal-2 | ~150-300ms |
| Groq Whisper Turbo | ~200ms |
| OpenAI Whisper-1 | ~600ms (batch only) |

### Pricing (May 2026)

- Streaming Nova-3: $0.0058/min
- Batch Nova-3: $0.0043/min
- $200 free credit on signup

---

### FAQ

**Q: Deepgram Nova-3 vs Whisper on Groq vs AssemblyAI?**
A: Deepgram has the lowest partial latency by 90ms+ — it wins for English voice agents and call centers. Whisper-on-Groq has broader low-resource language coverage. AssemblyAI has better diarization and built-in LeMUR for running LLMs over transcripts. Pick by primary task.

**Q: Custom vocabulary for product names?**
A: Yes — pass `keywords=['TokRepo', 'GEOScore', 'KeepRule']` in `LiveOptions`. Deepgram boosts these tokens during decoding so brand names transcribe correctly. Limit to ~100 keywords for best results.

**Q: Phone call accuracy on 8kHz audio?**
A: Excellent — Nova-3 was trained heavily on telephony audio.
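For callers using a raw WebSocket rather than the SDK, the same telephony parameters go into the connection URL's query string. A minimal sketch, assuming Deepgram's documented `wss://api.deepgram.com/v1/listen` streaming endpoint and its standard query-parameter names (`encoding`, `sample_rate`, `multichannel`); the parameter values are the telephony settings discussed in this answer:

```python
from urllib.parse import urlencode

def deepgram_stream_url(**params):
    """Build a Deepgram streaming WebSocket URL from query parameters."""
    base = "wss://api.deepgram.com/v1/listen"
    return f"{base}?{urlencode(params)}"

url = deepgram_stream_url(
    model="nova-3",
    encoding="mulaw",       # Twilio Media Streams send 8 kHz mu-law
    sample_rate=8000,
    channels=2,             # stereo: caller/callee on separate channels
    multichannel="true",    # transcribe each channel independently
    smart_format="true",
    interim_results="true",
)
print(url)
```

Open the resulting URL with any WebSocket client (sending the API key in the `Authorization: Token …` header) and stream raw mu-law bytes over it.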
Set `encoding='mulaw', sample_rate=8000` for Twilio Media Streams. With stereo per-channel audio (caller and callee on separate channels), per-channel transcription reaches ~99% speaker attribution.

---

## Source & Thanks

> Built by [Deepgram](https://github.com/deepgram). API docs at [developers.deepgram.com](https://developers.deepgram.com).
>
> [deepgram/deepgram-python-sdk](https://github.com/deepgram/deepgram-python-sdk) — official SDK

---

Source: https://tokrepo.com/en/workflows/deepgram-nova-3-production-stt-with-60ms-partial-latency
Author: Deepgram
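As a back-of-envelope check on the pricing listed above ($0.0058/min streaming, $0.0043/min batch), metered cost is linear in minutes transcribed; the 50,000-minute workload here is an invented example, not a figure from Deepgram:

```python
# Per-minute Nova-3 rates quoted in the Pricing section (May 2026).
STREAMING_PER_MIN = 0.0058
BATCH_PER_MIN = 0.0043

def monthly_cost(minutes: float, rate_per_min: float) -> float:
    """Linear metered cost: minutes transcribed x per-minute rate."""
    return round(minutes * rate_per_min, 2)

# Example: a small call center streaming 50,000 live minutes per month.
print(monthly_cost(50_000, STREAMING_PER_MIN))  # 290.0
print(monthly_cost(50_000, BATCH_PER_MIN))      # 215.0
```

At that volume the $200 signup credit covers less than one month of streaming.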