Knowledge · May 11, 2026 · 4 min read

Deepgram Nova-3 — Production STT with 60ms Partial Latency

Deepgram Nova-3 streams partials in 60ms, finals <300ms. 36 languages, smart formatting, multilingual single-pass. Default for call centers.

Deepgram
Deepgram · Community
Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, a per-adapter plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 15/100
Agent surface
Any MCP/CLI agent
Type
Knowledge
Installation
Stage only
Trust
New
Entry point
Asset
Universal CLI command
npx tokrepo install 17f11669-d83b-43d2-9581-3589403ec53c
Introduction

Nova-3 is Deepgram's latest production STT — 60ms partial result latency, sub-300ms final results, 36 languages, automatic punctuation, smart formatting, profanity filter, custom vocabulary. The de facto default for English call-center transcription and voice agents. Best for: phone agents, meeting recorders, live captioning, voice-controlled apps where latency dominates UX. Works with: Deepgram Python/JS/Go/Rust SDKs, REST, WebSocket streaming, OpenAI-compatible audio endpoint. Setup time: 5 minutes.


Streaming STT (Python)

import asyncio
import os

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_mic():
    connection = dg.listen.asyncwebsocket.v("1")

    async def on_message(_, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if not sentence:
            return
        if result.is_final:
            print(f"FINAL: {sentence}")
        else:
            print(f"interim: {sentence}", end="\r")

    connection.on(LiveTranscriptionEvents.Transcript, on_message)

    options = LiveOptions(
        model="nova-3",
        language="multi",   # or "en", "es", "fr", etc.
        smart_format=True,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )
    await connection.start(options)

    # feed audio bytes from mic
    async for audio_chunk in mic_audio_iterator():
        await connection.send(audio_chunk)

    await connection.finish()

asyncio.run(transcribe_mic())
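
The example above leaves `mic_audio_iterator()` undefined. As a stand-in, here is a minimal sketch that paces chunks of a raw PCM file as if they came from a microphone (live capture would typically use a library such as sounddevice or pyaudio; the function name, chunk size, and pacing are illustrative assumptions, not part of the Deepgram SDK):

```python
import asyncio

async def file_audio_iterator(path: str, chunk_bytes: int = 640):
    """Yield ~20 ms chunks of 16 kHz, 16-bit mono linear PCM from a file."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_bytes):
            yield chunk
            await asyncio.sleep(0.02)  # pace roughly in real time
```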

Batch transcription (file)

from deepgram import PrerecordedOptions

# reuses the `dg` client created in the streaming example

with open("call.mp3", "rb") as f:
    response = dg.listen.prerecorded.v("1").transcribe_file(
        {"buffer": f.read()},
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True,
            punctuate=True,
            paragraphs=True,
            summarize="v2",
            detect_topics=True,
        ),
    )

print(response.results.channels[0].alternatives[0].transcript)
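
With diarize=True and paragraphs=True, the response also carries per-paragraph speaker labels. A sketch of collapsing those into speaker turns, working on the response as a plain dict (the field paths follow Deepgram's documented JSON shape for this option combination; the `speaker_turns` helper is ours, and SDK objects may need a `.to_dict()` first):

```python
def speaker_turns(response: dict) -> list[tuple[int, str]]:
    """Collapse diarized paragraphs into consecutive (speaker, text) turns."""
    paragraphs = (response["results"]["channels"][0]["alternatives"][0]
                  ["paragraphs"]["paragraphs"])
    turns: list[tuple[int, str]] = []
    for p in paragraphs:
        text = " ".join(s["text"] for s in p["sentences"])
        if turns and turns[-1][0] == p["speaker"]:
            # same speaker continues: merge into the previous turn
            turns[-1] = (p["speaker"], turns[-1][1] + " " + text)
        else:
            turns.append((p["speaker"], text))
    return turns
```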

OpenAI-compatible endpoint

import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepgram.com/v1",
    api_key=os.environ["DEEPGRAM_API_KEY"],
)
with open("audio.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="nova-3",
        file=audio,
    )

Latency vs others (p50, streaming partial)

Provider                 Partial latency
Deepgram Nova-3          ~60ms
AssemblyAI Universal-2   ~150-300ms
Groq Whisper Turbo       ~200ms
OpenAI Whisper-1         ~600ms (batch only)

Pricing (May 2026)

  • Streaming Nova-3: $0.0058/min
  • Batch Nova-3: $0.0043/min
  • $200 free credit on signup
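
At these rates, cost scales linearly with audio minutes, so a back-of-envelope estimate is a one-liner. A quick helper (rates copied from the list above; the daily volume is an example, not a quoted figure):

```python
STREAMING_RATE = 0.0058  # $/min, streaming Nova-3 (May 2026)
BATCH_RATE = 0.0043      # $/min, batch Nova-3 (May 2026)

def monthly_cost(minutes_per_day: float, rate_per_min: float, days: int = 30) -> float:
    """Estimated monthly spend for a steady daily audio volume."""
    return minutes_per_day * days * rate_per_min

print(f"${monthly_cost(1000, STREAMING_RATE):,.2f}")  # 1,000 streamed min/day -> $174.00
```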

FAQ

Q: Deepgram Nova-3 vs Whisper on Groq vs AssemblyAI? A: Deepgram has the lowest partial latency by 90ms+ — wins for English voice agents and call centers. Whisper-on-Groq has broader low-resource language coverage. AssemblyAI has better diarization and built-in LeMUR for transcript LLMs. Pick by primary task.

Q: Custom vocabulary for product names? A: Yes — pass keywords=['TokRepo', 'GEOScore', 'KeepRule'] in LiveOptions. Deepgram boosts these tokens during decoding so brand names transcribe correctly. Limit ~100 keywords for best results.

Q: Phone call accuracy on 8kHz audio? A: Excellent — Nova-3 trained heavily on telephony. Set encoding='mulaw', sample_rate=8000 for Twilio Media Streams. Stereo per-channel (caller/callee on different channels) hits ~99% diarization.
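
For the raw WebSocket API (rather than the SDK's LiveOptions), the same telephony settings go in the query string. A sketch of building that URL for two-channel Twilio audio (the parameter names follow Deepgram's streaming query parameters; multichannel requests a separate transcript per channel, and the exact channel layout depends on how you bridge Twilio's tracks):

```python
from urllib.parse import urlencode

params = {
    "model": "nova-3",
    "encoding": "mulaw",      # Twilio Media Streams sends 8 kHz mu-law
    "sample_rate": 8000,
    "channels": 2,            # caller / callee on separate channels
    "multichannel": "true",   # transcribe each channel independently
    "smart_format": "true",
}
url = "wss://api.deepgram.com/v1/listen?" + urlencode(params)
print(url)
```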


Quick Use

  1. pip install deepgram-sdk and get DEEPGRAM_API_KEY at console.deepgram.com
  2. Streaming: dg.listen.asyncwebsocket.v('1') + LiveOptions(model='nova-3')
  3. Batch: dg.listen.prerecorded.v('1').transcribe_file({'buffer':...}, PrerecordedOptions(model='nova-3'))

Source & Thanks

Built by Deepgram. API docs at developers.deepgram.com.

deepgram/deepgram-python-sdk — official SDK
