Knowledge · May 11, 2026 · 4 min read

Deepgram Nova-3 — Production STT with 60ms Partial Latency

Deepgram Nova-3 streams partials in 60ms, finals <300ms. 36 languages, smart formatting, multilingual single-pass. Default for call centers.

Agent-ready

Safe staging for this asset

This asset is staged first. The copied prompt asks the agent to inspect the staged files before activating scripts, MCP config, or global config.

Stage only · 27/100 · Policy: staging
Agent surface: Any MCP/CLI agent
Type: Knowledge
Installation: Stage only
Trust: Community
Entry: Asset

Safe staging command
npx -y tokrepo@latest install 17f11669-d83b-43d2-9581-3589403ec53c --target codex

Files are staged first; activation requires reviewing the README and the staged plan.

Introduction

Nova-3 is Deepgram's latest production speech-to-text (STT) model: about 60ms partial-result latency, sub-300ms final results, 36 languages, automatic punctuation, smart formatting, a profanity filter, and custom vocabulary. It is the de facto default for English call-center transcription and voice agents.
Best for: phone agents, meeting recorders, live captioning, and voice-controlled apps where latency dominates UX.
Works with: Deepgram Python/JS/Go/Rust SDKs, REST, WebSocket streaming, and an OpenAI-compatible audio endpoint.
Setup time: 5 minutes.


Streaming STT (Python)

import asyncio
import os

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

# read the API key from the environment rather than hard-coding it
dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_mic():
    connection = dg.listen.asyncwebsocket.v("1")

    async def on_message(_, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if not sentence:
            return
        if result.is_final:
            print(f"FINAL: {sentence}")
        else:
            print(f"interim: {sentence}", end="\r")

    connection.on(LiveTranscriptionEvents.Transcript, on_message)

    options = LiveOptions(
        model="nova-3",
        language="multi",   # or "en", "es", "fr", etc.
        smart_format=True,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )
    await connection.start(options)

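    # feed audio bytes from the mic; mic_audio_iterator() is a placeholder
    # for an async generator you supply that yields raw audio chunks
    async for audio_chunk in mic_audio_iterator():
        await connection.send(audio_chunk)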

    await connection.finish()

asyncio.run(transcribe_mic())
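
The streaming example calls mic_audio_iterator() without defining it. As a minimal stand-in, assuming the audio comes from a binary stream (a file or pipe) rather than a real microphone, an async generator along these lines could fill that role. The name file_audio_iterator, the 3200-byte chunk size, and the pacing parameter are illustrative assumptions, not part of the Deepgram SDK.

```python
import asyncio
from typing import AsyncIterator, BinaryIO


async def file_audio_iterator(
    stream: BinaryIO,
    chunk_bytes: int = 3200,    # assumption: 100 ms of 16 kHz, 16-bit mono PCM
    pace_seconds: float = 0.0,  # e.g. 0.1 to mimic real-time capture
) -> AsyncIterator[bytes]:
    """Yield fixed-size chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk
        if pace_seconds:
            # pause between chunks so audio is not sent faster than real time
            await asyncio.sleep(pace_seconds)
```

If the stream carries headerless raw PCM, the connection options would likely also need an explicit encoding and sample rate so the service knows how to decode the bytes.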

Batch transcription (file)

from deepgram import PrerecordedOptions

# reuses the dg client created in the streaming example

with open("call.mp3", "rb") as f:
    response = dg.listen.prerecorded.v("1").transcribe_file(
        {"buffer": f.read()},
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True,
            punctuate=True,
            paragraphs=True,
            summarize="v2",
            detect_topics=True,
        ),
    )

print(response.results.channels[0].alternatives[0].transcript)
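
With diarize=True, each word entry in the response carries a speaker index, so the flat transcript can be regrouped into speaker turns. The sketch below assumes the word entries have already been converted to plain dicts with the keys word, punctuated_word, and speaker; the helper name speaker_turns and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: List[Dict[str, Any]] = []
    for w in words:
        # prefer the formatted token when present, else the raw word
        text = w.get("punctuated_word") or w["word"]
        if turns and turns[-1]["speaker"] == w["speaker"]:
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": w["speaker"], "text": text})
    return turns


# hypothetical sample data, shaped like diarized word entries
sample_words = [
    {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi.", "speaker": 1},
]

for turn in speaker_turns(sample_words):
    print(f"Speaker {turn['speaker']}: {turn['text']}")
```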

OpenAI-compatible endpoint

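import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepgram.com/v1",
    api_key=os.environ["DEEPGRAM_API_KEY"],
)
# open the file in a context manager so the handle is closed after upload
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="nova-3",
        file=audio_file,
    )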

Latency vs others (p50, streaming partial)

Provider                 | Partial latency
Deepgram Nova-3          | ~60ms
AssemblyAI Universal-2   | ~150-300ms
Groq Whisper Turbo       | ~200ms
OpenAI Whisper-1         | ~600ms (batch only)

Pricing (May 2026)

  • Streaming Nova-3: $0.0058/min
  • Batch Nova-3: $0.0043/min
  • $200 free credit on signup
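
To turn the per-minute rates above into a budget figure, a rough estimator might look like the following. It uses only the rates and signup credit listed here; the function name estimate_cost and the idea of subtracting the credit up front are assumptions for illustration, and actual billing rules may differ.

```python
from decimal import Decimal

# per-minute rates as listed above (May 2026)
RATES_PER_MINUTE = {
    "streaming": Decimal("0.0058"),
    "batch": Decimal("0.0043"),
}


def estimate_cost(minutes: int, mode: str, free_credit: Decimal = Decimal("0")) -> Decimal:
    """Return the estimated charge in dollars, rounded to cents, never below zero."""
    gross = RATES_PER_MINUTE[mode] * minutes
    return max(gross - free_credit, Decimal("0")).quantize(Decimal("0.01"))


print(estimate_cost(10_000, "streaming"))                  # 58.00
print(estimate_cost(10_000, "batch"))                      # 43.00
print(estimate_cost(50_000, "streaming", Decimal("200")))  # 90.00
```

Decimal is used instead of float so that small per-minute rates do not accumulate rounding error over large volumes.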

FAQ

Q: Deepgram Nova-3 vs Whisper on Groq vs AssemblyAI? A: Deepgram has the lowest partial latency, by 90ms or more in the p50 comparison above, which makes it the pick for English voice agents and call centers. Whisper-on-Groq has broader low-resource language coverage. AssemblyAI has better diarization and built-in LeMUR for running LLMs over transcripts. Pick by primary task.

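Q: Custom vocabulary for product names? A: Yes. Pass keywords=['TokRepo', 'GEOScore', 'KeepRule'] in LiveOptions. Deepgram boosts these tokens during decoding so brand names transcribe correctly. Keep the list to roughly 100 keywords for best results. Deepgram's documentation also describes a separate keyterm parameter for Nova-3, so confirm which of the two your model version accepts.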

Q: Phone call accuracy on 8kHz audio? A: Excellent. Nova-3 was trained heavily on telephony audio. Set encoding='mulaw', sample_rate=8000 for Twilio Media Streams. With stereo audio that puts caller and callee on separate channels, speaker attribution reaches about 99%.
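
For readers connecting over a raw WebSocket instead of the SDK, the telephony settings from the last answer become query parameters on the streaming endpoint. The sketch below only builds that URL; the helper name listen_url is an assumption, and the set of parameters shown is limited to ones mentioned in this page.

```python
from urllib.parse import urlencode


def listen_url(**params) -> str:
    """Build a Deepgram streaming URL, rendering booleans as true/false."""
    normalized = {
        key: str(value).lower() if isinstance(value, bool) else value
        for key, value in params.items()
    }
    return "wss://api.deepgram.com/v1/listen?" + urlencode(normalized)


# 8 kHz mu-law audio, as sent by Twilio Media Streams
print(listen_url(model="nova-3", encoding="mulaw", sample_rate=8000, interim_results=True))
```

The API key is sent separately in an Authorization header rather than in the URL.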


Quick Use

  1. pip install deepgram-sdk and get DEEPGRAM_API_KEY at console.deepgram.com
  2. Streaming: dg.listen.asyncwebsocket.v('1') + LiveOptions(model='nova-3')
  3. Batch: dg.listen.prerecorded.v('1').transcribe_file({'buffer':...}, PrerecordedOptions(model='nova-3'))
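
In the streaming step, interim results are provisional and each final result supersedes the interim text before it. One way to keep a running transcript in a transcript callback is a small buffer like the one below. The class name TranscriptBuffer and its methods are hypothetical; it is a sketch of the bookkeeping, not part of the SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TranscriptBuffer:
    finals: List[str] = field(default_factory=list)  # committed segments
    interim: str = ""                                # latest provisional text

    def update(self, text: str, is_final: bool) -> None:
        """Record one streaming result; empty transcripts are ignored."""
        if not text:
            return
        if is_final:
            self.finals.append(text)
            self.interim = ""  # the final replaces the pending interim
        else:
            self.interim = text

    def render(self) -> str:
        """Return committed text followed by any pending interim text."""
        pending = [self.interim] if self.interim else []
        return " ".join(self.finals + pending)


buf = TranscriptBuffer()
buf.update("hello wor", is_final=False)
buf.update("hello world", is_final=True)
buf.update("how are", is_final=False)
print(buf.render())  # hello world how are
```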

Source & Thanks

Built by Deepgram. API docs at developers.deepgram.com.

deepgram/deepgram-python-sdk — official SDK
