Quick Use
- Install: pip install deepgram-sdk, then get a DEEPGRAM_API_KEY at console.deepgram.com
- Streaming: dg.listen.asyncwebsocket.v('1') + LiveOptions(model='nova-3')
- Batch: dg.listen.prerecorded.v('1').transcribe_file({'buffer': ...}, PrerecordedOptions(model='nova-3'))
Intro
Nova-3 is Deepgram's latest production STT — 60ms partial result latency, sub-300ms final results, 36 languages, automatic punctuation, smart formatting, profanity filter, custom vocabulary. The de facto default for English call-center transcription and voice agents. Best for: phone agents, meeting recorders, live captioning, voice-controlled apps where latency dominates UX. Works with: Deepgram Python/JS/Go/Rust SDKs, REST, WebSocket streaming, OpenAI-compatible audio endpoint. Setup time: 5 minutes.
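The fastest end-to-end sanity check after setup is a single prerecorded request against a hosted file. A minimal sketch, assuming the v3 Python SDK; the URL is a placeholder, so swap in any publicly reachable audio file:

import os
from deepgram import DeepgramClient, PrerecordedOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

# Transcribe a hosted audio file by URL (URL is an example placeholder)
response = dg.listen.prerecorded.v("1").transcribe_url(
    {"url": "https://dpgr.am/spacewalk.wav"},
    PrerecordedOptions(model="nova-3", smart_format=True),
)
print(response.results.channels[0].alternatives[0].transcript)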
Streaming STT (Python)
import asyncio
import os

from deepgram import DeepgramClient, LiveTranscriptionEvents, LiveOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_mic():
    connection = dg.listen.asyncwebsocket.v("1")

    async def on_message(_, result, **kwargs):
        sentence = result.channel.alternatives[0].transcript
        if not sentence:
            return
        if result.is_final:
            print(f"FINAL: {sentence}")
        else:
            print(f"interim: {sentence}", end="\r")

    connection.on(LiveTranscriptionEvents.Transcript, on_message)

    options = LiveOptions(
        model="nova-3",
        language="multi",  # or "en", "es", "fr", etc.
        smart_format=True,
        interim_results=True,
        utterance_end_ms="1000",
        vad_events=True,
    )
    await connection.start(options)

    # Feed audio bytes from the mic; mic_audio_iterator() is a placeholder
    # for your own async source of raw audio chunks (e.g. pyaudio/sounddevice).
    async for audio_chunk in mic_audio_iterator():
        await connection.send(audio_chunk)

    await connection.finish()

asyncio.run(transcribe_mic())

Batch transcription (file)
from deepgram import PrerecordedOptions

with open("call.mp3", "rb") as f:
    response = dg.listen.prerecorded.v("1").transcribe_file(
        {"buffer": f.read()},
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True,
            punctuate=True,
            paragraphs=True,
            summarize="v2",
            detect_topics=True,
        ),
    )

print(response.results.channels[0].alternatives[0].transcript)

OpenAI-compatible endpoint
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepgram.com/v1",
    api_key=os.environ["DEEPGRAM_API_KEY"],
)
transcript = client.audio.transcriptions.create(
    model="nova-3",
    file=open("audio.mp3", "rb"),
)

Latency vs others (p50, streaming partial)
| Provider | Partial latency |
|---|---|
| Deepgram Nova-3 | ~60ms |
| AssemblyAI Universal-2 | ~150-300ms |
| Groq Whisper Turbo | ~200ms |
| OpenAI Whisper-1 | ~600ms (batch only) |
Pricing (May 2026)
- Streaming Nova-3: $0.0058/min
- Batch Nova-3: $0.0043/min
- $200 free credit on signup
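Cost scales linearly with audio minutes, so an estimate is just rate times minutes. A quick back-of-the-envelope helper using the rates listed above (discounts and the free credit ignored):

STREAMING_RATE = 0.0058  # $/min, Nova-3 streaming (from the list above)
BATCH_RATE = 0.0043      # $/min, Nova-3 batch

def monthly_cost(minutes: float, streaming: bool = True) -> float:
    """Rough monthly spend estimate; ignores volume discounts and free credit."""
    rate = STREAMING_RATE if streaming else BATCH_RATE
    return minutes * rate

# e.g. 50,000 call-center minutes per month:
print(f"${monthly_cost(50_000):.2f}")                    # streaming: $290.00
print(f"${monthly_cost(50_000, streaming=False):.2f}")   # batch: $215.00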
FAQ
Q: Deepgram Nova-3 vs Whisper on Groq vs AssemblyAI?
A: Deepgram has the lowest partial latency by 90ms+, which wins for English voice agents and call centers. Whisper-on-Groq has broader low-resource language coverage. AssemblyAI has better diarization and built-in LeMUR for transcript LLMs. Pick by primary task.
Q: Custom vocabulary for product names?
A: Yes — pass keywords=['TokRepo', 'GEOScore', 'KeepRule'] in LiveOptions. Deepgram boosts these tokens during decoding so brand names transcribe correctly. Limit ~100 keywords for best results.
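A minimal sketch of what that looks like with the streaming options from above; the keyword list is illustrative, so substitute your own product names:

from deepgram import LiveOptions

# Keyword boosting per the answer above; the terms here are hypothetical
# product names. An intensifier can be appended, e.g. "TokRepo:2".
options = LiveOptions(
    model="nova-3",
    smart_format=True,
    keywords=["TokRepo", "GEOScore", "KeepRule"],
)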
Q: Phone call accuracy on 8kHz audio?
A: Excellent — Nova-3 trained heavily on telephony. Set encoding='mulaw', sample_rate=8000 for Twilio Media Streams. Stereo per-channel (caller/callee on different channels) hits ~99% diarization.
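For Twilio Media Streams specifically, that translates to streaming options roughly like the sketch below; parameter values follow the answer above, and the two-channel handling assumes caller and callee arrive on separate channels:

from deepgram import LiveOptions

# Telephony settings for 8 kHz mu-law audio (e.g. Twilio Media Streams)
options = LiveOptions(
    model="nova-3",
    encoding="mulaw",
    sample_rate=8000,
    channels=2,          # stereo: caller on one channel, callee on the other
    multichannel=True,   # transcribe each channel separately for clean diarization
    smart_format=True,
    interim_results=True,
)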
Source & Thanks
Built by Deepgram. API docs at developers.deepgram.com.
deepgram/deepgram-python-sdk — official SDK