Use when writing or reviewing Python code in this repo that uses Deepgram's audio analytics overlays on `/v1/listen`: summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT, but with analytics params enabled. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription and `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".
Analytics overlays applied to /v1/listen transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.
Use a different skill when:
- `deepgram-python-speech-to-text` for plain transcription.
- `deepgram-python-text-intelligence` for analytics on already-transcribed text (`/v1/read`).
- `deepgram-python-conversational-stt` for Flux turn-taking.
- `deepgram-python-voice-agent` for interactive voice assistants.

| Feature | REST | WSS |
|---|---|---|
| `diarize` | yes | yes |
| `redact` | yes | yes |
| `punctuate`, `smart_format` | yes | yes |
| Entity detection | yes | yes |
| `summarize` | yes | no |
| `topics` | yes | no |
| `intents` | yes | no |
| `sentiment` | yes | no |
| `detect_language` | yes | no |
| `custom_topic` / `custom_intent` | yes | no |
For the WSS-supported subset, the code path is the same as in `deepgram-python-speech-to-text`.
from dotenv import load_dotenv
load_dotenv()
from deepgram import DeepgramClient
client = DeepgramClient()

Auth header sent: `Authorization: Token <api_key>`.
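If the key is not in the environment, it can be passed explicitly. A minimal sketch, assuming the constructor accepts an `api_key` keyword (verify against your SDK version):

import os
from deepgram import DeepgramClient

client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])  # assumed kwarg name

The full REST example, with the analytics overlays enabled: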
response = client.listen.v1.media.transcribe_url(
url="https://dpgr.am/spacewalk.wav",
model="nova-3",
smart_format=True,
punctuate=True,
diarize=True, # speaker separation
summarize="v2", # "v2" for the current model; True also accepted on /v1/listen
topics=True,
intents=True,
sentiment=True,
detect_language=True,
redact=["pci", "pii"], # or Sequence[str]
language="en-US",
)
r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:", r.summary)
print("topics:", r.topics)
print("intents:", r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)
# Speaker diarization
for word in r.channels[0].alternatives[0].words or []:
speaker = getattr(word, "speaker", None)
if speaker is not None:
print(f"Speaker {speaker}: {word.word}")with open("call.wav", "rb") as f:
audio = f.read()
response = client.listen.v1.media.transcribe_file(
request=audio,
model="nova-3",
diarize=True,
redact=["pii"],
summarize="v2",
topics=True,
)

Enable speaker separation and word-level timestamps in a single request, then iterate the per-word objects to build a speaker-labelled transcript with timing.
response = client.listen.v1.media.transcribe_url(
url="https://dpgr.am/spacewalk.wav",
model="nova-3",
diarize=True, # tag each word with a speaker id
smart_format=True, # punctuated_word for cleaner output
punctuate=True,
)
words = response.results.channels[0].alternatives[0].words or []
# Per-word: speaker, timestamps, confidence
for w in words:
speaker = getattr(w, "speaker", None)
text = w.punctuated_word or w.word
print(f"[speaker {speaker}] {text} ({w.start:.2f}s–{w.end:.2f}s, conf={w.confidence:.2f})")
# Group consecutive words by speaker into utterances
from itertools import groupby
for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
text = " ".join((w.punctuated_word or w.word) for w in group)
print(f"Speaker {speaker}: {text}")Per-word fields available on each entry:
| Field | Type | Description |
|---|---|---|
| `word` | `str` | Lowercase token |
| `punctuated_word` | `str \| None` | Token with smart-formatted casing/punctuation (when `smart_format=True`) |
| `start`, `end` | `float` | Audio timestamps in seconds |
| `confidence` | `float` | 0.0–1.0 confidence |
| `speaker` | `int \| None` | Speaker id (when `diarize=True`); `None` if diarization disabled |
| `speaker_confidence` | `float \| None` | Speaker-id confidence |
For a higher-level breakdown, set utterances=True to get pre-grouped speaker turns at response.results.utterances. Set paragraphs=True for a paragraphs view organised by speaker turn boundaries.
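A minimal sketch of the `utterances=True` path; the per-utterance fields (`speaker`, `start`, `end`, `transcript`) are assumed from the standard Deepgram utterance payload:

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    diarize=True,
    utterances=True,   # pre-grouped speaker turns
    smart_format=True,
)
for utt in response.results.utterances or []:
    # one contiguous speaker turn per entry, with its own timing
    print(f"Speaker {utt.speaker} [{utt.start:.2f}s-{utt.end:.2f}s]: {utt.transcript}")

The WSS path below streams audio and enables only the WSS-supported flags: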
import threading
from deepgram.core.events import EventType
with client.listen.v1.connect(model="nova-3", diarize=True, redact="pii") as conn:
conn.on(EventType.MESSAGE, lambda m: print(m))
threading.Thread(target=conn.start_listening, daemon=True).start()
for chunk in audio_chunks:
conn.send_media(chunk)
    conn.send_finalize()

Full analytics parameter set on the REST methods: `summarize`, `topics`, `intents`, `sentiment`, `detect_language`, `diarize`, `redact`, `custom_topic`, `custom_topic_mode`, `custom_intent`, `custom_intent_mode`, `detect_entities`, plus all the standard STT params (`model`, `language`, `encoding`, `sample_rate`, ...).
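For custom topics and intents, a sketch assuming single-string values (the generated SDK may type these like `redact`; see the note below):

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    topics=True,
    custom_topic="spacewalk",        # candidate topic to look for
    custom_topic_mode="extended",    # "extended" or "strict"
    intents=True,
    custom_intent="report status",
    custom_intent_mode="strict",
)
print(response.results.topics)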
redact is typed as Optional[str] in the current generated SDK (src/deepgram/listen/v1/media/client.py). Pass a single redaction mode such as "pci", "pii", "numbers", or "phi". Multi-mode redaction at the transport level is supported by sending redact as a repeated query parameter — check src/deepgram/types/listen_v1redact.py for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier Union[str, Sequence[str]] override is no longer carried in .fernignore.
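If several redaction modes are genuinely needed in one request, a fallback sketch that sends `redact` as a repeated query parameter straight to `/v1/listen` (the endpoint and `Authorization: Token` header follow the docs above; `httpx` is an assumed dependency):

import os
import httpx

# Repeated ("redact", value) pairs become repeated query parameters,
# which is how /v1/listen accepts multiple redaction modes.
params = [
    ("model", "nova-3"),
    ("diarize", "true"),
    ("redact", "pci"),
    ("redact", "pii"),
]
with open("call.wav", "rb") as f:
    resp = httpx.post(
        "https://api.deepgram.com/v1/listen",
        params=params,
        content=f.read(),
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "audio/wav",
        },
        timeout=60.0,
    )
resp.raise_for_status()
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])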
References:
- `reference.md` — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset)
- `/llmstxt/developers_deepgram_llms_txt`

Gotchas:
- `summarize` on `/v1/listen` accepts a boolean OR the string `"v2"`. Use `"v2"` to pin the current summarization model; `True` also works (maps to the default model). `/v1/read` is the reverse — it accepts boolean only. If you need summarization on already-transcribed text, see `deepgram-python-text-intelligence`.
- Default to `nova-3` unless you have a reason not to.
- `redact` takes fixed modes — `pci`, `pii`, `phi`, `numbers`, etc. — not arbitrary strings.
- `custom_topic` / `custom_intent` need a mode (`"extended"` or `"strict"`).

Examples and tests:
- `examples/15-transcription-advanced-options.py` — `smart_format`, `punctuate`, `diarize`
- `tests/wire/test_listen_v1_media.py` — wire test covering intelligence params

Related skills:
- `deepgram-python-speech-to-text` — same endpoint, plain transcription
- `deepgram-python-text-intelligence` — same analytics, text input
- `deepgram-python-conversational-stt` — Flux for turn-taking
- `deepgram-python-voice-agent` — interactive assistants

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see `api`, `docs`, `recipes`, `examples`, `starters`, `setup-mcp`).