deepgram-python-audio-intelligence

Use when writing or reviewing Python code in this repo that calls Deepgram audio analytics overlays on `/v1/listen` - summarize, topics, intents, sentiment, diarize, redact, detect_language, entity detection. Same endpoint as plain STT but with analytics params. Covers both REST (`client.listen.v1.media.transcribe_url`/`transcribe_file`) and the WSS-supported subset (`client.listen.v1.connect`). Use `deepgram-python-speech-to-text` for plain transcription, `deepgram-python-text-intelligence` for analytics on already-transcribed text. Triggers include "diarize", "summarize audio", "sentiment from audio", "redact PII", "topic detection audio", "audio intelligence", "detect language audio".

Using Deepgram Audio Intelligence (Python SDK)

Analytics overlays applied to /v1/listen transcription: summarize, topics, intents, sentiment, language detection, diarization, redaction, entities. Same endpoint / same client methods as STT — enable features via params.

When to use this product

  • You have audio (file, URL, or live stream) and want analytics alongside the transcript.
  • REST is the primary path — most analytics are REST-only.

Use a different skill when:

  • You want a pure transcript with no analytics → deepgram-python-speech-to-text.
  • Your input is already transcribed text → deepgram-python-text-intelligence (/v1/read).
  • You need conversational turn-taking → deepgram-python-conversational-stt.
  • You need a full interactive agent → deepgram-python-voice-agent.

Feature availability: REST vs WSS

Feature                       REST  WSS
diarize                       yes   yes
redact                        yes   yes
punctuate, smart_format       yes   yes
Entity detection              yes   yes
summarize                     yes   no
topics                        yes   no
intents                       yes   no
sentiment                     yes   no
detect_language               yes   no
custom_topic / custom_intent  yes   no

For the WSS-only subset, same code path as deepgram-python-speech-to-text.

Authentication

from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient()

Header: Authorization: Token <api_key>.
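
DeepgramClient() reads DEEPGRAM_API_KEY from the environment. A minimal sketch for passing the key explicitly, assuming the constructor accepts an api_key keyword (verify against the generated client):

import os
from deepgram import DeepgramClient

# Assumption: the generated client exposes api_key=; check the constructor signature
client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])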

Quick start — REST with full analytics

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    smart_format=True,
    punctuate=True,
    diarize=True,              # speaker separation
    summarize="v2",            # "v2" for the current model; True also accepted on /v1/listen
    topics=True,
    intents=True,
    sentiment=True,
    detect_language=True,
    redact=["pci", "pii"],     # or Sequence[str]
    language="en-US",
)

r = response.results
print("transcript:", r.channels[0].alternatives[0].transcript)
print("summary:",    r.summary)
print("topics:",     r.topics)
print("intents:",    r.intents)
print("sentiments:", r.sentiments)
print("detected_language:", r.channels[0].detected_language)

# Speaker diarization
for word in r.channels[0].alternatives[0].words or []:
    speaker = getattr(word, "speaker", None)
    if speaker is not None:
        print(f"Speaker {speaker}: {word.word}")

Quick start — REST file

with open("call.wav", "rb") as f:
    audio = f.read()

response = client.listen.v1.media.transcribe_file(
    request=audio,
    model="nova-3",
    diarize=True,
    redact=["pii"],
    summarize="v2",
    topics=True,
)

Quick start — diarization with word-level timings

Enable speaker separation and word-level timestamps in a single request, then iterate the per-word objects to build a speaker-labelled transcript with timing.

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    diarize=True,        # tag each word with a speaker id
    smart_format=True,   # punctuated_word for cleaner output
    punctuate=True,
)

words = response.results.channels[0].alternatives[0].words or []

# Per-word: speaker, timestamps, confidence
for w in words:
    speaker = getattr(w, "speaker", None)
    text = w.punctuated_word or w.word
    print(f"[speaker {speaker}] {text}  ({w.start:.2f}s–{w.end:.2f}s, conf={w.confidence:.2f})")

# Group consecutive words by speaker into utterances
from itertools import groupby
for speaker, group in groupby(words, key=lambda w: getattr(w, "speaker", None)):
    text = " ".join((w.punctuated_word or w.word) for w in group)
    print(f"Speaker {speaker}: {text}")

Per-word fields available on each entry:

Field               Type           Description
word                str            Lowercase token
punctuated_word     str or None    Token with smart-formatted casing/punctuation (when smart_format=True)
start, end          float          Audio timestamps in seconds
confidence          float          0.0–1.0 confidence
speaker             int or None    Speaker id (when diarize=True); None if diarization disabled
speaker_confidence  float or None  Speaker-id confidence

For a higher-level breakdown, set utterances=True to get pre-grouped speaker turns at response.results.utterances. Set paragraphs=True for a paragraphs view organised by speaker turn boundaries.
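
A sketch of the utterances path, assuming the field names of the documented response object (speaker, transcript, start, end); verify against reference.md:

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    diarize=True,
    utterances=True,    # pre-grouped speaker turns
    smart_format=True,
)

for u in response.results.utterances or []:
    print(f"[{u.start:.2f}s–{u.end:.2f}s] Speaker {u.speaker}: {u.transcript}")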

Quick start — WSS subset (diarize / redact / entities only)

import threading
from deepgram.core.events import EventType

with client.listen.v1.connect(model="nova-3", diarize=True, redact="pii") as conn:
    conn.on(EventType.MESSAGE, lambda m: print(m))
    threading.Thread(target=conn.start_listening, daemon=True).start()
    for chunk in audio_chunks:
        conn.send_media(chunk)
    conn.send_finalize()
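
audio_chunks above is assumed to be an iterable of raw byte chunks; a minimal way to produce one from a local file (the chunk size is arbitrary):

def audio_chunks_from_file(path: str, chunk_size: int = 8192):
    # Yield raw byte chunks suitable for conn.send_media()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk

audio_chunks = audio_chunks_from_file("call.wav")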

Key parameters

summarize, topics, intents, sentiment, detect_language, diarize, redact, custom_topic, custom_topic_mode, custom_intent, custom_intent_mode, detect_entities, plus all the standard STT params (model, language, encoding, sample_rate, ...).
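
custom_topic / custom_intent must be paired with a mode (gotcha 6 below). A hedged sketch using single values, since the generated signatures may type these as plain strings like redact (check the client before passing sequences):

response = client.listen.v1.media.transcribe_url(
    url="https://dpgr.am/spacewalk.wav",
    model="nova-3",
    topics=True,
    intents=True,
    custom_topic="pricing",         # your own label; assumed single-string here
    custom_topic_mode="strict",     # return only the custom topics
    custom_intent="upgrade plan",
    custom_intent_mode="extended",  # custom plus auto-detected intents
)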

redact is typed as Optional[str] in the current generated SDK (src/deepgram/listen/v1/media/client.py). Pass a single redaction mode such as "pci", "pii", "numbers", or "phi". Multi-mode redaction at the transport level is supported by sending redact as a repeated query parameter — check src/deepgram/types/listen_v1redact.py for the current type and fall back to raw query-param construction (or multiple calls) if you need several modes. The earlier Union[str, Sequence[str]] override is no longer carried in .fernignore.
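
If you do need several redaction modes against a single-string signature, a raw-HTTP fallback (a sketch outside the SDK; httpx encodes a list value as a repeated query parameter):

import os
import httpx

resp = httpx.post(
    "https://api.deepgram.com/v1/listen",
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"},
    params={
        "model": "nova-3",
        "redact": ["pci", "pii"],  # serialized as ?redact=pci&redact=pii
    },
    json={"url": "https://dpgr.am/spacewalk.wav"},
)
resp.raise_for_status()
print(resp.json()["results"]["channels"][0]["alternatives"][0]["transcript"])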

API reference (layered)

  1. In-repo reference: reference.md — "Listen V1 Media" (REST params include all analytics flags), "Listen V1 Connect" (WSS-supported subset).
  2. OpenAPI (REST): https://developers.deepgram.com/openapi.yaml
  3. AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
  4. Context7: library ID /llmstxt/developers_deepgram_llms_txt.
  5. Product docs: the Audio Intelligence guides on developers.deepgram.com.

Gotchas

  1. summarize on /v1/listen accepts a boolean OR the string "v2". Use "v2" to pin the current summarization model; True also works (maps to the default model). /v1/read is the reverse — it accepts boolean only. If you need summarization on already-transcribed text, see deepgram-python-text-intelligence.
  2. Sentiment / topics / intents / summarize / detect_language are REST-only. Don't pass them on WSS — they'll be ignored or rejected.
  3. English-only for sentiment / topics / intents / summarize.
  4. Not all models support all overlays. Flux / Base models have restrictions. Stick to nova-3 unless you have a reason.
  5. Redaction values are pci, pii, phi, numbers, etc. — not arbitrary strings.
  6. custom_topic / custom_intent need a mode ("extended" or "strict").
  7. Diarization is noisy on short / low-quality audio. Expect speaker churn on <30s clips.

Example files in this repo

  • examples/15-transcription-advanced-options.py — smart_format, punctuate, diarize
  • tests/wire/test_listen_v1_media.py — wire test covering intelligence params

Related skills

  • deepgram-python-speech-to-text — same endpoint, plain transcription
  • deepgram-python-text-intelligence — same analytics, text input
  • deepgram-python-conversational-stt — Flux for turn-taking
  • deepgram-python-voice-agent — interactive assistants

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).
