deepgram-python-conversational-stt

Use when writing or reviewing Python code in this repo that calls Deepgram Conversational STT v2 / Flux (`/v2/listen`) for turn-aware streaming transcription. Covers `client.listen.v2.connect(...)`, Flux models, end-of-turn detection. Use `deepgram-python-speech-to-text` for standard v1 ASR, `deepgram-python-voice-agent` for full-duplex interactive assistants. Triggers include "flux", "v2 listen", "conversational STT", "turn detection", "end of turn", "EOT", "listen.v2", "flux-general-en", "flux-general-multi".

1.19x

Quality

82%

Does it follow best practices?

Impact

100%

1.19x

Average score across 3 eval scenarios

Securityby

Advisory

Suggest reviewing before use

Using Deepgram Conversational STT / Flux (Python SDK)

Name: deepgram-python-conversational-stt
Rating: 88.8 (1 reviews)
Author: deepgram

Turn-aware streaming STT at /v2/listen — optimized for conversational audio (end-of-turn detection, eager EOT, barge-in scenarios).

When to use this product

You're building a conversational UI and need explicit turn boundaries.
You want Flux models (optimized for human-to-human or human-to-agent conversation).
You want lower latency turn signals than v1 utterance_end.

Use a different skill when:

You want general-purpose transcription (captions, batch, non-conversational) → deepgram-python-speech-to-text.
You want a full interactive agent (STT + LLM + TTS) → deepgram-python-voice-agent.
You want analytics (summarize/sentiment) → deepgram-python-audio-intelligence.

Authentication

import os
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient
client = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])

Header: Authorization: Token <api_key>. WSS only — no REST path on v2.

Quick start

import threading, time
from pathlib import Path
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2CloseStream,
    ListenV2Connected,
    ListenV2FatalError,
    ListenV2TurnInfo,
)

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate="16000",
) as conn:

    def on_message(m):
        if isinstance(m, ListenV2TurnInfo):
            print(f"turn {m.turn_index} [{m.event}] {m.transcript}")
        elif isinstance(m, dict):                     # untyped fallback
            if m.get("type") == "TurnInfo":
                print(f"turn {m.get('turn_index')} [{m.get('event')}] {m.get('transcript')}")
        else:
            print(f"event: {getattr(m, 'type', type(m).__name__)}")

    conn.on(EventType.OPEN,    lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE,   lambda _: print("close"))
    conn.on(EventType.ERROR,   lambda e: print(f"err: {type(e).__name__}: {e}"))

    def send_audio():
        for chunk in mic_chunks():                     # 80ms recommended
            conn.send_media(chunk)
            time.sleep(0.01)
        conn.send_close_stream(ListenV2CloseStream(type="CloseStream"))

    threading.Thread(target=send_audio, daemon=True).start()
    conn.start_listening()

Key parameters

Param	Notes
`model`	`flux-general-en` (English) or `flux-general-multi` (multilingual) — REQUIRED, must be a Flux model
`encoding`	`linear16`, `mulaw`, etc. Omit for containerized audio
`sample_rate`	String in the SDK signature, e.g. `"16000"`
`eager_eot_threshold`	Fire end-of-turn early at this confidence
`eot_threshold`	Primary end-of-turn confidence
`eot_timeout_ms`	Time-based fallback turn end
`keyterm`	Bias for domain keywords
`mip_opt_out`, `tag`	Metadata / privacy flags
`language_hint`	ONLY for `flux-general-multi`
`authorization`, `request_options`	Override auth or request options

No language parameter on v2 — language is implied by model (flux-general-en) or hinted via language_hint on multi.

Events (server → client)

ListenV2Connected — connection established
ListenV2ConfigureSuccess / ListenV2ConfigureFailure — mid-session config changes
ListenV2TurnInfo — per-turn transcript + event (Update, EndOfTurn, EagerEndOfTurn, ...) + turn_index
ListenV2FatalError — terminal error

Client messages: ListenV2Media, ListenV2Configure, ListenV2CloseStream.

Async equivalent

from deepgram import AsyncDeepgramClient
client = AsyncDeepgramClient()

async with client.listen.v2.connect(model="flux-general-en", ...) as conn:
    # same .on(...) handlers, then:
    await conn.start_listening()

API reference (layered)

In-repo reference: reference.md — "Listen V2 Connect".
AsyncAPI (WSS): https://developers.deepgram.com/asyncapi.yaml
Context7: library ID /llmstxt/developers_deepgram_llms_txt.
Product docs:

Gotchas

/v2/listen, not /v1/listen. Different route, different client path (listen.v2 vs listen.v1).
Flux models only. nova-3, base, etc. will be rejected. Use flux-general-en or flux-general-multi.
No language parameter. Language is set by model choice. Use language_hint on flux-general-multi.
sample_rate is a STRING in the SDK (e.g. "16000").
Send ~80ms audio chunks for best turn-detection latency.
Close with send_close_stream(ListenV2CloseStream(type="CloseStream")) — not send_finalize (that's v1).
Messages may arrive as typed objects OR raw dicts — the SDK uses a tagged union with construct_type for unknowns. Handle both branches (see socket_client.py patch in .fernignore).
socket_client.py is patched / frozen (see .fernignore → src/deepgram/listen/v2/socket_client.py). Don't overwrite that manual patch during regeneration; treat other listen/v2 files as generated unless the regen workflow says otherwise.
Omit encoding/sample_rate for containerized audio (WAV, OGG, etc.) — the server detects them from the container.

Example files in this repo

examples/14-transcription-live-websocket-v2.py
tests/manual/listen/v2/connect/main.py

Related skills

deepgram-python-speech-to-text — v1 general-purpose STT (REST + WSS)
deepgram-python-voice-agent — full interactive assistant

Central product skills

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:

npx skills add deepgram/skills

This SDK ships language-idiomatic code skills; deepgram/skills ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).

Repository: deepgram/deepgram-python-sdk
Commit: e169f63

Last updated: 9 days ago
Created: 9 days ago

Is this your skill?

If you maintain this skill, you can claim it as your own. Once claimed, you can manage eval scenarios, bundle related skills, attach documentation or rules, and ensure cross-agent compatibility.