Use when writing or reviewing Python code in this repo that calls Deepgram Text-to-Speech v1 (`/v1/speak`) for audio synthesis. Covers one-shot REST (`client.speak.v1.audio.generate`) and streaming WebSocket (`client.speak.v1.connect`). Also covers the in-repo `deepgram.helpers.TextBuilder` for incremental text assembly before synthesis. Use `deepgram-python-voice-agent` when you need full-duplex STT + LLM + TTS with barge-in. Triggers include "TTS", "speak", "synthesize voice", "aura", "text to speech", "speak.v1", "TextBuilder".
Convert text to audio: one-shot REST download or low-latency streaming synthesis via `/v1/speak`.
- REST (`client.speak.v1.audio.generate`) — one-shot synthesis, returns audio bytes. Use for rendered files, pre-generated prompts, anything where you have the full text upfront.
- WebSocket (`client.speak.v1.connect`) — incremental text input, streaming audio output. Use for low-latency playback while an LLM is still producing tokens.

Use a different skill when:

- You need full-duplex STT + LLM + TTS with barge-in: use `deepgram-python-voice-agent`.

Setup:

```python
from dotenv import load_dotenv
load_dotenv()

from deepgram import DeepgramClient

client = DeepgramClient()  # reads DEEPGRAM_API_KEY
```

Header: `Authorization: Token <api_key>` (NOT `Bearer`).
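The same auth scheme applies if you bypass the SDK. A minimal stdlib sketch of the raw request, assuming the endpoint and header format above (the key value is a placeholder, and the request is built but not sent):

```python
import json
import urllib.request

api_key = "YOUR_DEEPGRAM_API_KEY"  # placeholder: read from env in real code

req = urllib.request.Request(
    "https://api.deepgram.com/v1/speak?model=aura-2-asteria-en",
    data=json.dumps({"text": "Hello from raw HTTP."}).encode(),
    headers={
        "Authorization": f"Token {api_key}",  # Token scheme, NOT Bearer
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the audio/* body as bytes
```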
One-shot REST:

```python
audio_iter = client.speak.v1.audio.generate(
    text="Hello, this is a text to speech example.",
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
)
with open("output.raw", "wb") as f:
    for chunk in audio_iter:
        f.write(chunk)
```

Returns an iterator of bytes (streaming audio response). The response body is `audio/*`, NOT JSON. Useful response headers: `dg-model-name`, `dg-char-count`, `dg-request-id`.
Streaming WebSocket:

```python
from deepgram.core.events import EventType
from deepgram.speak.v1.types import SpeakV1Text

with client.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000,
) as conn:
    def on_message(m):
        if isinstance(m, bytes):
            # audio chunk — write to file or audio output
            ...
        else:
            print(f"event: {getattr(m, 'type', 'Unknown')}")

    conn.on(EventType.OPEN, lambda _: print("open"))
    conn.on(EventType.MESSAGE, on_message)
    conn.on(EventType.CLOSE, lambda _: print("close"))
    conn.on(EventType.ERROR, lambda e: print(f"err: {e}"))

    conn.send_text(SpeakV1Text(text="Hello, this is streaming TTS."))
    conn.send_flush()
    conn.send_close()
    conn.start_listening()  # blocks until server closes
```

In sync mode, `start_listening()` blocks — send all text + flush + close BEFORE calling it, OR run it in a thread. In async mode, run `start_listening()` as a task and send concurrently.
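The run-it-in-a-thread option has this shape. `StubConnection` below is purely illustrative (the real connection comes from `client.speak.v1.connect`); it only mimics a blocking listener so the threading pattern is clear:

```python
import queue
import threading

class StubConnection:
    """Illustrative stand-in, NOT the SDK class: start_listening()
    blocks draining a queue, like the real sync client."""
    def __init__(self):
        self._inbox = queue.Queue()
        self.received = []

    def start_listening(self):  # blocks until the close sentinel arrives
        while (msg := self._inbox.get()) is not None:
            self.received.append(msg)

    def send_text(self, text):
        self._inbox.put(text)

    def send_close(self):
        self._inbox.put(None)

conn = StubConnection()
listener = threading.Thread(target=conn.start_listening)
listener.start()                 # listener blocks in the background thread
conn.send_text("Hello")          # main thread keeps sending concurrently
conn.send_text("while listening")
conn.send_close()                # unblocks start_listening
listener.join()
```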
deepgram.helpers.TextBuilder is a hand-maintained helper (NOT Fern-generated) that assembles text incrementally — useful when streaming LLM tokens into TTS.
```python
from deepgram.helpers import TextBuilder

final_text = (
    TextBuilder()
    .text("Hello,")
    .text(" this is built incrementally.")
    .pronunciation("Deepgram", "ˈdiːpɡɹæm")
    .pause(200)
    .build()
)
```

The fluent API is `.text(...)` (append raw text), `.pronunciation(word, ipa)` (pin pronunciation), `.pause(duration_ms)` (insert a pause), and `.build()` (return the final SSML-ish string). There is no `.add(...)` method.
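Because each method returns the builder, chaining and looping are interchangeable, which is what makes it useful when LLM tokens arrive one at a time. A toy stand-in shows the shape (`MiniBuilder` is hypothetical, not the repo's `TextBuilder`, and supports only `.text()`/`.build()`):

```python
class MiniBuilder:
    """Toy fluent accumulator mirroring TextBuilder's shape (hypothetical)."""
    def __init__(self):
        self._parts = []

    def text(self, s):
        self._parts.append(s)
        return self  # returning self is what enables chaining

    def build(self):
        return "".join(self._parts)

builder = MiniBuilder()
for token in ["Hello,", " streamed", " token", " by", " token."]:
    builder.text(token)  # the same call works chained or in a loop
print(builder.build())   # → Hello, streamed token by token.
```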
See `examples/22-text-builder-demo.py`, `examples/23-text-builder-helper.py`, `examples/24-text-builder-streaming.py`.
Async variant:

```python
import asyncio

from deepgram import AsyncDeepgramClient
from deepgram.speak.v1.types import SpeakV1Text

client = AsyncDeepgramClient()

# REST
audio_iter = await client.speak.v1.audio.generate(text=..., model="aura-2-asteria-en")
async for chunk in audio_iter:
    ...

# WSS
async with client.speak.v1.connect(model="aura-2-asteria-en", ...) as conn:
    listen_task = asyncio.create_task(conn.start_listening())
    await conn.send_text(SpeakV1Text(text="..."))
    await conn.send_flush()
    await conn.send_close()
    await listen_task
```

Parameters (REST & WSS): `model` (e.g. `aura-2-asteria-en`), `encoding` (`linear16`, `mulaw`, `alaw`, `opus`, `flac`, `mp3`, `aac`), `sample_rate`, `bit_rate`, `container`, `callback` (REST async), `tag`, `mip_opt_out`.
WSS client messages: `SpeakV1Text`, `Flush`, `Clear`, `Close`.
References: `reference.md` — sections "Speak V1 Audio" (REST) and "Speak V1 Connect" (WSS); `/llmstxt/developers_deepgram_llms_txt`.

Pitfalls:

- Token auth, not Bearer.
- The REST response body is audio, not JSON — don't `.json()` it.
- `send_close()` without `send_flush()` may drop trailing audio.
- Sync `start_listening()` blocks. Queue all messages first, or use async.
- `SpeakV1Text` is required for WSS text input — don't send raw strings.
- `encoding`/`sample_rate`/`container` must match your playback path. Mismatches cause silent failure or distortion.
- TextBuilder helpers are hand-maintained (listed in `.fernignore` as permanently frozen). Don't move them under `src/deepgram/` auto-generated paths.

Examples and tests:

- `examples/20-text-to-speech-single.py` — REST one-shot
- `examples/21-text-to-speech-streaming.py` — WSS streaming
- `examples/22-text-builder-demo.py` — TextBuilder (no API key)
- `examples/23-text-builder-helper.py` — TextBuilder + REST
- `examples/24-text-builder-streaming.py` — TextBuilder + WSS
- `tests/wire/test_speak_v1_audio.py` — REST wire test
- `tests/manual/speak/v1/connect/main.py` — live WSS test

For cross-language Deepgram product knowledge — the consolidated API reference, documentation finder, focused runnable recipes, third-party integration examples, and MCP setup — install the central skills:
```shell
npx skills add deepgram/skills
```

This SDK ships language-idiomatic code skills; `deepgram/skills` ships cross-language product knowledge (see api, docs, recipes, examples, starters, setup-mcp).