# Deepgram Python SDK

The official Python SDK for the Deepgram automated speech recognition platform, enabling developers to integrate AI-powered speech-to-text, text-to-speech, and audio intelligence capabilities into their applications. The SDK provides real-time streaming transcription over WebSocket connections, batch processing of prerecorded audio files, text-to-speech synthesis, conversational AI agents, text intelligence analysis, and full project management through Deepgram's platform APIs.
## Package Information

- **Package Name**: deepgram-sdk
- **Package Type**: pypi
- **Language**: Python
- **Installation**: `pip install deepgram-sdk`
- **Python Version**: 3.10+
## Core Imports

```python
from deepgram import DeepgramClient, DeepgramClientOptions
```

Common imports for specific functionality:

```python
# For speech-to-text
from deepgram import (
    ListenRESTClient, ListenWebSocketClient,
    ListenRESTOptions, ListenWebSocketOptions
)

# For text-to-speech
from deepgram import (
    SpeakRESTClient, SpeakWebSocketClient,
    SpeakRESTOptions, SpeakWSOptions
)

# For text analysis
from deepgram import AnalyzeClient, AnalyzeOptions

# For project management
from deepgram import ManageClient

# For conversational AI
from deepgram import AgentWebSocketClient
```
## Basic Usage

```python
from deepgram import DeepgramClient

# Initialize client with an explicit API key
client = DeepgramClient(api_key="your-api-key")

# Alternative: initialize from the environment
# (set the DEEPGRAM_API_KEY environment variable)
client = DeepgramClient()

# Speech-to-text with prerecorded audio
from deepgram import UrlSource, ListenRESTOptions
source = UrlSource(url="https://example.com/audio.wav")
options = ListenRESTOptions(model="nova-2", language="en-US")
response = client.listen.rest.transcribe_url(source, options)
print(response.results.channels[0].alternatives[0].transcript)

# Text-to-speech
from deepgram import TextSource, SpeakRESTOptions
source = TextSource(text="Hello, world!")
options = SpeakRESTOptions(model="aura-asteria-en")
# Save the synthesized audio to a file
client.speak.rest.save("output.wav", source, options)
```
## Architecture

The Deepgram SDK is organized around a main client (`DeepgramClient`) that provides access to different service routers:

- **Listen Router**: Speech-to-text capabilities (REST and WebSocket)
- **Speak Router**: Text-to-speech capabilities (REST and WebSocket)
- **Read Router**: Text analysis and intelligence
- **Manage Router**: Account, project, and usage management (sync and async variants)
- **Agent Router**: Conversational AI WebSocket connections
- **Auth Router**: Authentication token management (sync and async variants)
- **Self-Hosted Router**: On-premises deployment support (sync and async variants)

Each router provides both synchronous and asynchronous clients, with REST interfaces for batch processing and WebSocket interfaces for real-time streaming.
### Router Access Patterns

```python
# Synchronous access
client.listen.rest            # ListenRESTClient
client.listen.websocket       # ListenWebSocketClient
client.speak.rest             # SpeakRESTClient
client.speak.websocket        # SpeakWebSocketClient
client.read                   # ReadClient / AnalyzeClient
client.manage                 # ManageClient
client.auth.v("1")            # AuthRESTClient
client.selfhosted             # SelfHostedClient
client.agent                  # AgentWebSocketClient

# Asynchronous access
client.listen.asyncrest       # AsyncListenRESTClient
client.listen.asyncwebsocket  # AsyncListenWebSocketClient
client.speak.asyncrest        # AsyncSpeakRESTClient
client.speak.asyncwebsocket   # AsyncSpeakWebSocketClient
client.read                   # AsyncReadClient / AsyncAnalyzeClient
client.asyncmanage            # AsyncManageClient
client.asyncauth.v("1")       # AsyncAuthRESTClient
client.asyncselfhosted        # AsyncSelfHostedClient
client.agent                  # AsyncAgentWebSocketClient
```
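
A sketch of the asynchronous pattern, assuming the async REST client mirrors the synchronous `transcribe_url` signature and using a placeholder audio URL:

```python
import asyncio
from deepgram import DeepgramClient, ListenRESTOptions, UrlSource

async def main():
    client = DeepgramClient()  # reads DEEPGRAM_API_KEY from the environment
    source = UrlSource(url="https://example.com/audio.wav")  # placeholder URL
    options = ListenRESTOptions(model="nova-2", language="en-US")
    # Async counterpart of the synchronous client.listen.rest call
    response = await client.listen.asyncrest.transcribe_url(source, options)
    print(response.results.channels[0].alternatives[0].transcript)

asyncio.run(main())
```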
## Capabilities

### Speech-to-Text (Listen)

Comprehensive speech recognition capabilities supporting both batch transcription of prerecorded audio and real-time streaming transcription. Includes advanced features like speaker diarization, punctuation, profanity filtering, keyword detection, and multiple language support.

```python { .api }
# REST Client
class ListenRESTClient:
    def transcribe_url(self, source, options): ...
    def transcribe_file(self, source, options): ...

# WebSocket Client
class ListenWebSocketClient:
    def start(self, options): ...
    def send(self, data): ...
    def finish(self): ...
    def close(self): ...

# Options
class ListenRESTOptions:
    model: str
    language: str
    punctuate: bool
    diarize: bool
    # ... additional options

class ListenWebSocketOptions:
    model: str
    language: str
    encoding: str
    sample_rate: int
    # ... additional options
```

[Speech-to-Text](./speech-to-text.md)
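
A minimal live-streaming sketch built only from the methods listed above. It assumes a local file `audio.raw` containing 16 kHz, 16-bit mono PCM, and omits result handling, which is done through event callbacks registered on the WebSocket client:

```python
from deepgram import DeepgramClient, ListenWebSocketOptions

client = DeepgramClient()               # uses DEEPGRAM_API_KEY
ws = client.listen.websocket            # ListenWebSocketClient

options = ListenWebSocketOptions(
    model="nova-2",
    language="en-US",
    encoding="linear16",                # raw 16-bit little-endian PCM
    sample_rate=16000,
)

# Open the connection, stream audio in chunks, then signal end of stream
ws.start(options)
with open("audio.raw", "rb") as audio:  # assumed pre-captured PCM audio
    while chunk := audio.read(8192):
        ws.send(chunk)
ws.finish()
```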
### Text-to-Speech (Speak)

High-quality neural text-to-speech synthesis with multiple voice models and real-time streaming capabilities. Supports both REST API for generating complete audio files and WebSocket streaming for real-time audio generation.

```python { .api }
# REST Client
class SpeakRESTClient:
    def stream(self, source, options): ...
    def save(self, filename, source, options): ...

# WebSocket Client
class SpeakWebSocketClient:
    def start(self, options): ...
    def send(self, message): ...
    def close(self): ...

# Options
class SpeakRESTOptions:
    model: str
    encoding: str
    container: str
    sample_rate: int
    bit_rate: int

class SpeakWSOptions:
    model: str
    encoding: str
    sample_rate: int
```

[Text-to-Speech](./text-to-speech.md)
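
A sketch of the WebSocket text-to-speech path using only the methods listed above; how the generated audio is consumed (event callbacks or a bound `Speaker`) depends on the application and is omitted here:

```python
from deepgram import DeepgramClient, SpeakWSOptions

client = DeepgramClient()
ws = client.speak.websocket             # SpeakWebSocketClient

options = SpeakWSOptions(
    model="aura-asteria-en",
    encoding="linear16",
    sample_rate=24000,
)

# Open the connection, send the text to synthesize, then close
ws.start(options)
ws.send("Hello from the Deepgram Python SDK.")
ws.close()
```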
### Text Analysis (Read)

Advanced text intelligence capabilities including sentiment analysis, topic detection, intent recognition, and content summarization. Processes text content to extract insights and understanding.

```python { .api }
class AnalyzeClient:
    def analyze_url(self, source, options): ...
    def analyze_text(self, source, options): ...

class AnalyzeOptions:
    language: str
    topics: bool
    intents: bool
    sentiment: bool
    summarize: bool
```

[Text Analysis](./text-analysis.md)
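
A sketch of batch text analysis with the synchronous client, assuming `client.read` resolves to an `AnalyzeClient` as listed under Router Access Patterns and that `TextSource` wraps the raw text as shown in the Types section:

```python
from deepgram import DeepgramClient, AnalyzeOptions, TextSource

client = DeepgramClient()
source = TextSource(text="The quarterly results exceeded expectations.")
options = AnalyzeOptions(
    language="en",
    sentiment=True,   # run sentiment analysis
    topics=True,      # detect topics
    summarize=True,   # produce a summary
)

# analyze_url works the same way for text hosted at a URL
response = client.read.analyze_text(source, options)
print(response)
```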
### Project Management (Manage)

Complete account and project management functionality including API key management, usage tracking, team member management, and billing information access.

```python { .api }
class ManageClient:
    def get_projects(self): ...
    def get_project(self, project_id): ...
    def get_keys(self, project_id): ...
    def create_key(self, project_id, options): ...
    def get_usage(self, project_id, options): ...
    def get_balances(self, project_id): ...
    # ... additional management methods
```

[Project Management](./project-management.md)
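
A sketch of basic project inspection using the methods listed above; the response field names (`projects`, `project_id`, `name`) follow the Deepgram Manage API and are an assumption here:

```python
from deepgram import DeepgramClient

client = DeepgramClient()
manage = client.manage                # ManageClient

# List projects, then inspect keys and balances for the first one
projects = manage.get_projects()
first = projects.projects[0]          # assumed response shape
print(first.project_id, first.name)

print(manage.get_keys(first.project_id))
print(manage.get_balances(first.project_id))
```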
### Conversational AI (Agent)

Real-time conversational AI capabilities enabling voice-based interactions with intelligent agents. Supports function calling, dynamic prompt updates, and bidirectional audio streaming.

```python { .api }
class AgentWebSocketClient:
    def start(self, options): ...
    def send_settings(self, settings): ...
    def update_prompt(self, prompt): ...
    def inject_message(self, message): ...
    def close(self): ...

class SettingsOptions:
    agent: dict
    listen: dict
    speak: dict
    think: dict
```

[Conversational AI](./conversational-ai.md)
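
A heavily hedged sketch of wiring up an agent session with the methods listed above. The nested settings values are illustrative assumptions (the real schema is defined by the Agent API), and the audio transport (streaming microphone audio in, playing agent audio out) is omitted:

```python
from deepgram import DeepgramClient, SettingsOptions

client = DeepgramClient()
agent = client.agent                                  # AgentWebSocketClient

# Illustrative configuration; exact keys depend on the Agent API schema
settings = SettingsOptions()
settings.listen = {"model": "nova-2"}                 # speech-to-text config
settings.speak = {"model": "aura-asteria-en"}         # text-to-speech config
settings.think = {"provider": "open_ai", "model": "gpt-4o-mini"}  # LLM config
settings.agent = {"greeting": "Hello! How can I help you today?"}

agent.start(settings)                                 # open the agent session
agent.update_prompt("You are a concise, friendly voice assistant.")
agent.inject_message("Ask the caller what they need help with.")
agent.close()
```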
### Audio Utilities

Utility classes for audio input/output operations including microphone capture and speaker playback, with configurable audio parameters and error handling.

```python { .api }
class Microphone:
    def __init__(self, **kwargs): ...
    def start(self): ...
    def finish(self): ...

class Speaker:
    def __init__(self, **kwargs): ...
    def start(self): ...
    def finish(self): ...

# Constants
INPUT_CHANNELS: int = 1
INPUT_RATE: int = 16000
INPUT_CHUNK: int = 8192
OUTPUT_CHANNELS: int = 1
OUTPUT_RATE: int = 24000
OUTPUT_CHUNK: int = 8192
```

[Audio Utilities](./audio-utilities.md)
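
A sketch that pairs `Microphone` with live transcription, assuming the microphone forwards each captured chunk to a `push_callback` keyword argument (the constructor above only documents `**kwargs`, so this name is an assumption):

```python
import time
from deepgram import DeepgramClient, ListenWebSocketOptions, Microphone

client = DeepgramClient()
ws = client.listen.websocket

options = ListenWebSocketOptions(model="nova-2", encoding="linear16", sample_rate=16000)
ws.start(options)

# Assumed: each captured audio chunk is pushed straight into the WebSocket
mic = Microphone(push_callback=ws.send)
mic.start()

time.sleep(10)      # capture roughly ten seconds of audio

mic.finish()
ws.finish()
```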
264
265
### Authentication (Auth)
266
267
Token management and authentication capabilities for generating temporary JWT tokens from API keys, enabling secure access with configurable time-to-live settings.
268
269
```python { .api }
270
class AuthRESTClient:
271
def grant_token(self, ttl_seconds: int = None) -> GrantTokenResponse: ...
272
273
class AsyncAuthRESTClient:
274
async def grant_token(self, ttl_seconds: int = None) -> GrantTokenResponse: ...
275
276
class GrantTokenResponse:
277
access_token: str
278
expires_in: int
279
```
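
A sketch of minting a short-lived token and handing it to a second client, assuming (per the `DeepgramClient` constructor in the Types section) that the token is passed as `access_token`:

```python
from deepgram import DeepgramClient

# The long-lived API key is used only to mint a short-lived access token
admin = DeepgramClient(api_key="your-api-key")
grant = admin.auth.v("1").grant_token(ttl_seconds=300)  # five-minute token
print(grant.access_token, grant.expires_in)

# Hand the temporary token to a less-trusted component
scoped = DeepgramClient(access_token=grant.access_token)
```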
### Self-Hosted (OnPrem)

Support for on-premises and self-hosted Deepgram deployments with custom endpoint configuration and deployment management.

```python { .api }
class SelfHostedClient:
    def __init__(self, config: DeepgramClientOptions): ...

class AsyncSelfHostedClient:
    def __init__(self, config: DeepgramClientOptions): ...

# Backward compatibility aliases
class OnPremClient(SelfHostedClient): ...
class AsyncOnPremClient(AsyncSelfHostedClient): ...
```
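
A sketch of pointing the SDK at a self-hosted deployment, assuming the `url` field of `DeepgramClientOptions` (see the Types section) controls the API endpoint; the hostname is a placeholder:

```python
from deepgram import DeepgramClient, DeepgramClientOptions

# Route all requests to a self-hosted deployment instead of the hosted API
config = DeepgramClientOptions(
    url="https://deepgram.internal.example.com",  # placeholder on-prem endpoint
)
client = DeepgramClient(api_key="your-api-key", config=config)

# The routers now target the custom endpoint
print(client.selfhosted)    # SelfHostedClient bound to the configured URL
```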
## Types

```python { .api }
class DeepgramClient:
    def __init__(self, api_key: str = "", config: DeepgramClientOptions = None, access_token: str = ""): ...
    @property
    def listen(self): ...
    @property
    def speak(self): ...
    @property
    def read(self): ...
    @property
    def manage(self): ...
    @property
    def asyncmanage(self): ...
    @property
    def agent(self): ...
    @property
    def auth(self): ...
    @property
    def asyncauth(self): ...
    @property
    def selfhosted(self): ...
    @property
    def asyncselfhosted(self): ...

class DeepgramClientOptions:
    api_key: str
    access_token: str
    url: str
    verbose: int
    headers: dict
    options: dict

# Source types for different input methods
class TextSource:
    def __init__(self, text: str): ...

class BufferSource:
    def __init__(self, buffer: bytes): ...

class FileSource:
    def __init__(self, file: str): ...

class UrlSource:
    def __init__(self, url: str): ...

class StreamSource:
    def __init__(self, stream): ...

# Base response class
class BaseResponse:
    def __init__(self, **kwargs): ...
```
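
A sketch of building a client with tuned `DeepgramClientOptions`; the assumptions here are that `verbose` accepts a standard logging level and that `headers` is merged into every request:

```python
import logging
from deepgram import DeepgramClient, DeepgramClientOptions

# Client-wide configuration (logging verbosity and extra request headers)
config = DeepgramClientOptions(
    verbose=logging.DEBUG,                          # assumed: standard logging level
    headers={"X-Request-Source": "docs-example"},   # assumed: sent with every request
)
client = DeepgramClient(api_key="your-api-key", config=config)
```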
## Error Handling

```python { .api }
class DeepgramError(Exception):
    """Base exception for Deepgram SDK errors"""

class DeepgramApiError(DeepgramError):
    """API response errors"""

class DeepgramApiKeyError(DeepgramError):
    """Missing or invalid API key"""

class DeepgramTypeError(DeepgramError):
    """Type validation errors"""

class DeepgramMicrophoneError(Exception):
    """Microphone operation errors"""

class DeepgramSpeakerError(Exception):
    """Speaker operation errors"""
```
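
A sketch of handling the SDK-specific exceptions around a REST call, assuming the exception classes are importable from the top-level `deepgram` package like the other names in this document; the audio URL is a placeholder:

```python
from deepgram import (
    DeepgramApiError,
    DeepgramClient,
    DeepgramTypeError,
    ListenRESTOptions,
    UrlSource,
)

client = DeepgramClient()

try:
    source = UrlSource(url="https://example.com/audio.wav")  # placeholder URL
    options = ListenRESTOptions(model="nova-2")
    response = client.listen.rest.transcribe_url(source, options)
    print(response.results.channels[0].alternatives[0].transcript)
except DeepgramApiError as err:
    # The API rejected the request (bad key, malformed options, quota, ...)
    print(f"Deepgram API error: {err}")
except DeepgramTypeError as err:
    # Locally detected validation problem before the request was sent
    print(f"Invalid request options: {err}")
```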