# Sodapy

A Python client library for the Socrata Open Data API (SODA). Sodapy provides programmatic access to open data hosted on Socrata platforms: reading datasets with SoQL queries, paginating through large datasets, managing dataset metadata, creating datasets, and upserting data.

## Package Information

- **Package Name**: sodapy
- **Language**: Python
- **Installation**: `pip install sodapy`
- **Repository**: https://github.com/xmunoz/sodapy

## Core Imports

```python
from sodapy import Socrata
import sodapy  # For version access
from typing import Generator  # Used in the get_all() return annotation below
from io import IOBase  # Used in the upsert()/replace() payload annotations below
```

Version information:
```python
print(sodapy.__version__)  # "2.2.0"
```

## Basic Usage

```python
from sodapy import Socrata

# Initialize client with domain and optional app token
client = Socrata("opendata.socrata.com", "your_app_token")

# Basic data retrieval
results = client.get("dataset_id")

# Query with SoQL filtering
results = client.get("dataset_id", where="column > 100", limit=500)

# Get all data with automatic pagination
for record in client.get_all("dataset_id"):
    print(record)

# Always close the client when done
client.close()

# Or use as context manager
with Socrata("opendata.socrata.com", "your_app_token") as client:
    results = client.get("dataset_id", where="age > 21")
```

## Architecture

Sodapy is built around a single `Socrata` class that manages an HTTP session and exposes a method for each supported SODA API operation. The client supports basic HTTP authentication, OAuth 2.0 access tokens, and app tokens (optional, but requests without one are throttled more strictly). It offers both single-request data access and generator-based pagination for large datasets.

## Capabilities

### Client Initialization

Create and configure a Socrata client for API access.

```python { .api }
class Socrata:
    def __init__(
        self,
        domain: str,
        app_token: str | None,
        username: str | None = None,
        password: str | None = None,
        access_token: str | None = None,
        session_adapter: dict | None = None,
        timeout: int | float = 10
    ):
        """
        Initialize Socrata client.

        Args:
            domain: Socrata domain (e.g., "opendata.socrata.com")
            app_token: Socrata application token (optional but recommended)
            username: Username for basic HTTP auth (for write operations)
            password: Password for basic HTTP auth (for write operations)
            access_token: OAuth 2.0 access token
            session_adapter: Custom session adapter configuration
            timeout: Request timeout in seconds
        """
```
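As a rough illustration of the authentication options above, the sketch below collects constructor arguments from an environment-like mapping. The helper name `socrata_kwargs` and the `SOCRATA_*` variable names are assumptions made for this example; they are not part of sodapy.

```python
from typing import Mapping


def socrata_kwargs(env: Mapping[str, str]) -> dict:
    """Build keyword arguments for Socrata() from an environment-like mapping.

    Hypothetical helper: the SOCRATA_* names are illustrative only.
    """
    kwargs = {
        "domain": env["SOCRATA_DOMAIN"],
        # app_token may be None (optional but recommended)
        "app_token": env.get("SOCRATA_APP_TOKEN"),
        "timeout": float(env.get("SOCRATA_TIMEOUT", "10")),
    }
    if "SOCRATA_ACCESS_TOKEN" in env:
        # Prefer an OAuth 2.0 access token when one is present
        kwargs["access_token"] = env["SOCRATA_ACCESS_TOKEN"]
    elif "SOCRATA_USERNAME" in env and "SOCRATA_PASSWORD" in env:
        # Otherwise fall back to basic HTTP auth
        kwargs["username"] = env["SOCRATA_USERNAME"]
        kwargs["password"] = env["SOCRATA_PASSWORD"]
    return kwargs


print(socrata_kwargs({"SOCRATA_DOMAIN": "opendata.socrata.com"}))
```

With sodapy installed, the result could be passed on as `Socrata(**socrata_kwargs(os.environ))`, since every key matches a constructor parameter name.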

### Context Manager Support

Use Socrata client as a context manager for automatic cleanup.

```python { .api }
def __enter__(self) -> 'Socrata':
    """Enter context manager."""

def __exit__(self, exc_type, exc_value, traceback) -> None:
    """Exit context manager and close session."""
```

### Dataset Discovery

List and search for datasets on a Socrata domain.

```python { .api }
def datasets(
    self,
    limit: int = 0,
    offset: int = 0,
    order: str | None = None,
    **kwargs
) -> list:
    """
    Returns list of datasets associated with a domain.

    Args:
        limit: Maximum number of results (0 = all)
        offset: Offset for pagination
        order: Field to sort on, optionally with ' ASC' or ' DESC'
        ids: List of dataset IDs to filter
        domains: List of additional domains to search
        categories: List of category filters
        tags: List of tag filters
        only: List of logical types ('dataset', 'chart', etc.)
        shared_to: User/team IDs or 'site' for public datasets
        column_names: Required column names in tabular datasets
        q: Full text search query
        min_should_match: Elasticsearch match requirement
        attribution: Organization filter
        license: License filter
        derived_from: Parent dataset ID filter
        provenance: 'official' or 'community'
        for_user: Owner user ID filter
        visibility: 'open' or 'internal'
        public: Boolean for public/private filter
        published: Boolean for published status filter
        approval_status: 'pending', 'rejected', 'approved', 'not_ready'
        explicitly_hidden: Boolean for hidden status filter
        derived: Boolean for derived dataset filter

    Returns:
        List of dataset metadata dictionaries
    """
```
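A small sketch of how a `datasets()` result might be post-processed. It assumes each entry carries a `"resource"` dictionary with `"id"` and `"name"` keys, which is how Socrata's catalog responses are commonly shaped; that layout is an assumption here, so entries without those keys are skipped. The helper name `summarize_datasets` and the sample data are hypothetical.

```python
def summarize_datasets(results: list) -> list:
    """Return (id, name) pairs from a datasets() result list.

    Assumes each entry has a "resource" dict with "id" and "name" keys;
    entries that lack them are skipped rather than raising.
    """
    pairs = []
    for entry in results:
        resource = entry.get("resource", {})
        if "id" in resource and "name" in resource:
            pairs.append((resource["id"], resource["name"]))
    return pairs


# Stand-in for something like: client.datasets(q="budget", only=["dataset"], limit=2)
sample = [
    {"resource": {"id": "abcd-1234", "name": "City Budget"}},
    {"resource": {"id": "wxyz-5678", "name": "Capital Projects"}},
    {"metadata": {"domain": "example.org"}},  # no "resource": skipped
]
print(summarize_datasets(sample))
```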

### Data Reading

Retrieve data from Socrata datasets with query capabilities.

```python { .api }
def get(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    **kwargs
) -> list | dict | bytes:
    """
    Read data from dataset with SoQL query support.

    Args:
        dataset_identifier: Dataset ID or identifier
        content_type: Response format ('json', 'csv', 'xml')
        select: Columns to return (defaults to all)
        where: Row filter conditions
        order: Sort specification
        group: Column to group results on
        limit: Maximum results to return (default 1000)
        offset: Pagination offset (default 0)
        q: Full text search value
        query: Complete SoQL query string
        exclude_system_fields: Exclude system fields (default True)

    Returns:
        Parsed records (list or dict) for JSON, a list of parsed rows
        for CSV, or the raw response content for XML
    """

def get_all(self, *args, **kwargs) -> Generator:
    """
    Generator that retrieves all data with automatic pagination.
    Accepts same arguments as get().

    Yields:
        Individual records from the dataset
    """
```
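To make the relationship between `limit`, `offset`, and pagination concrete, here is a minimal sketch of a manual paging loop over `get()`. It is not sodapy's own `get_all()` implementation; `iter_rows` and `FakeClient` are hypothetical names, and the fake client exists only so the example runs without network access.

```python
from typing import Any, Iterator


def iter_rows(client: Any, dataset_id: str, page_size: int = 1000, **soql: Any) -> Iterator[dict]:
    """Yield every row by advancing offset until a short or empty page arrives."""
    offset = 0
    while True:
        page = client.get(dataset_id, limit=page_size, offset=offset, **soql)
        yield from page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


class FakeClient:
    """In-memory stand-in that mimics get(limit=..., offset=...)."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, dataset_id, limit=1000, offset=0, **kwargs):
        return self.rows[offset:offset + limit]


fake = FakeClient([{"n": i} for i in range(5)])
print([row["n"] for row in iter_rows(fake, "dataset_id", page_size=2)])
# [0, 1, 2, 3, 4]
```

When paging a real dataset this way, passing an explicit `order` is generally advisable so that page boundaries stay stable between requests.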

### Metadata Operations

Retrieve and update dataset metadata.

```python { .api }
def get_metadata(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Retrieve dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Dataset metadata dictionary
    """

def update_metadata(
    self,
    dataset_identifier: str,
    update_fields: dict,
    content_type: str = "json"
) -> dict:
    """
    Update dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        update_fields: Dictionary of fields to update
        content_type: Response format

    Returns:
        Updated metadata
    """
```
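The two metadata calls combine naturally into a read-then-update pattern. The sketch below sends an update only when the stored value differs. `rename_if_needed` and `StubClient` are hypothetical names, the `"name"` metadata key is assumed for illustration, and the stub merely mirrors the two signatures above so the example runs offline.

```python
from typing import Any


def rename_if_needed(client: Any, dataset_id: str, new_name: str) -> bool:
    """Update the dataset name only when it differs; return True if an update was sent."""
    current = client.get_metadata(dataset_id)
    if current.get("name") == new_name:
        return False
    client.update_metadata(dataset_id, {"name": new_name})
    return True


class StubClient:
    """Minimal in-memory stand-in exposing the two metadata methods."""

    def __init__(self, metadata: dict):
        self.metadata = dict(metadata)

    def get_metadata(self, dataset_identifier, content_type="json"):
        return dict(self.metadata)

    def update_metadata(self, dataset_identifier, update_fields, content_type="json"):
        self.metadata.update(update_fields)
        return dict(self.metadata)


stub = StubClient({"name": "Old name"})
print(rename_if_needed(stub, "dataset_id", "New name"))  # True
print(rename_if_needed(stub, "dataset_id", "New name"))  # False
```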

### Data Writing

Insert, update, replace, or delete data in datasets.

```python { .api }
def upsert(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Insert, update, or delete data in existing dataset.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, dictionary, or file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def replace(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Replace all data in dataset with payload.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, dictionary, or file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def delete(
    self,
    dataset_identifier: str,
    row_id: str | None = None,
    content_type: str = "json"
) -> dict:
    """
    Delete a single record or the entire dataset.

    Args:
        dataset_identifier: Dataset ID
        row_id: Specific row ID to delete (None deletes the entire dataset)
        content_type: Response format

    Returns:
        Operation result
    """
```
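For large payloads it can help to upsert in batches and add up the statistics from each call. The following is a sketch under assumptions: `chunked`, `upsert_in_batches`, and `StubClient` are hypothetical, and the result keys used by the stub (`"Rows Created"`, `"Errors"`) are assumed to resemble what the API reports, so the helper simply sums whatever integer counters it finds.

```python
from typing import Any, Iterable, Iterator, List


def chunked(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Split rows into lists of at most `size` items."""
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def upsert_in_batches(client: Any, dataset_id: str, rows: Iterable[dict], batch_size: int = 1000) -> dict:
    """Upsert rows batch by batch and sum the integer counters of each result."""
    totals: dict = {}
    for batch in chunked(rows, batch_size):
        result = client.upsert(dataset_id, batch)
        for key, value in result.items():
            if isinstance(value, int):
                totals[key] = totals.get(key, 0) + value
    return totals


class StubClient:
    """Offline stand-in; the returned keys are assumed, not guaranteed."""

    def upsert(self, dataset_identifier, payload, content_type="json"):
        return {"Rows Created": len(payload), "Errors": 0}


rows = [{"id": i} for i in range(5)]
print(upsert_in_batches(StubClient(), "dataset_id", rows, batch_size=2))
# {'Rows Created': 5, 'Errors': 0}
```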

### Dataset Management

Create and manage datasets.

```python { .api }
def create(self, name: str, **kwargs) -> dict:
    """
    Create new dataset with field types.

    Args:
        name: Dataset name
        description: Dataset description
        columns: List of column definitions
        category: Dataset category (must exist in domain)
        tags: List of tag strings
        row_identifier: Primary key field name
        new_backend: Use new backend (default False)

    Returns:
        Created dataset metadata
    """

def publish(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Publish a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Publication result
    """

def set_permission(
    self,
    dataset_identifier: str,
    permission: str = "private",
    content_type: str = "json"
) -> dict:
    """
    Set dataset permissions.

    Args:
        dataset_identifier: Dataset ID
        permission: 'private' or 'public'
        content_type: Response format

    Returns:
        Permission update result
    """
```
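The `columns` argument to `create()` is a list of column definitions. As a hedged sketch, the helper below builds definitions with `fieldName`, `name`, and `dataTypeName` keys, the shape used in sodapy's published examples. The helper name `column` and the dataset values are made up for illustration.

```python
def column(field_name: str, display_name: str, data_type: str = "text") -> dict:
    """Build one column definition (hypothetical convenience helper)."""
    return {"fieldName": field_name, "name": display_name, "dataTypeName": data_type}


columns = [
    column("delegation", "Delegation"),
    column("members", "Members", "number"),
]

# Keyword arguments matching the create() parameters documented above
create_kwargs = {
    "description": "List of delegates",
    "columns": columns,
    "row_identifier": "delegation",  # must name one of the fieldName values
    "tags": ["politics", "geography"],
}
print(create_kwargs["columns"])

# With an authenticated client, this would be passed as:
# client.create("Delegates", **create_kwargs)
```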

### File Attachments

Manage file attachments on datasets.

```python { .api }
def download_attachments(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    download_dir: str = "~/sodapy_downloads"
) -> list:
    """
    Download all attachments for a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format
        download_dir: Local directory for downloads (default: ~/sodapy_downloads)

    Returns:
        List of downloaded file paths
    """

def create_non_data_file(
    self,
    params: dict,
    files: dict
) -> dict:
    """
    Create non-data file attachment.

    Args:
        params: File parameters and metadata
        files: Dictionary containing file tuple

    Returns:
        Created file metadata
    """

def replace_non_data_file(
    self,
    dataset_identifier: str,
    params: dict,
    files: dict
) -> dict:
    """
    Replace existing non-data file attachment.

    Args:
        dataset_identifier: Dataset ID
        params: File parameters and metadata
        files: Dictionary containing file tuple

    Returns:
        Updated file metadata
    """
```

### Connection Management

Manage HTTP session lifecycle.

```python { .api }
def close(self) -> None:
    """Close the HTTP session."""
```

### Class Attributes

```python { .api }
class Socrata:
    DEFAULT_LIMIT = 1000  # Default pagination limit
```

## Error Handling

Sodapy surfaces API errors as `requests` exceptions. When a Socrata API response includes additional error information, the library extracts it and adds it to the exception message.

Common exceptions:
- `requests.exceptions.HTTPError`: raised for HTTP 4xx/5xx responses, with a detailed error message
- `TypeError`: invalid parameter types (e.g. a non-numeric `timeout`)
- `Exception`: missing required parameters (e.g. no `domain` provided)
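One way to handle the exceptions listed above is a small retry wrapper around individual calls. This is a generic sketch, not something sodapy provides: `call_with_retries` is a hypothetical helper, and the exception types to retry on are passed in by the caller.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay: float = 0.0,
) -> T:
    """Call func(), retrying on the given exception types with a linear backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the original error
            time.sleep(delay * attempt)
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry path."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky, (ConnectionError,)))  # ok
```

With sodapy, the call might look like `call_with_retries(lambda: client.get("dataset_id"), (requests.exceptions.HTTPError,))`. Retrying is mostly useful for transient server-side failures; a 4xx response caused by a malformed query will fail the same way on every attempt.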
424
425
## SoQL Query Language
426
427
Sodapy supports the full Socrata Query Language (SoQL) for filtering and aggregating data:
428
429
- **$select**: Choose columns to return
430
- **$where**: Filter rows with conditions
431
- **$order**: Sort results by columns
432
- **$group**: Group results by columns
433
- **$limit**: Limit number of results
434
- **$offset**: Skip results for pagination
435
- **$q**: Full-text search across all fields

Example SoQL usage:
```python
# Filter and sort results
results = client.get("dataset_id",
                     where="age > 21 AND city = 'Boston'",
                     select="name, age, city",
                     order="age DESC",
                     limit=100)

# Aggregation with grouping
results = client.get("dataset_id",
                     select="city, COUNT(*) as total",
                     group="city",
                     order="total DESC")
```
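When a `where` clause is assembled from variable text, embedded single quotes need escaping. The sketch below assumes SoQL follows the SQL convention of doubling a single quote inside a text literal; `soql_string` and `equals_clause` are hypothetical helpers, not part of sodapy.

```python
def soql_string(value: str) -> str:
    """Quote a Python string as a SoQL text literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def equals_clause(column: str, value: str) -> str:
    """Build a `column = 'value'` condition with the value safely quoted."""
    return f"{column} = {soql_string(value)}"


where = equals_clause("city", "Coeur d'Alene") + " AND age > 21"
print(where)
# city = 'Coeur d''Alene' AND age > 21
```

The resulting string can then be passed as the `where` argument, for example `client.get("dataset_id", where=where)`.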