Tessl Tile for pypi/sodapy@2.2.0

# Sodapy

A Python client library for the Socrata Open Data API (SODA). Sodapy provides programmatic access to open data hosted on Socrata platforms. It covers reading datasets with SoQL queries, paginating through large datasets, reading and updating dataset metadata, creating datasets, and upserting data.

## Package Information

- **Package Name**: sodapy
- **Language**: Python
- **Installation**: `pip install sodapy`
- **Repository**: https://github.com/xmunoz/sodapy

## Core Imports

```python
from sodapy import Socrata
import sodapy  # For version access

# Only needed for the type annotations used in the API reference
from typing import Generator
from io import IOBase
```

Version information:
```python
print(sodapy.__version__)  # prints 2.2.0
```

## Basic Usage

```python
from sodapy import Socrata

# Initialize client with domain and app token (pass None to omit the token)
client = Socrata("opendata.socrata.com", "your_app_token")

# Basic data retrieval
results = client.get("dataset_id")

# Query with SoQL filtering
results = client.get("dataset_id", where="column > 100", limit=500)

# Get all data with automatic pagination
for record in client.get_all("dataset_id"):
    print(record)

# Always close the client when done
client.close()

# Or use it as a context manager, which closes the session on exit
with Socrata("opendata.socrata.com", "your_app_token") as client:
    results = client.get("dataset_id", where="age > 21")
```

## Architecture

Sodapy is built around a single `Socrata` class that manages an HTTP session and provides methods for all SODA API operations.

For authenticated requests such as writes, the client supports basic HTTP auth and OAuth 2.0 access tokens. When you supply an app token, the client sends it to identify your application. Socrata throttles requests made without an app token more strictly.

You can fetch data in a single request with `get()`, or read it lazily, page by page, through the `get_all()` generator.

## Capabilities

### Client Initialization

Create and configure a Socrata client for API access.

```python { .api }
class Socrata:
    def __init__(
        self,
        domain: str,
        app_token: str | None,
        username: str | None = None,
        password: str | None = None,
        access_token: str | None = None,
        session_adapter: dict | None = None,
        timeout: int | float = 10
    ):
        """
        Initialize Socrata client.

        Args:
            domain: Socrata domain (e.g., "opendata.socrata.com"); required
            app_token: Socrata application token; a required positional argument
                that may be None (a token is recommended to avoid strict throttling)
            username: Username for basic HTTP auth (used with password, e.g. for write operations)
            password: Password for basic HTTP auth (used with username)
            access_token: OAuth 2.0 access token
            session_adapter: Custom session adapter configuration
            timeout: Request timeout in seconds (must be an int or float)
        """
```
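
The constructor options above decide how each request is identified and authenticated. The sketch below illustrates that mapping with a hypothetical `build_auth` helper, which is not part of sodapy. It rests on three assumptions:

- The app token travels in an `X-App-Token` header, following Socrata's documented convention.
- An OAuth 2.0 token is sent as `Authorization: OAuth <token>`, also following Socrata's convention.
- Basic auth takes precedence when both a username/password pair and an access token are supplied.

```python
from __future__ import annotations

from typing import Optional


def build_auth(
    app_token: Optional[str],
    username: Optional[str] = None,
    password: Optional[str] = None,
    access_token: Optional[str] = None,
) -> tuple[dict[str, str], Optional[tuple[str, str]]]:
    """Return (headers, basic_auth) for a request.

    Hypothetical helper that mirrors the Socrata constructor options;
    it is an illustration, not sodapy's own code.
    """
    headers: dict[str, str] = {}
    basic_auth: Optional[tuple[str, str]] = None

    # The app token identifies the application to Socrata.
    if app_token:
        headers["X-App-Token"] = app_token

    # Assumption: basic auth is preferred when a username/password pair is given.
    if username and password:
        basic_auth = (username, password)
    elif access_token:
        # OAuth 2.0 token in Socrata's "OAuth <token>" header form.
        headers["Authorization"] = f"OAuth {access_token}"

    return headers, basic_auth


headers, basic_auth = build_auth("my_app_token", access_token="abc123")
print(headers)
print(basic_auth)
```

Called with only an app token, the helper returns just the `X-App-Token` header and no basic-auth pair. That mirrors read-only use of public datasets.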

### Context Manager Support

Use the Socrata client as a context manager for automatic cleanup.

```python { .api }
def __enter__(self) -> 'Socrata':
    """Enter context manager."""

def __exit__(self, exc_type, exc_value, traceback) -> None:
    """Exit context manager and close session."""
```
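
With this protocol, `close()` runs whenever the `with` block exits, including when it exits through an exception. The sketch below shows the pattern with a hypothetical `ManagedClient` class standing in for `Socrata`. As the `-> None` return type of `__exit__` suggests, it assumes exceptions are not suppressed.

```python
class ManagedClient:
    """Hypothetical stand-in for a client that owns an HTTP session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real client would close its underlying HTTP session here.
        self.closed = True

    def __enter__(self) -> "ManagedClient":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Always release the session; returning None lets exceptions propagate.
        self.close()


with ManagedClient() as client:
    assert not client.closed
print(client.closed)  # True

try:
    with ManagedClient() as failing:
        raise ValueError("boom")
except ValueError:
    pass
print(failing.closed)  # True
```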

### Dataset Discovery

List and search for datasets on a Socrata domain.

```python { .api }
def datasets(
    self,
    limit: int = 0,
    offset: int = 0,
    order: str | None = None,
    **kwargs
) -> list:
    """
    Returns list of datasets associated with a domain.

    Args:
        limit: Maximum number of results (0 = all)
        offset: Offset for pagination
        order: Field to sort on, optionally with ' ASC' or ' DESC'

    Keyword filters (**kwargs):
        ids: List of dataset IDs to filter
        domains: List of additional domains to search
        categories: List of category filters
        tags: List of tag filters
        only: List of logical types ('dataset', 'chart', etc.)
        shared_to: User/team IDs or 'site' for public datasets
        column_names: Required column names in tabular datasets
        q: Full text search query
        min_should_match: Elasticsearch match requirement for q
        attribution: Organization filter
        license: License filter
        derived_from: Parent dataset ID filter
        provenance: 'official' or 'community'
        for_user: Owner user ID filter
        visibility: 'open' or 'internal'
        public: Boolean for public/private filter
        published: Boolean for published status filter
        approval_status: 'pending', 'rejected', 'approved', 'not_ready'
        explicitly_hidden: Boolean for hidden status filter
        derived: Boolean for derived dataset filter

    Returns:
        List of dataset metadata dictionaries
    """
```
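
`datasets()` returns a list of metadata dictionaries. The sketch below reduces such a list to ID/name pairs, assuming each entry follows the Socrata Discovery API layout with the dataset's `id` and `name` under a `resource` key. Check that assumption against a real response. The sample data and the `summarize` function are hypothetical.

```python
from __future__ import annotations

# Hypothetical sample shaped like Discovery API results; the "resource"
# layout is an assumption, not a guarantee about sodapy's return value.
sample_results = [
    {"resource": {"id": "abcd-1234", "name": "Bus Stops"}},
    {"resource": {"id": "wxyz-5678", "name": "Bike Lanes"}},
]


def summarize(results: list[dict]) -> list[tuple[str, str]]:
    """Return (dataset_id, name) pairs, skipping entries without a resource id."""
    pairs = []
    for entry in results:
        resource = entry.get("resource") or {}
        if "id" in resource:
            pairs.append((resource["id"], resource.get("name", "")))
    return pairs


# With a live client this input would come from a call such as:
#   results = client.datasets(limit=10, tags=["transit"], q="bus")
for dataset_id, name in summarize(sample_results):
    print(dataset_id, name)
```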

### Data Reading

Retrieve data from Socrata datasets with query capabilities.

```python { .api }
def get(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    **kwargs
) -> list | dict | str:
    """
    Read data from dataset with SoQL query support.

    Args:
        dataset_identifier: Dataset ID or identifier
        content_type: Response format ('json', 'csv', 'xml')
        select: Columns to return (defaults to all)
        where: Row filter conditions
        order: Sort specification
        group: Column to group results on
        limit: Maximum results to return (default 1000)
        offset: Pagination offset (default 0)
        q: Full text search value
        query: Complete SoQL query string
        exclude_system_fields: Exclude system fields (default True)

    Returns:
        List/dict of records for JSON, or string for CSV/XML
    """

def get_all(self, *args, **kwargs) -> Generator:
    """
    Generator that retrieves all data with automatic pagination.
    Accepts the same arguments as get(); limit, if given, sets the
    page size (default DEFAULT_LIMIT).

    Yields:
        Individual records from the dataset
    """
```
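
Below is a minimal sketch of offset-based pagination, the technique assumed here for `get_all()`. The loop works like this:

1. Request a page of `limit` rows and yield its records.
2. Advance `offset` by `limit`.
3. Stop once a page comes back shorter than `limit`.

The `paginate` and `fake_get` names and the in-memory `ROWS` dataset are hypothetical stand-ins for `client.get` and a live endpoint, so the block runs offline.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[..., list], limit: int = 1000, **params) -> Iterator[dict]:
    """Yield records page by page until a short page signals the end.

    `fetch_page` stands in for client.get and must accept limit/offset keywords.
    """
    offset = params.pop("offset", 0)
    while True:
        page = fetch_page(limit=limit, offset=offset, **params)
        yield from page
        if len(page) < limit:
            # A page shorter than the limit means the dataset is exhausted.
            return
        offset += limit


# Hypothetical in-memory dataset used in place of a live Socrata endpoint.
ROWS = [{"id": i} for i in range(2500)]


def fake_get(limit: int, offset: int, **_ignored) -> list:
    return ROWS[offset:offset + limit]


records = list(paginate(fake_get, limit=1000))
print(len(records))  # 2500
```

Because the loop stops only on a short page, a dataset whose size is an exact multiple of `limit` costs one extra, empty request.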

### Metadata Operations

Retrieve and update dataset metadata.

```python { .api }
def get_metadata(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Retrieve dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Dataset metadata dictionary
    """

def update_metadata(
    self,
    dataset_identifier: str,
    update_fields: dict,
    content_type: str = "json"
) -> dict:
    """
    Update dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        update_fields: Dictionary of fields to update
        content_type: Response format

    Returns:
        Updated metadata
    """
```

### Data Writing

Insert, update, replace, or delete data in datasets.

```python { .api }
def upsert(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Insert, update, or delete rows in an existing dataset.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, a single record dictionary, or a file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def replace(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Replace all data in the dataset with the payload.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, a single record dictionary, or a file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def delete(
    self,
    dataset_identifier: str,
    row_id: str | None = None,
    content_type: str = "json"
) -> dict:
    """
    Delete a single row, or the entire dataset.

    Args:
        dataset_identifier: Dataset ID
        row_id: ID of the row to delete; if None, the entire dataset is deleted
        content_type: Response format

    Returns:
        Operation result
    """
```
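
`upsert()` and `replace()` accept either Python records or a file object. The sketch below prepares both payload forms for a hypothetical dataset `abcd-1234` whose row identifier is `station_id`. The client calls are left as comments so the block runs offline.

The `:deleted` marker for removing a row through an upsert follows Socrata's upsert conventions. It is an assumption here, not something this reference states.

```python
import csv
import io
import json

# Hypothetical rows for a dataset whose row identifier is "station_id".
rows = [
    {"station_id": "S1", "name": "Main St", "docks": 12},
    {"station_id": "S2", "name": "Elm Ave", "docks": 8},
]

# Assumption (Socrata upsert convention): a row carrying its identifier and
# ":deleted": True is removed rather than inserted or updated.
rows.append({"station_id": "S3", ":deleted": True})

# JSON form: the list of dicts itself is the payload, e.g.
#   client.upsert("abcd-1234", rows)
print(json.dumps(rows))

# CSV form: a file object plus content_type="csv", e.g.
#   with open("stations.csv") as f:
#       client.upsert("abcd-1234", f, content_type="csv")
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=["station_id", "name", "docks"])
writer.writeheader()
writer.writerows(rows[:2])  # regular rows only; the deletion is shown in JSON form
csv_buffer.seek(0)
print(csv_buffer.read())
```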

### Dataset Management

Create and manage datasets.

```python { .api }
def create(self, name: str, **kwargs) -> dict:
    """
    Create a new dataset, optionally defining its columns and field types.

    Args:
        name: Dataset name
        description: Dataset description
        columns: List of column definitions
        category: Dataset category (must exist in domain)
        tags: List of tag strings
        row_identifier: Primary key field name
        new_backend: Use new backend (default False)

    Returns:
        Created dataset metadata
    """

def publish(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Publish a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Publication result
    """

def set_permission(
    self,
    dataset_identifier: str,
    permission: str = "private",
    content_type: str = "json"
) -> dict:
    """
    Set dataset permissions.

    Args:
        dataset_identifier: Dataset ID
        permission: 'private' or 'public'
        content_type: Response format

    Returns:
        Permission update result
    """
```
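
The `columns` argument of `create()` is a list of column definitions. The sketch below builds that list from a simple name-to-type mapping. It assumes each definition uses the `fieldName`, `name` and `dataTypeName` keys, as in the column example in sodapy's README; verify this against your domain before relying on it. The `make_columns` helper and the dataset details in the commented call are hypothetical.

```python
from __future__ import annotations


def make_columns(schema: dict[str, str]) -> list[dict[str, str]]:
    """Turn {"Display Name": "type"} into column definitions.

    Assumption: definitions use the fieldName/name/dataTypeName keys.
    """
    columns = []
    for display_name, data_type in schema.items():
        # Derive a lowercase, underscore-separated API field name.
        field_name = display_name.strip().lower().replace(" ", "_")
        columns.append(
            {"fieldName": field_name, "name": display_name, "dataTypeName": data_type}
        )
    return columns


columns = make_columns({"Delegation": "text", "Members": "number"})
print(columns)

# Hypothetical create() call using those columns:
#   client.create(
#       "Delegates",
#       description="List of delegates",
#       columns=columns,
#       row_identifier="delegation",
#       tags=["politics"],
#       category="Government",  # must already exist on the domain
#   )
```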

### File Attachments

Download dataset attachments and manage file-based (non-data) datasets.

```python { .api }
def download_attachments(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    download_dir: str = "~/sodapy_downloads"
) -> list:
    """
    Download all attachments for a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format
        download_dir: Local directory for downloads (default: ~/sodapy_downloads)

    Returns:
        List of downloaded file paths
    """

def create_non_data_file(
    self,
    params: dict,
    files: dict
) -> dict:
    """
    Create a new file-based (non-data) dataset from an uploaded file.

    Args:
        params: File parameters and metadata
        files: Dictionary containing the file tuple,
            e.g. {"file": ("gtfs2", open("myfile.zip", "rb"))}

    Returns:
        Created file metadata
    """

def replace_non_data_file(
    self,
    dataset_identifier: str,
    params: dict,
    files: dict
) -> dict:
    """
    Replace the file in an existing file-based dataset.

    Args:
        dataset_identifier: Dataset ID
        params: File parameters and metadata
        files: Dictionary containing the file tuple

    Returns:
        Updated file metadata
    """
```

### Connection Management

Manage HTTP session lifecycle.

```python { .api }
def close(self) -> None:
    """Close the HTTP session."""
```

### Class Attributes

```python { .api }
class Socrata:
    DEFAULT_LIMIT = 1000  # Default pagination limit
```

## Error Handling

Sodapy raises `requests.exceptions.HTTPError` for API errors. When the Socrata response carries additional error information, sodapy extracts it and includes it in the exception message.

Common exceptions:
- `requests.exceptions.HTTPError`: HTTP 4xx/5xx responses, with Socrata's error details in the message when available
- `TypeError`: Invalid parameter types (e.g. a non-numeric timeout)
- `Exception`: Missing required parameters (e.g. no domain provided)
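
The extra error detail typically comes from the JSON body of the failed response. The sketch below shows that enrichment step with a hypothetical `describe_error` helper. It assumes the explanation lives in a `message` field, and its message format is loosely modeled on the text `requests` produces. In application code you would wrap the client call in a `try` block and catch `requests.exceptions.HTTPError`.

```python
import json
from typing import Optional


def describe_error(status: int, reason: str, body: Optional[str]) -> str:
    """Build an error message, appending the API's explanation when present.

    Hypothetical helper for 4xx/5xx responses; the "message" field name is
    an assumption about Socrata's JSON error bodies.
    """
    kind = "Client Error" if status < 500 else "Server Error"
    text = f"{status} {kind}: {reason}"
    if body:
        try:
            detail = json.loads(body).get("message")
        except (ValueError, AttributeError):
            detail = None  # body is not JSON, or is JSON but not an object
        if detail:
            text += f".\n\t{detail}"
    return text


print(describe_error(400, "Bad Request", '{"message": "No such column: agee"}'))
print(describe_error(500, "Internal Server Error", "<html>oops</html>"))
```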

## SoQL Query Language

Sodapy supports the Socrata Query Language (SoQL) for filtering and aggregating data. Pass SoQL clauses to `get()` and `get_all()` as keyword arguments without the `$` prefix (for example, `where=` for `$where`). Sodapy adds the prefix when it builds the request.

- **$select**: Choose columns to return
- **$where**: Filter rows with conditions
- **$order**: Sort results by columns
- **$group**: Group results by columns
- **$limit**: Limit number of results
- **$offset**: Skip results for pagination
- **$q**: Full-text search across all fields
- **$query**: Complete SoQL query in a single string
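
The sketch below builds a SODA resource URL by hand to show what these clauses look like in the request. Keyword names that match a SoQL clause gain a `$` prefix. Any other keyword is passed through unchanged, which SODA treats as a simple equality filter on that column.

The `soql_url` helper and the domain `data.example.org` are hypothetical. The sketch assumes sodapy performs a similar translation internally, so you would not normally build these URLs yourself.

```python
from urllib.parse import urlencode

# Hypothetical set of SoQL clause names that receive a "$" prefix.
SOQL_CLAUSES = {"select", "where", "order", "group", "limit", "offset", "q", "query"}


def soql_url(domain: str, dataset_id: str, content_type: str = "json", **kwargs) -> str:
    """Build a SODA resource URL, prefixing SoQL clause names with '$'."""
    params = {
        (f"${key}" if key in SOQL_CLAUSES else key): value
        for key, value in kwargs.items()
    }
    # urlencode percent-encodes "$" as %24 and spaces as "+".
    return f"https://{domain}/resource/{dataset_id}.{content_type}?{urlencode(params)}"


print(soql_url("data.example.org", "abcd-1234", where="age > 21", limit=100))
```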

Example SoQL usage:
```python
# Filter and sort results
results = client.get("dataset_id",
                     where="age > 21 AND city = 'Boston'",
                     select="name, age, city",
                     order="age DESC",
                     limit=100)

# Aggregation with grouping
results = client.get("dataset_id",
                     select="city, COUNT(*) as total",
                     group="city",
                     order="total DESC")
```
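
To make the semantics of the two queries concrete, the sketch below computes the same results in plain Python over a few hypothetical rows. For real queries the server does this work. The sample is tiny, so `limit=100` is omitted. Casting `age` to `int` reflects the assumption that numeric values arrive as strings in SODA's JSON output.

```python
from collections import Counter

# Hypothetical rows; numeric values are assumed to arrive as strings.
rows = [
    {"name": "Ana", "age": "34", "city": "Boston", "zip": "02108"},
    {"name": "Ben", "age": "19", "city": "Boston", "zip": "02110"},
    {"name": "Cy", "age": "45", "city": "Chicago", "zip": "60601"},
    {"name": "Di", "age": "52", "city": "Boston", "zip": "02116"},
]

# where="age > 21 AND city = 'Boston'", select="name, age, city", order="age DESC"
filtered = sorted(
    (
        {key: row[key] for key in ("name", "age", "city")}
        for row in rows
        if int(row["age"]) > 21 and row["city"] == "Boston"
    ),
    key=lambda row: int(row["age"]),
    reverse=True,
)

# select="city, COUNT(*) as total", group="city", order="total DESC"
totals = [
    {"city": city, "total": count}
    for city, count in Counter(row["city"] for row in rows).most_common()
]

print(filtered)
print(totals)
```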
