tessl/pypi-sodapy

Python library for the Socrata Open Data API

Workspace: tessl
Visibility: Public
Describes: pypipkg:pypi/sodapy@2.2.x

To install, run

npx @tessl/cli install tessl/pypi-sodapy@2.2.0

# Sodapy

A Python client library for the Socrata Open Data API (SODA). Sodapy enables programmatic access to open data hosted on Socrata platforms, providing comprehensive functionality for reading datasets with SoQL query support, paginating through large datasets, managing dataset metadata, and performing dataset creation and data upsert operations.

## Package Information

- **Package Name**: sodapy
- **Language**: Python
- **Installation**: `pip install sodapy`
- **Repository**: https://github.com/xmunoz/sodapy

## Core Imports

```python
from sodapy import Socrata
import sodapy  # For version access
from typing import Generator
from io import IOBase
```

Version information:
```python
print(sodapy.__version__)  # "2.2.0"
```

## Basic Usage

```python
from sodapy import Socrata

# Initialize client with domain and optional app token
client = Socrata("opendata.socrata.com", "your_app_token")

# Basic data retrieval
results = client.get("dataset_id")

# Query with SoQL filtering
results = client.get("dataset_id", where="column > 100", limit=500)

# Get all data with automatic pagination
for record in client.get_all("dataset_id"):
    print(record)

# Always close the client when done
client.close()

# Or use as context manager
with Socrata("opendata.socrata.com", "your_app_token") as client:
    results = client.get("dataset_id", where="age > 21")
```

## Architecture

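Sodapy is built around a single `Socrata` class that manages an HTTP session and exposes methods for the SODA API operations described below. The client handles authentication (basic HTTP auth, an OAuth 2.0 access token, or an app token) and offers two styles of data access: `get()` returns one page of results per call, while `get_all()` is a generator that paginates through large datasets. Sodapy does not throttle requests itself; requests made without an app token are subject to stricter throttling by Socrata, which is why supplying one is recommended.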

## Capabilities

### Client Initialization

Create and configure a Socrata client for API access.

```python { .api }
class Socrata:
    def __init__(
        self,
        domain: str,
        app_token: str | None,
        username: str | None = None,
        password: str | None = None,
        access_token: str | None = None,
        session_adapter: dict | None = None,
        timeout: int | float = 10
    ):
        """
        Initialize Socrata client.

        Args:
            domain: Socrata domain (e.g., "opendata.socrata.com")
            app_token: Socrata application token (optional but recommended)
            username: Username for basic HTTP auth (for write operations)
            password: Password for basic HTTP auth (for write operations)
            access_token: OAuth 2.0 access token
            session_adapter: Custom session adapter configuration
            timeout: Request timeout in seconds
        """
```
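As a rough illustration of how the constructor arguments fit together, the sketch below collects them from environment variables into a keyword dictionary that could be passed as `Socrata(**kwargs)`. The helper name `client_kwargs_from_env` and the `SOCRATA_*` variable names are assumptions made for this example; they are not part of sodapy.

```python
import os
from typing import Mapping


def client_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for Socrata(...) from environment variables.

    Hypothetical helper: the SOCRATA_* names are illustrative only.
    """
    kwargs = {
        "domain": env["SOCRATA_DOMAIN"],  # required
        "app_token": env.get("SOCRATA_APP_TOKEN"),  # optional but recommended
        "timeout": float(env.get("SOCRATA_TIMEOUT", "10")),
    }
    # Basic HTTP auth is only needed for write operations.
    if "SOCRATA_USERNAME" in env and "SOCRATA_PASSWORD" in env:
        kwargs["username"] = env["SOCRATA_USERNAME"]
        kwargs["password"] = env["SOCRATA_PASSWORD"]
    return kwargs


example_env = {"SOCRATA_DOMAIN": "opendata.socrata.com", "SOCRATA_TIMEOUT": "30"}
kwargs = client_kwargs_from_env(example_env)
print(kwargs)
# With sodapy installed: client = Socrata(**kwargs)
```

Keeping credentials out of source code this way is a design choice of the example, not a requirement of the library.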

### Context Manager Support

Use the `Socrata` client as a context manager so the session is closed automatically.

```python { .api }
def __enter__(self) -> 'Socrata':
    """Enter context manager."""

def __exit__(self, exc_type, exc_value, traceback) -> None:
    """Exit context manager and close session."""
```
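The protocol above can be sketched with a stand-in class. `FakeClient` below is not sodapy's implementation; it only mirrors the documented behavior that `__exit__` closes the session, so the cleanup order is easy to see without a network connection.

```python
class FakeClient:
    """Stand-in for a Socrata client (illustrative, not sodapy source)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when the body raises.
        self.close()


with FakeClient() as client:
    inside = client.closed  # session is still open inside the block

after = client.closed  # closed once the block has exited
print(inside, after)
```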

### Dataset Discovery

List and search for datasets on a Socrata domain.

```python { .api }
def datasets(
    self,
    limit: int = 0,
    offset: int = 0,
    order: str | None = None,
    **kwargs
) -> list:
    """
    Returns list of datasets associated with a domain.

    Args:
        limit: Maximum number of results (0 = all)
        offset: Offset for pagination
        order: Field to sort on, optionally with ' ASC' or ' DESC'
        ids: List of dataset IDs to filter
        domains: List of additional domains to search
        categories: List of category filters
        tags: List of tag filters
        only: List of logical types ('dataset', 'chart', etc.)
        shared_to: User/team IDs or 'site' for public datasets
        column_names: Required column names in tabular datasets
        q: Full text search query
        min_should_match: Elasticsearch match requirement
        attribution: Organization filter
        license: License filter
        derived_from: Parent dataset ID filter
        provenance: 'official' or 'community'
        for_user: Owner user ID filter
        visibility: 'open' or 'internal'
        public: Boolean for public/private filter
        published: Boolean for published status filter
        approval_status: 'pending', 'rejected', 'approved', 'not_ready'
        explicitly_hidden: Boolean for hidden status filter
        derived: Boolean for derived dataset filter

    Returns:
        List of dataset metadata dictionaries
    """
```

### Data Reading

Retrieve data from Socrata datasets with query capabilities.

```python { .api }
def get(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    **kwargs
) -> list | dict | str:
    """
    Read data from dataset with SoQL query support.

    Args:
        dataset_identifier: Dataset ID or identifier
        content_type: Response format ('json', 'csv', 'xml')
        select: Columns to return (defaults to all)
        where: Row filter conditions
        order: Sort specification
        group: Column to group results on
        limit: Maximum results to return (default 1000)
        offset: Pagination offset (default 0)
        q: Full text search value
        query: Complete SoQL query string
        exclude_system_fields: Exclude system fields (default True)

    Returns:
        List/dict of records for JSON, or string for CSV/XML
    """

def get_all(self, *args, **kwargs) -> Generator:
    """
    Generator that retrieves all data with automatic pagination.
    Accepts same arguments as get().

    Yields:
        Individual records from the dataset
    """
```
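To make the relationship between `get()` and `get_all()` concrete, here is a minimal sketch of limit/offset pagination written as a generator. It is not sodapy's source code: `paginate` and `fetch_page` are hypothetical names, and the stopping rule (stop at the first page shorter than the page size) is an assumption about how such a loop is typically written.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list], page_size: int = 1000) -> Iterator[dict]:
    """Yield records one at a time, requesting a new page when needed.

    fetch_page(limit, offset) stands in for a call such as
    client.get(dataset_id, limit=limit, offset=offset).
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size


# In-memory stand-in for a dataset with 25 rows.
rows = [{"id": i} for i in range(25)]
calls = []


def fake_fetch(limit: int, offset: int) -> list:
    calls.append((limit, offset))
    return rows[offset:offset + limit]


records = list(paginate(fake_fetch, page_size=10))
print(len(records), calls)
```

Because the function is a generator, no page is requested until the caller starts iterating, which keeps memory use bounded by the page size.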

### Metadata Operations

Retrieve and update dataset metadata.

```python { .api }
def get_metadata(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Retrieve dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Dataset metadata dictionary
    """

def update_metadata(
    self,
    dataset_identifier: str,
    update_fields: dict,
    content_type: str = "json"
) -> dict:
    """
    Update dataset metadata.

    Args:
        dataset_identifier: Dataset ID
        update_fields: Dictionary of fields to update
        content_type: Response format

    Returns:
        Updated metadata
    """
```

### Data Writing

Insert, update, replace, or delete data in datasets.

```python { .api }
def upsert(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Insert, update, or delete data in an existing dataset.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, dictionary, or file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def replace(
    self,
    dataset_identifier: str,
    payload: list | dict | IOBase,
    content_type: str = "json"
) -> dict:
    """
    Replace all data in dataset with payload.

    Args:
        dataset_identifier: Dataset ID
        payload: List of records, dictionary, or file object
        content_type: Data format ('json', 'csv')

    Returns:
        Operation result with statistics
    """

def delete(
    self,
    dataset_identifier: str,
    row_id: str | None = None,
    content_type: str = "json"
) -> dict:
    """
    Delete a single row or the entire dataset.

    Args:
        dataset_identifier: Dataset ID
        row_id: Specific row ID to delete (None deletes the entire dataset)
        content_type: Response format

    Returns:
        Operation result
    """
```
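For large payloads it can be convenient to send rows in smaller batches. The sketch below assumes only the `upsert(dataset_identifier, payload)` method listed above; `upsert_in_batches`, `RecordingClient`, and the dataset ID are hypothetical, and the stand-in client's return value is invented for the demonstration rather than taken from a real Socrata response.

```python
from typing import Iterator


def chunked(rows: list, size: int) -> Iterator[list]:
    """Split rows into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upsert_in_batches(client, dataset_id: str, rows: list, batch_size: int = 500) -> list:
    """Call client.upsert once per batch and collect each result."""
    return [client.upsert(dataset_id, batch) for batch in chunked(rows, batch_size)]


class RecordingClient:
    """Stand-in that records payload sizes instead of calling the API."""

    def __init__(self):
        self.batch_sizes = []

    def upsert(self, dataset_identifier, payload, content_type="json"):
        self.batch_sizes.append(len(payload))
        return {"received": len(payload)}  # invented shape, for illustration only


client = RecordingClient()
rows = [{"name": f"row-{i}"} for i in range(1200)]
results = upsert_in_batches(client, "abcd-1234", rows, batch_size=500)
print(client.batch_sizes)
```

With a real client, the same function could be called with a `Socrata` instance that was created with write credentials.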

### Dataset Management

Create and manage datasets.

```python { .api }
def create(self, name: str, **kwargs) -> dict:
    """
    Create new dataset with field types.

    Args:
        name: Dataset name
        description: Dataset description
        columns: List of column definitions
        category: Dataset category (must exist in domain)
        tags: List of tag strings
        row_identifier: Primary key field name
        new_backend: Use new backend (default False)

    Returns:
        Created dataset metadata
    """

def publish(
    self,
    dataset_identifier: str,
    content_type: str = "json"
) -> dict:
    """
    Publish a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format

    Returns:
        Publication result
    """

def set_permission(
    self,
    dataset_identifier: str,
    permission: str = "private",
    content_type: str = "json"
) -> dict:
    """
    Set dataset permissions.

    Args:
        dataset_identifier: Dataset ID
        permission: 'private' or 'public'
        content_type: Response format

    Returns:
        Permission update result
    """
```
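The `columns` argument to `create()` is a list of column definitions. The sketch below builds such a list using `fieldName`, `name`, and `dataTypeName` keys; treat the exact keys and the type names (`text`, `number`) as assumptions to verify against your Socrata domain. `make_column` is a hypothetical helper, the dataset name and values are made up, and the `create()` call itself is shown only as a comment.

```python
def make_column(field_name: str, display_name: str, data_type: str) -> dict:
    """Return one column definition (key names are an assumption)."""
    return {
        "fieldName": field_name,
        "name": display_name,
        "dataTypeName": data_type,
    }


columns = [
    make_column("delegation", "Delegation", "text"),
    make_column("members", "Members", "number"),
]

# Keyword arguments matching the create() parameters documented above.
create_kwargs = {
    "description": "List of delegates",
    "columns": columns,
    "row_identifier": "delegation",
    "tags": ["politics", "geography"],
}
print(create_kwargs["columns"])
# With an authenticated client: client.create("Delegates", **create_kwargs)
```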

### File Attachments

Manage file attachments on datasets.

```python { .api }
def download_attachments(
    self,
    dataset_identifier: str,
    content_type: str = "json",
    download_dir: str = "~/sodapy_downloads"
) -> list:
    """
    Download all attachments for a dataset.

    Args:
        dataset_identifier: Dataset ID
        content_type: Response format
        download_dir: Local directory for downloads (default: ~/sodapy_downloads)

    Returns:
        List of downloaded file paths
    """

def create_non_data_file(
    self,
    params: dict,
    files: dict
) -> dict:
    """
    Create non-data file attachment.

    Args:
        params: File parameters and metadata
        files: Dictionary containing file tuple

    Returns:
        Created file metadata
    """

def replace_non_data_file(
    self,
    dataset_identifier: str,
    params: dict,
    files: dict
) -> dict:
    """
    Replace existing non-data file attachment.

    Args:
        dataset_identifier: Dataset ID
        params: File parameters and metadata
        files: Dictionary containing file tuple

    Returns:
        Updated file metadata
    """
```

### Connection Management

Manage HTTP session lifecycle.

```python { .api }
def close(self) -> None:
    """Close the HTTP session."""
```

### Class Attributes

```python { .api }
class Socrata:
    DEFAULT_LIMIT = 1000  # Default pagination limit
```

## Error Handling

Sodapy surfaces API errors as `requests` HTTP exceptions. When a Socrata error response includes additional detail, the library adds that detail to the exception message.

Common exceptions:
- `requests.exceptions.HTTPError`: HTTP 4xx/5xx responses with detailed error messages
- `TypeError`: Invalid parameter types (e.g. non-numeric timeout)
- `Exception`: Missing required parameters (e.g. domain not provided)
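
The last two bullet points describe argument checks made when the client is constructed. A minimal sketch of checks with that behavior follows; `check_client_args` is a hypothetical function written for illustration, and the error messages are placeholders rather than sodapy's actual text.

```python
def check_client_args(domain, timeout=10):
    """Validate constructor-style arguments (illustrative stand-in)."""
    if not domain:
        raise Exception("a domain is required")
    if not isinstance(timeout, (int, float)):
        raise TypeError("timeout must be numeric")
    return domain, timeout


def outcome(**kwargs) -> str:
    """Report which kind of exception, if any, the checks raise."""
    try:
        check_client_args(**kwargs)
    except TypeError:  # must come first: TypeError is a subclass of Exception
        return "TypeError"
    except Exception:
        return "Exception"
    return "ok"


print(outcome(domain="opendata.socrata.com"))
print(outcome(domain="opendata.socrata.com", timeout="10"))
print(outcome(domain=""))
```

Because `TypeError` derives from `Exception`, callers who want to tell the two cases apart need to catch the more specific class first, as `outcome` does.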

## SoQL Query Language

Sodapy supports the Socrata Query Language (SoQL) for filtering and aggregating data. Each SoQL parameter is passed to `get()` or `get_all()` as a keyword argument without the `$` prefix:

- **$select**: Choose columns to return
- **$where**: Filter rows with conditions
- **$order**: Sort results by columns
- **$group**: Group results by columns
- **$limit**: Limit number of results
- **$offset**: Skip results for pagination
- **$q**: Full-text search across all fields

Example SoQL usage:
```python
# Filter and sort results
results = client.get("dataset_id",
                     where="age > 21 AND city = 'Boston'",
                     select="name, age, city",
                     order="age DESC",
                     limit=100)

# Aggregation with grouping
results = client.get("dataset_id",
                     select="city, COUNT(*) as total",
                     group="city",
                     order="total DESC")
```
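
The correspondence between keyword arguments and `$`-prefixed SoQL parameters can be sketched as a small mapping function. This is an illustration of the idea, not sodapy's code: `to_soql_params` is a hypothetical helper that covers only the parameters listed above and drops any that are unset.

```python
SOQL_KEYS = ("select", "where", "order", "group", "limit", "offset", "q")


def to_soql_params(**kwargs) -> dict:
    """Map keyword arguments to $-prefixed query parameters."""
    unknown = set(kwargs) - set(SOQL_KEYS)
    if unknown:
        raise ValueError(f"unsupported SoQL arguments: {sorted(unknown)}")
    # Keep only the arguments the caller actually supplied.
    return {f"${key}": kwargs[key] for key in SOQL_KEYS if kwargs.get(key) is not None}


params = to_soql_params(
    select="city, COUNT(*) as total",
    group="city",
    order="total DESC",
)
print(params)
```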