# Dataset

Dataset is a Python toolkit that makes reading and writing data in databases as simple as working with JSON files. It provides automatic schema creation, upsert functionality, and simple query helpers, with support for SQLite, PostgreSQL, and MySQL. Built on SQLAlchemy, dataset abstracts away routine SQL operations while keeping powerful data management capabilities within reach.

## Package Information

- **Package Name**: dataset
- **Language**: Python
- **Installation**: `pip install dataset`

## Core Imports

```python
import dataset
```

Common patterns:

```python
from dataset import connect, Database, Table
```

For chunked operations:

```python
from dataset import chunked
# Then use: chunked.ChunkedInsert, chunked.ChunkedUpdate
```

## Basic Usage

```python
import dataset

# Connect to a database via a SQLAlchemy URL. Called without a URL,
# connect() falls back to the DATABASE_URL environment variable and
# then to an in-memory SQLite database.
db = dataset.connect('sqlite:///mydatabase.db')

# Get a table (created automatically if it doesn't exist)
table = db['people']

# Insert data (columns are created automatically)
table.insert({'name': 'John Doe', 'age': 30, 'city': 'New York'})
table.insert({'name': 'Jane Smith', 'age': 25, 'city': 'Boston'})

# Query data
for row in table.find(city='New York'):
    print(row['name'], row['age'])

# Find a single record
person = table.find_one(name='John Doe')
print(person)

# Update records matched by the 'name' column
table.update({'name': 'John Doe', 'age': 31}, ['name'])

# Use the database as a context manager for a transaction
with db:
    table.insert({'name': 'Bob Johnson', 'age': 28})
    table.upsert({'name': 'John Doe', 'age': 32}, ['name'])

# Close the connection
db.close()
```

## Architecture

Dataset follows a layered architecture:

- **Database**: Top-level connection manager handling multiple tables, transactions, and database-specific operations
- **Table**: Individual table manager providing CRUD operations, schema management, and query building
- **Types System**: Automatic type inference and SQLAlchemy type mapping for schema creation
- **Connection Pooling**: Thread-safe connection management with SQLAlchemy engine integration

This design enables automatic schema evolution, simplified data operations, and seamless database portability, while keeping the full power of SQLAlchemy available for complex queries.

## Capabilities

### Database Connections

Core database connection management, table access, transaction control, and raw SQL query execution. Provides the foundation for all dataset operations.

```python { .api }
def connect(url=None, schema=None, engine_kwargs=None, ensure_schema=True,
            row_type=OrderedDict, sqlite_wal_mode=True, on_connect_statements=None):
    """Opens a new connection to a database."""

class Database:
    def __init__(self, url, **kwargs): ...
    def get_table(self, table_name, **kwargs): ...
    def query(self, query, *args, **kwargs): ...
    def begin(self): ...
    def commit(self): ...
    def rollback(self): ...
```

[Database Operations](./database-operations.md)

### Table Operations

Complete table manipulation including CRUD operations, schema management, indexing, and advanced querying capabilities, with automatic column creation and type inference.

```python { .api }
class Table:
    def insert(self, row, ensure=None, types=None): ...
    def insert_many(self, rows, chunk_size=1000, ensure=None, types=None): ...
    def find(self, *clauses, **kwargs): ...
    def find_one(self, *clauses, **kwargs): ...
    def update(self, row, keys, ensure=None, types=None): ...
    def upsert(self, row, keys, ensure=None, types=None): ...
    def delete(self, *clauses, **filters): ...
```

[Table Operations](./table-operations.md)

### Chunked Operations

High-performance batch processing for large datasets, using context managers to automatically handle chunked inserts and updates with configurable batch sizes and callback support.

```python { .api }
class ChunkedInsert:
    def __init__(self, table, chunksize=1000, callback=None): ...
    def insert(self, item): ...

class ChunkedUpdate:
    def __init__(self, table, keys, chunksize=1000, callback=None): ...
    def update(self, item): ...
```

[Chunked Operations](./chunked-operations.md)

## Types

```python { .api }
class Types:
    """SQLAlchemy type holder and utilities."""
    integer = Integer     # SQLAlchemy Integer type
    string = Unicode      # SQLAlchemy Unicode type
    text = UnicodeText    # SQLAlchemy UnicodeText type
    float = Float         # SQLAlchemy Float type
    bigint = BigInteger   # SQLAlchemy BigInteger type
    boolean = Boolean     # SQLAlchemy Boolean type
    date = Date           # SQLAlchemy Date type
    datetime = DateTime   # SQLAlchemy DateTime type

    def __init__(self, is_postgres=None):
        """Initialize the type system with database-specific settings."""
        self.json = JSONB if is_postgres else JSON

    def guess(self, sample):
        """
        Given a sample value, guess the appropriate column type.

        Parameters:
        - sample: any, sample value to infer type from

        Returns:
        SQLAlchemy type appropriate for the sample
        """

class DatasetException(Exception):
    """Base exception class for dataset errors."""

# Default row container type (can be customized via the row_type parameter)
from collections import OrderedDict
row_type = OrderedDict
```