or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

tessl/pypi-dataset

Toolkit for Python-based database access that makes reading and writing data in databases as simple as working with JSON files

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/dataset@1.6.x

To install, run

npx @tessl/cli install tessl/pypi-dataset@1.6.0

0

# Dataset

1

2

Dataset is a Python toolkit that makes reading and writing data in databases as simple as working with JSON files. It provides automatic schema creation, upsert functionality, and simple query helpers with support for SQLite, PostgreSQL, and MySQL databases. Built on SQLAlchemy, dataset abstracts away complex SQL operations while providing powerful data management capabilities.

3

4

## Package Information

5

6

- **Package Name**: dataset

7

- **Language**: Python

8

- **Installation**: `pip install dataset`

9

10

## Core Imports

11

12

```python

13

import dataset

14

```

15

16

Common patterns:

17

18

```python

19

from dataset import connect, Database, Table

20

```

21

22

For chunked operations:

23

24

```python

25

from dataset import chunked

26

# Then use: chunked.ChunkedInsert, chunked.ChunkedUpdate

27

```

28

29

## Basic Usage

30

31

```python

32

import dataset

33

34

# Connect to database (defaults to in-memory SQLite)

35

db = dataset.connect('sqlite:///mydatabase.db')

36

37

# Get a table (creates if doesn't exist)

38

table = db['people']

39

40

# Insert data (automatically creates columns)

41

table.insert({'name': 'John Doe', 'age': 30, 'city': 'New York'})

42

table.insert({'name': 'Jane Smith', 'age': 25, 'city': 'Boston'})

43

44

# Query data

45

for row in table.find(city='New York'):

46

print(row['name'], row['age'])

47

48

# Find single record

49

person = table.find_one(name='John Doe')

50

print(person)

51

52

# Update records

53

table.update({'name': 'John Doe', 'age': 31}, ['name'])

54

55

# Use transactions

56

with db:

57

table.insert({'name': 'Bob Johnson', 'age': 28})

58

table.upsert({'name': 'John Doe', 'age': 32}, ['name'])

59

60

# Close connection

61

db.close()

62

```

63

64

## Architecture

65

66

Dataset follows a layered architecture:

67

68

- **Database**: Top-level connection manager handling multiple tables, transactions, and database-specific operations

69

- **Table**: Individual table manager providing CRUD operations, schema management, and query building

70

- **Types System**: Automatic type inference and SQLAlchemy type mapping for schema creation

71

- **Connection Pooling**: Thread-safe connection management with SQLAlchemy engine integration

72

73

This design enables automatic schema evolution, simplified data operations, and seamless database portability while maintaining full SQLAlchemy power for complex queries.

74

75

## Capabilities

76

77

### Database Connections

78

79

Core database connection management, table access, transaction control, and raw SQL query execution. Provides the foundation for all dataset operations.

80

81

```python { .api }

82

def connect(url=None, schema=None, engine_kwargs=None, ensure_schema=True,

83

row_type=OrderedDict, sqlite_wal_mode=True, on_connect_statements=None):

84

"""Opens a new connection to a database."""

85

86

class Database:

87

def __init__(self, url, **kwargs): ...

88

def get_table(self, table_name, **kwargs): ...

89

def query(self, query, *args, **kwargs): ...

90

def begin(self): ...

91

def commit(self): ...

92

def rollback(self): ...

93

```

94

95

[Database Operations](./database-operations.md)

96

97

### Table Operations

98

99

Complete table manipulation including CRUD operations, schema management, indexing, and advanced querying capabilities with automatic column creation and type inference.

100

101

```python { .api }

102

class Table:

103

def insert(self, row, ensure=None, types=None): ...

104

def insert_many(self, rows, chunk_size=1000, ensure=None, types=None): ...

105

def find(self, **kwargs): ...

106

def find_one(self, **kwargs): ...

107

def update(self, row, keys, ensure=None, types=None): ...

108

def upsert(self, row, keys, ensure=None, types=None): ...

109

def delete(self, **filters): ...

110

```

111

112

[Table Operations](./table-operations.md)

113

114

### Chunked Operations

115

116

High-performance batch processing for large datasets using context managers to automatically handle chunked inserts and updates with configurable batch sizes and callback support.

117

118

```python { .api }

119

class ChunkedInsert:

120

def __init__(self, table, chunksize=1000, callback=None): ...

121

def insert(self, item): ...

122

123

class ChunkedUpdate:

124

def __init__(self, table, keys, chunksize=1000, callback=None): ...

125

def update(self, item): ...

126

```

127

128

[Chunked Operations](./chunked-operations.md)

129

130

## Types

131

132

```python { .api }

133

class Types:

134

"""SQLAlchemy type holder and utilities."""

135

integer = Integer # SQLAlchemy Integer type

136

string = Unicode # SQLAlchemy Unicode type

137

text = UnicodeText # SQLAlchemy UnicodeText type

138

float = Float # SQLAlchemy Float type

139

bigint = BigInteger # SQLAlchemy BigInteger type

140

boolean = Boolean # SQLAlchemy Boolean type

141

date = Date # SQLAlchemy Date type

142

datetime = DateTime # SQLAlchemy DateTime type

143

144

def __init__(self, is_postgres=None):

145

"""Initialize type system with database-specific settings."""

146

self.json = JSONB if is_postgres else JSON

147

148

def guess(self, sample):

149

"""

150

Given a sample value, guess the appropriate column type.

151

152

Parameters:

153

- sample: any, sample value to infer type from

154

155

Returns:

156

SQLAlchemy type appropriate for the sample

157

"""

158

159

class DatasetException(Exception):

160

"""Base exception class for dataset errors."""

161

162

# Default row container type (can be customized via row_type parameter)

163

from collections import OrderedDict

164

row_type = OrderedDict

165

```