# PETL - Python ETL Library

A comprehensive Python library for extracting, transforming, and loading tables of data. PETL enables developers to efficiently process, manipulate, and transform data tables through a fluent API that supports various data sources including CSV, JSON, databases, Excel files, and remote sources. The library offers extensive transformation capabilities including filtering, sorting, joining, aggregating, pivoting, and data type conversions, with built-in support for handling large datasets through memory-efficient iterators and streaming operations.

## Package Information

- **Package Name**: petl
- **Language**: Python
- **Installation**: `pip install petl`
## Core Imports

```python
import petl
```

Common usage pattern:

```python
import petl as etl
```

All functions can be imported directly:

```python
from petl import fromcsv, tocsv, cut, select, join
```
## Basic Usage

```python
import petl as etl

# Load data from CSV (values are read as strings)
table = etl.fromcsv('data.csv')

# Transform data with method chaining
result = (
    table
    .cut('name', 'age', 'city')
    .convert('age', int)  # parse age before the numeric comparison
    .select('age', lambda age: age > 18)
    .sort('name')
)

# Save to new file
result.tocsv('adults.csv')

# Alternative functional style (tocsv writes the file and returns None)
etl.tocsv(
    etl.sort(
        etl.select(
            etl.convert(
                etl.cut(
                    etl.fromcsv('data.csv'),
                    'name', 'age', 'city'
                ),
                'age', int
            ),
            'age', lambda age: age > 18
        ),
        'name'
    ),
    'adults.csv'
)
```
## Architecture

PETL is built around the **Table** abstraction - an iterable container that represents tabular data. The core design principles include:

- **Lazy Evaluation**: Transformations are applied only when data is consumed, enabling efficient processing of large datasets
- **Functional Programming**: Operations return new table objects, allowing for method chaining and immutable transformations
- **Memory Efficiency**: Built-in support for external sorting, buffering, and streaming for memory-constrained environments
- **Source Agnostic**: Unified interface for reading/writing data regardless of source format
- **Iterator-based**: All operations use Python iterators and generators for optimal memory usage

The library is organized into six main modules: **util** (table operations), **transform** (data transformations), **io** (input/output), **comparison** (sorting utilities), **errors** (exceptions), and **config** (settings).
## Capabilities
### Table Creation and Access

Core functionality for creating, accessing, and manipulating table structures. Includes table creation from various sources, header manipulation, data access, and conversion to different formats.

```python { .api }
def empty() -> Table: ...
def header(table): ...
def fieldnames(table): ...
def data(table, *sliceargs): ...
def records(table, *sliceargs, **kwargs): ...
def dicts(table, *sliceargs, **kwargs): ...
def namedtuples(table, *sliceargs, **kwargs): ...
def values(table, *field, **kwargs): ...
def columns(table, missing=None): ...
def facetcolumns(table, key, missing=None): ...
def expr(expression_text, trusted=True): ...
def rowgroupby(table, key, value=None): ...
```

[Table Operations](./table-operations.md)
### Data Input/Output

Comprehensive support for reading and writing data from various sources including CSV, JSON, XML, Excel, databases, and many specialized formats. Supports both file-based and streaming I/O operations.

```python { .api }
def fromcsv(source=None, **kwargs) -> Table: ...
def tocsv(table, source=None, **kwargs): ...
def fromjson(source, **kwargs) -> Table: ...
def tojson(table, source=None, **kwargs): ...
def fromdb(dbo, query, **kwargs) -> Table: ...
def todb(table, dbo, tablename, **kwargs): ...
```

[Data I/O](./data-io.md)
### Data Transformations

Core transformation operations for reshaping, filtering, and manipulating table data. Includes field operations, row selection, data conversion, and structural transformations.

```python { .api }
def cut(table, *args, **kwargs) -> Table: ...
def select(table, *args, **kwargs) -> Table: ...
def convert(table, *args, **kwargs) -> Table: ...
def rename(table, *args, **kwargs) -> Table: ...
def addfield(table, field, value=None, **kwargs) -> Table: ...
```

[Basic Transformations](./basic-transformations.md)
### Sorting and Joining

Advanced operations for sorting data and combining multiple tables through various join types. Includes both memory-based and disk-based sorting for large datasets.

```python { .api }
def sort(table, key=None, reverse=False, **kwargs) -> Table: ...
def join(table1, table2, key=None, **kwargs) -> Table: ...
def leftjoin(left, right, key=None, **kwargs) -> Table: ...
def outerjoin(left, right, key=None, **kwargs) -> Table: ...
def hashjoin(left, right, key=None, **kwargs) -> Table: ...
```

[Sorting and Joins](./sorting-joins.md)
### Aggregation and Grouping

Functions for grouping data and performing aggregation operations. Supports custom aggregation functions and reduction operations on grouped data.

```python { .api }
def aggregate(table, key, aggregation=None, value=None, **kwargs) -> Table: ...
def rowreduce(table, key, reducer, **kwargs) -> Table: ...
def merge(*tables, key=None, **kwargs) -> Table: ...
def fold(table, key, f, **kwargs) -> Table: ...
```

[Aggregation](./aggregation.md)
### Data Reshaping

Advanced reshaping operations including pivoting, melting, transposing, and data normalization. Essential for converting between wide and long data formats.

```python { .api }
def melt(table, key=None, **kwargs) -> Table: ...
def recast(table, key=None, **kwargs) -> Table: ...
def pivot(table, f1, f2, f3, aggfun, **kwargs) -> Table: ...
def transpose(table) -> Table: ...
def flatten(table) -> Table: ...
```

[Data Reshaping](./data-reshaping.md)
### Validation and Analysis

Tools for data validation, quality assessment, and statistical analysis. Includes data profiling, constraint validation, and summary statistics.

```python { .api }
def validate(table, constraints=None, **kwargs) -> Table: ...
def look(table, limit=0, **kwargs): ...
def see(table, limit=0, **kwargs): ...
def stats(table, field): ...
def valuecounts(table, *field, **kwargs) -> Table: ...
def limits(table, field): ...
def nrows(table): ...
def typecounts(table, field) -> Table: ...
```

[Validation and Analysis](./validation-analysis.md)
## Types

```python { .api }
class Table(IterContainer):
    """Main table abstraction supporting iteration and transformations."""
    def __iter__(self): ...
    def __getitem__(self, item): ...  # Field access via string or row slicing
    def cut(self, *args, **kwargs) -> 'Table': ...
    def select(self, *args, **kwargs) -> 'Table': ...
    def sort(self, key=None, **kwargs) -> 'Table': ...
    # ... all transformation functions are also available as methods

class Record(tuple):
    """Tuple subclass for table rows with field access by name or index."""
    def __init__(self, row, flds, missing=None): ...
    def __getitem__(self, f): ...  # Access by field name or index
    def __getattr__(self, f): ...  # Access by field name as attribute
    def get(self, key, default=None): ...  # Safe field access

class RecordsView(IterContainer):
    """Container for table records with record-based access."""
    def __iter__(self): ...
    def __getitem__(self, key): ...

class ValuesView(IterContainer):
    """Container for values from a specific field."""
    def __iter__(self): ...
    def __getitem__(self, key): ...

class IterContainer:
    """Base container class for iterable data structures with utility methods."""
    def __iter__(self): ...
    def __getitem__(self, key): ...
    def __len__(self): ...
    def __contains__(self, item): ...
    def list(self): ...
    def tuple(self): ...
    def set(self): ...
    def counter(self): ...

# Source classes for data I/O
class FileSource:
    """File-based data source."""
    def __init__(self, filename, mode='r', **kwargs): ...

class URLSource:
    """URL-based data source for web resources."""
    def __init__(self, url, **kwargs): ...

class GzipSource:
    """Gzip-compressed file source."""
    def __init__(self, filename, mode='r', **kwargs): ...

class ZipSource:
    """ZIP archive file source."""
    def __init__(self, filename, member=None, **kwargs): ...

# Exception classes
class DuplicateKeyError(Exception):
    """Raised when duplicate keys are found where unique keys are expected."""

class FieldSelectionError(Exception):
    """Raised when field selection is invalid or ambiguous."""

class ArgumentError(Exception):
    """Raised when function arguments are invalid."""
```