# Data Reshaping

Advanced reshaping operations including pivoting, melting, transposing, and data normalization. These are essential for converting between wide and long data formats and for restructuring data for analysis and reporting.

## Capabilities

### Wide/Long Format Conversion

Transform between wide and long data formats to suit different analysis needs.

```python { .api }
def melt(table, key=None, variables=None, variablefield='variable', valuefield='value') -> Table:
    """
    Transform a table from wide to long format.

    Parameters:
    - table: Input table
    - key: Fields to keep as identifier variables
    - variables: Fields to melt (default: all non-key fields)
    - variablefield: Name for the variable column
    - valuefield: Name for the value column

    Returns:
    Long-format table
    """

def recast(table, key=None, variablefield='variable', valuefield='value',
           samplesize=1000, missing=None, reducers=None, fill=None) -> Table:
    """
    Transform a table from long to wide format.

    Parameters:
    - table: Input table in long format
    - key: Fields that identify observations
    - variablefield: Field containing variable names
    - valuefield: Field containing values
    - samplesize: Number of rows to sample for structure detection
    - missing: Value for missing data
    - reducers: Functions to handle multiple values per cell
    - fill: Value to fill missing combinations

    Returns:
    Wide-format table
    """
```
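The mechanics of melting are simple: each input row fans out into one output row per melted field. The following standalone sketch illustrates this in plain Python (`melt_rows` is a hypothetical helper for illustration, not petl's implementation):

```python
def melt_rows(header, rows, key, variablefield='variable', valuefield='value'):
    """Sketch of wide-to-long melting: one output row per (row, melted field)."""
    key_idx = [header.index(k) for k in key]
    melt_idx = [i for i, f in enumerate(header) if f not in key]
    out = [key + [variablefield, valuefield]]
    for row in rows:
        for i in melt_idx:
            # Repeat the key values, then name the melted field and its value
            out.append([row[j] for j in key_idx] + [header[i], row[i]])
    return out

wide_rows = [[1, 'Alice', 100, 150], [2, 'Bob', 120, 180]]
long_rows = melt_rows(['id', 'name', 'jan', 'feb'], wide_rows,
                      key=['id', 'name'],
                      variablefield='month', valuefield='sales')
# long_rows[0] == ['id', 'name', 'month', 'sales']
# long_rows[1] == [1, 'Alice', 'jan', 100]
```

Two input data rows with two melted fields each yield four output data rows.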

### Pivoting and Cross-tabulation

Create pivot tables and cross-tabulations for data analysis.

```python { .api }
def pivot(table, f1, f2, f3, aggfun, missing=None, presorted=False,
          buffersize=None, tempdir=None, cache=True) -> Table:
    """
    Construct a pivot table.

    Parameters:
    - table: Input table
    - f1: Field for row labels
    - f2: Field for column labels
    - f3: Field containing values to aggregate
    - aggfun: Aggregation function applied to the list of values in each cell (e.g. sum, len)
    - missing: Value for missing data
    - presorted: If True, assume the table is already sorted
    - buffersize: Buffer size for sorting
    - tempdir: Directory for temporary files
    - cache: Whether to cache results

    Returns:
    Pivot table with f1 values as rows, f2 values as columns, and aggregated f3 values as cells
    """
```
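Conceptually, pivoting groups values by (row label, column label) pairs and aggregates each group. This plain-Python sketch shows the idea (`pivot_rows` is a hypothetical helper, not petl's implementation):

```python
from collections import defaultdict

def pivot_rows(rows, aggfun, missing=None):
    """Sketch of pivoting: rows are (row_label, col_label, value) triples."""
    cells = defaultdict(list)
    for r, c, v in rows:
        cells[(r, c)].append(v)
    row_labels = sorted({r for r, _, _ in rows})
    col_labels = sorted({c for _, c, _ in rows})
    table = [['key'] + col_labels]
    for r in row_labels:
        # Aggregate each cell's list of values; fill absent combinations
        table.append([r] + [aggfun(cells[(r, c)]) if (r, c) in cells else missing
                            for c in col_labels])
    return table

sales = [('north', 'Q1', 100), ('north', 'Q2', 150),
         ('south', 'Q1', 80), ('north', 'Q1', 50)]
result = pivot_rows(sales, sum, missing=0)
# result == [['key', 'Q1', 'Q2'], ['north', 150, 150], ['south', 80, 0]]
```

Note how the two north/Q1 values are summed into one cell, and the absent south/Q2 combination takes the missing value.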

### Table Structure Transformation

Fundamental operations for changing table structure and layout.

```python { .api }
def transpose(table) -> Table:
    """
    Transpose the table (swap rows and columns).

    Parameters:
    - table: Input table

    Returns:
    Transposed table where original columns become rows
    """

def flatten(table) -> Iterable:
    """
    Convert the table to a flat sequence of data values in row-major order.

    Parameters:
    - table: Input table

    Returns:
    Iterable of the table's data values, row by row
    """

def unflatten(*args, **kwargs) -> Table:
    """
    Convert a flat sequence of values (e.g. produced by flatten) back into a table.

    Parameters:
    - args: Either (input, size) or (table, field, size), where size is the number of columns
    - kwargs: Additional options

    Returns:
    Table reconstructed from the flat sequence
    """
```
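In plain Python, transposition is essentially `zip(*rows)` over the whole table, header row included. A minimal sketch (`transpose_rows` is a hypothetical helper for illustration, not petl code):

```python
def transpose_rows(table):
    """Sketch of transpose: columns become rows; the header row is included."""
    return [list(col) for col in zip(*table)]

table = [['id', 'colour'],
         [1, 'blue'],
         [2, 'red']]
flipped = transpose_rows(table)
# flipped == [['id', 1, 2], ['colour', 'blue', 'red']]
```

Transposing twice restores the original table, since the operation is its own inverse.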

### Field Unpacking

Extract nested data structures into separate fields.

```python { .api }
def unpack(table, field, newfields=None, include_original=False, missing=None) -> Table:
    """
    Unpack iterable field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing iterable values (lists, tuples)
    - newfields: Names for the unpacked fields
    - include_original: Whether to keep the original field
    - missing: Value for missing data

    Returns:
    Table with unpacked fields
    """

def unpackdict(table, field, keys=None, includeoriginal=False,
               samplesize=1000, missing=None) -> Table:
    """
    Unpack dictionary field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing dictionary values
    - keys: Specific keys to unpack (default: all keys found)
    - includeoriginal: Whether to keep the original field
    - samplesize: Number of rows to sample for key detection
    - missing: Value for missing keys

    Returns:
    Table with dictionary keys as separate fields
    """
```
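The dictionary-unpacking behaviour can be sketched in plain Python: collect the union of keys (petl samples rows for this), then emit one column per key, filling gaps with the missing value. `unpack_dict_rows` below is a hypothetical helper for illustration, not petl's implementation:

```python
def unpack_dict_rows(header, rows, field, missing=None):
    """Sketch of unpackdict: dict values in `field` become separate columns."""
    idx = header.index(field)
    # Union of keys across all rows, sorted for a stable column order
    keys = sorted({k for row in rows for k in row[idx]})
    out = [[f for f in header if f != field] + keys]
    for row in rows:
        d = row[idx]
        out.append([v for i, v in enumerate(row) if i != idx] +
                   [d.get(k, missing) for k in keys])
    return out

rows = [['Alice', {'age': 30, 'city': 'NYC'}],
        ['Bob', {'age': 25}]]
result = unpack_dict_rows(['name', 'details'], rows, 'details')
# result == [['name', 'age', 'city'], ['Alice', 30, 'NYC'], ['Bob', 25, None]]
```

Bob's row lacks a 'city' key, so that cell receives the missing value.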

### Regular Expression Reshaping

Use regular expressions to extract and reshape data from text fields.

```python { .api }
def capture(table, field, pattern, newfields=None, include_original=False,
            flags=0, fill=None) -> Table:
    """
    Extract data using regular expression capture groups.

    Parameters:
    - table: Input table
    - field: Field to apply the regex pattern to
    - pattern: Regular expression with capture groups
    - newfields: Names for the captured groups
    - include_original: Whether to keep the original field
    - flags: Regular expression flags
    - fill: Values to use for non-matching rows

    Returns:
    Table with captured groups as new fields
    """

def split(table, field, pattern, newfields=None, include_original=False,
          maxsplit=0, flags=0) -> Table:
    """
    Split field values using a regular expression.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern to split on
    - newfields: Names for the split parts
    - include_original: Whether to keep the original field
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with split parts as separate fields
    """

def splitdown(table, field, pattern, maxsplit=0, flags=0) -> Table:
    """
    Split field values and create multiple rows, one per part.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with an additional row for each split value
    """
```
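To illustrate splitdown's row-expanding behaviour, here is a plain-Python sketch built on `re.split` (`split_down` is a hypothetical helper, not petl's implementation): each part of the split value becomes its own row, with the other fields repeated.

```python
import re

def split_down(rows, idx, pattern, maxsplit=0, flags=0):
    """Sketch of splitdown: one output row per split part of the field at idx."""
    out = []
    for row in rows:
        for part in re.split(pattern, row[idx], maxsplit=maxsplit, flags=flags):
            new = list(row)
            new[idx] = part
            out.append(new)
    return out

rows = [['Alice', 'red;blue'], ['Bob', 'green']]
result = split_down(rows, 1, r';')
# result == [['Alice', 'red'], ['Alice', 'blue'], ['Bob', 'green']]
```

Values that do not match the pattern pass through as a single row, as with Bob above.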

## Usage Examples

### Wide to Long Format (Melt)

```python
import petl as etl

# Wide format data
wide_data = [
    ['id', 'name', 'jan_sales', 'feb_sales', 'mar_sales'],
    [1, 'Alice', 100, 150, 200],
    [2, 'Bob', 120, 180, 160]
]
wide_table = etl.wrap(wide_data)

# Convert to long format
long_table = etl.melt(wide_table,
                      key=['id', 'name'],
                      variablefield='month',
                      valuefield='sales')
# Result: id, name, month, sales
# 1, Alice, jan_sales, 100
# 1, Alice, feb_sales, 150
# etc.
```

### Long to Wide Format (Recast)

```python
import petl as etl

long_data = [
    ['id', 'name', 'month', 'sales'],
    [1, 'Alice', 'jan', 100],
    [1, 'Alice', 'feb', 150],
    [2, 'Bob', 'jan', 120],
    [2, 'Bob', 'feb', 180]
]
long_table = etl.wrap(long_data)

# Convert to wide format
wide_table = etl.recast(long_table,
                        key=['id', 'name'],
                        variablefield='month',
                        valuefield='sales')
# Result: id, name, jan, feb
# 1, Alice, 100, 150
# 2, Bob, 120, 180
```
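Recasting is the inverse of melting: distinct values of the variable field become column names, and cells are looked up by (key, variable). A plain-Python sketch of the idea (`recast_rows` is a hypothetical helper, not petl's implementation):

```python
def recast_rows(rows, missing=None):
    """Sketch of long-to-wide recasting: rows are (key, variable, value) triples."""
    wide = {}
    variables = []
    for key, var, val in rows:
        wide.setdefault(key, {})[var] = val
        if var not in variables:
            variables.append(var)  # preserve first-seen column order
    return [['key'] + variables] + \
           [[k] + [wide[k].get(v, missing) for v in variables] for k in wide]

long_rows = [(1, 'jan', 100), (1, 'feb', 150), (2, 'jan', 120)]
result = recast_rows(long_rows, missing=0)
# result == [['key', 'jan', 'feb'], [1, 100, 150], [2, 120, 0]]
```

Key 2 has no 'feb' observation, so that cell takes the missing value.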

### Pivot Tables

```python
import petl as etl

sales = etl.fromcsv('sales.csv')  # fields: region, product, quarter, amount
sales = etl.convertnumbers(sales)  # fromcsv yields strings; amounts must be numeric

# Create pivot table
pivot_table = etl.pivot(sales, 'region', 'quarter', 'amount', sum)
# Regions as rows, quarters as columns, sum of amounts as values

# The aggregation function receives the list of values for each cell,
# so a custom function can return several statistics at once
pivot_with_count = etl.pivot(sales, 'product', 'region', 'amount',
                             lambda values: (sum(values), len(values)))
```

### Data Unpacking

```python
import petl as etl

# Unpack list/tuple fields
data_with_coords = [
    ['name', 'coordinates'],
    ['Location A', (40.7128, -74.0060)],
    ['Location B', (34.0522, -118.2437)]
]
table = etl.wrap(data_with_coords)

unpacked = etl.unpack(table, 'coordinates', ['latitude', 'longitude'])
# Result: name, latitude, longitude

# Unpack dictionary fields
data_with_dict = [
    ['name', 'details'],
    ['Alice', {'age': 30, 'city': 'NYC', 'salary': 75000}],
    ['Bob', {'age': 25, 'city': 'LA', 'salary': 65000}]
]
dict_table = etl.wrap(data_with_dict)

unpacked_dict = etl.unpackdict(dict_table, 'details')
# Result: name, age, city, salary
```

### Regular Expression Extraction

```python
import petl as etl

# Extract structured data from text
log_data = [
    ['timestamp', 'log_entry'],
    ['2023-01-01', '2023-01-01 10:30:45 ERROR user123 failed login'],
    ['2023-01-01', '2023-01-01 10:31:12 INFO user456 successful login']
]
log_table = etl.wrap(log_data)

# Extract components using regex
extracted = etl.capture(log_table, 'log_entry',
                        r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\w+) (.+)',
                        ['datetime', 'level', 'user', 'message'])

# Split email addresses
email_data = [
    ['contact'],
    ['alice@company.com'],
    ['bob@organization.org']
]
email_table = etl.wrap(email_data)

split_emails = etl.split(email_table, 'contact', '@',
                         ['username', 'domain'])
```
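capture's per-row behaviour maps directly onto `re.search` with groups: matching rows yield their capture groups, non-matching rows yield the fill values. A stdlib sketch (`capture_value` is a hypothetical helper, not petl code):

```python
import re

def capture_value(value, pattern, fill=None, flags=0):
    """Sketch of capture's per-row logic: return the capture groups, or fills."""
    m = re.search(pattern, value, flags)
    if m is None:
        # One fill value per capture group in the pattern
        return (fill,) * re.compile(pattern, flags).groups
    return m.groups()

groups = capture_value('2023-01-01 ERROR boom',
                       r'(\d{4}-\d{2}-\d{2}) (\w+) (.+)')
# groups == ('2023-01-01', 'ERROR', 'boom')
```

A non-matching row such as `capture_value('no digits', r'(\d+)')` returns `(None,)` rather than raising.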

### Complex Reshaping Workflows

```python
import petl as etl

# Multi-step reshaping process
raw_data = etl.fromcsv('complex_data.csv')

# 1. Unpack nested JSON-like field
step1 = etl.unpackdict(raw_data, 'metadata')

# 2. Melt measurement columns
step2 = etl.melt(step1,
                 key=['id', 'timestamp', 'location'],
                 variables=['temp', 'humidity', 'pressure'],
                 variablefield='measurement_type',
                 valuefield='value')

# 3. Extract date components (capture replaces the original timestamp field)
step3 = etl.capture(step2, 'timestamp',
                    r'(\d{4})-(\d{2})-(\d{2})',
                    ['year', 'month', 'day'])

# 4. Pivot for final analysis format (pivot takes single row/column fields)
final = etl.pivot(step3, 'location', 'measurement_type', 'value',
                  lambda vals: sum(vals) / len(vals))  # average per cell
```