# Data Reshaping

Advanced reshaping operations including pivoting, melting, transposing, and data normalization. Essential for converting between wide and long data formats and restructuring data for analysis and reporting.

## Capabilities

### Wide/Long Format Conversion

Transform between wide and long data formats for different analysis needs.
```python { .api }
def melt(table, key=None, variables=None, variablefield='variable', valuefield='value') -> Table:
    """
    Transform from wide to long format.

    Parameters:
    - table: Input table
    - key: Fields to keep as identifier variables
    - variables: Fields to melt (default: all non-key fields)
    - variablefield: Name for variable column
    - valuefield: Name for value column

    Returns:
    Long-format table
    """

def recast(table, key=None, variablefield='variable', valuefield='value',
           samplesize=1000, missing=None, reducers=None, fill=None) -> Table:
    """
    Transform from long to wide format.

    Parameters:
    - table: Input table in long format
    - key: Fields that identify observations
    - variablefield: Field containing variable names
    - valuefield: Field containing values
    - samplesize: Number of rows to sample for structure detection
    - missing: Value for missing data
    - reducers: Functions to handle multiple values per cell
    - fill: Value to fill missing combinations

    Returns:
    Wide-format table
    """
```

### Pivoting and Cross-tabulation

Create pivot tables and cross-tabulations for data analysis.
```python { .api }
def pivot(table, f1, f2, f3, aggfun, missing=None, presorted=False,
          buffersize=None, tempdir=None, cache=True) -> Table:
    """
    Construct a pivot table.

    Parameters:
    - table: Input table
    - f1: Field for row labels
    - f2: Field for column labels
    - f3: Field containing values to aggregate
    - aggfun: Aggregation function (sum, count, mean, etc.)
    - missing: Value for missing data
    - presorted: If True, table is pre-sorted
    - buffersize: Buffer size for sorting
    - tempdir: Directory for temporary files
    - cache: Whether to cache results

    Returns:
    Pivot table with f1 as rows, f2 as columns, aggregated f3 as values
    """
```

### Table Structure Transformation

Fundamental operations for changing table structure and layout.
```python { .api }
def transpose(table) -> Table:
    """
    Transpose the table (swap rows and columns).

    Parameters:
    - table: Input table

    Returns:
    Transposed table where original columns become rows
    """

def flatten(table):
    """
    Flatten a table into a sequence of data values in row-major order.

    Parameters:
    - table: Input table

    Returns:
    Container of the table's data values (header excluded), one value
    per field per row
    """

def unflatten(*args, **kwargs) -> Table:
    """
    Restore a table from a previously flattened sequence of values.

    Parameters:
    - args: Either (input, period) or (table, field, period), where
      period is the number of values per output row
    - kwargs: Additional options (e.g., missing)

    Returns:
    Table reconstructed from the flat sequence
    """
```

### Field Unpacking

Extract nested data structures into separate fields.
```python { .api }
def unpack(table, field, newfields=None, include_original=False, missing=None) -> Table:
    """
    Unpack iterable field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing iterable values (lists, tuples)
    - newfields: Names for unpacked fields
    - include_original: Whether to keep original field
    - missing: Value for missing data

    Returns:
    Table with unpacked fields
    """

def unpackdict(table, field, keys=None, includeoriginal=False,
               samplesize=1000, missing=None) -> Table:
    """
    Unpack dictionary field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing dictionary values
    - keys: Specific keys to unpack (default: all found keys)
    - includeoriginal: Whether to keep original field
    - samplesize: Number of rows to sample for key detection
    - missing: Value for missing keys

    Returns:
    Table with dictionary keys as separate fields
    """
```

### Regular Expression Reshaping

Use regular expressions to extract and reshape data from text fields.
```python { .api }
def capture(table, field, pattern, newfields=None, include_original=False,
            flags=0, fill=None) -> Table:
    """
    Extract data using regular expression capture groups.

    Parameters:
    - table: Input table
    - field: Field to apply the regex pattern to
    - pattern: Regular expression with capture groups
    - newfields: Names for captured groups
    - include_original: Whether to keep original field
    - flags: Regular expression flags
    - fill: Value for non-matching rows

    Returns:
    Table with captured groups as new fields
    """

def split(table, field, pattern, newfields=None, include_original=False,
          maxsplit=0, flags=0) -> Table:
    """
    Split field values using a regular expression.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern for splitting
    - newfields: Names for split parts
    - include_original: Whether to keep original field
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with split parts as separate fields
    """

def splitdown(table, field, pattern, maxsplit=0, flags=0) -> Table:
    """
    Split field values and create multiple rows.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with additional rows for split values
    """
```

## Usage Examples

### Wide to Long Format (Melt)
```python
import petl as etl

# Wide format data
wide_data = [
    ['id', 'name', 'jan_sales', 'feb_sales', 'mar_sales'],
    [1, 'Alice', 100, 150, 200],
    [2, 'Bob', 120, 180, 160]
]
wide_table = etl.wrap(wide_data)

# Convert to long format
long_table = etl.melt(wide_table,
                      key=['id', 'name'],
                      variablefield='month',
                      valuefield='sales')
# Result: id, name, month, sales
#         1, Alice, jan_sales, 100
#         1, Alice, feb_sales, 150
#         etc.
```

### Long to Wide Format (Recast)
```python
import petl as etl

long_data = [
    ['id', 'name', 'month', 'sales'],
    [1, 'Alice', 'jan', 100],
    [1, 'Alice', 'feb', 150],
    [2, 'Bob', 'jan', 120],
    [2, 'Bob', 'feb', 180]
]
long_table = etl.wrap(long_data)

# Convert to wide format
wide_table = etl.recast(long_table,
                        key=['id', 'name'],
                        variablefield='month',
                        valuefield='sales')
# Result: id, name, jan, feb
#         1, Alice, 100, 150
#         2, Bob, 120, 180
```
### Pivot Tables

```python
import petl as etl

sales = etl.fromcsv('sales.csv')  # region, product, quarter, amount
# CSV values are read as strings; convert before aggregating
sales = etl.convert(sales, 'amount', float)

# Create pivot table
pivot_table = etl.pivot(sales, 'region', 'quarter', 'amount', sum)
# Regions as rows, quarters as columns, sum of amounts as values

# Custom aggregation returning multiple statistics per cell
pivot_with_count = etl.pivot(sales, 'product', 'region', 'amount',
                             lambda values: (sum(values), len(values)))
```
### Data Unpacking

```python
import petl as etl

# Unpack list/tuple fields
data_with_coords = [
    ['name', 'coordinates'],
    ['Location A', (40.7128, -74.0060)],
    ['Location B', (34.0522, -118.2437)]
]
table = etl.wrap(data_with_coords)

unpacked = etl.unpack(table, 'coordinates', ['latitude', 'longitude'])
# Result: name, latitude, longitude

# Unpack dictionary fields
data_with_dict = [
    ['name', 'details'],
    ['Alice', {'age': 30, 'city': 'NYC', 'salary': 75000}],
    ['Bob', {'age': 25, 'city': 'LA', 'salary': 65000}]
]
dict_table = etl.wrap(data_with_dict)

unpacked_dict = etl.unpackdict(dict_table, 'details')
# Result: name, age, city, salary
### Regular Expression Extraction

```python
import petl as etl

# Extract structured data from text
log_data = [
    ['timestamp', 'log_entry'],
    ['2023-01-01', '2023-01-01 10:30:45 ERROR user123 failed login'],
    ['2023-01-01', '2023-01-01 10:31:12 INFO user456 successful login']
]
log_table = etl.wrap(log_data)

# Extract components using regex
extracted = etl.capture(log_table, 'log_entry',
                        r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\w+) (.+)',
                        ['datetime', 'level', 'user', 'message'])

# Split email addresses
email_data = [
    ['contact'],
    ['alice@company.com'],
    ['bob@organization.org']
]
email_table = etl.wrap(email_data)

split_emails = etl.split(email_table, 'contact', '@',
                         ['username', 'domain'])
```
### Complex Reshaping Workflows

```python
import petl as etl

# Multi-step reshaping process
raw_data = etl.fromcsv('complex_data.csv')

# 1. Unpack nested JSON-like field
step1 = etl.unpackdict(raw_data, 'metadata')

# 2. Melt measurement columns
step2 = etl.melt(step1,
                 key=['id', 'timestamp', 'location'],
                 variables=['temp', 'humidity', 'pressure'],
                 variablefield='measurement_type',
                 valuefield='value')

# 3. Extract date components (replaces the timestamp field)
step3 = etl.capture(step2, 'timestamp',
                    r'(\d{4})-(\d{2})-(\d{2})',
                    ['year', 'month', 'day'])

# Values read from CSV are strings; convert before aggregating
step3 = etl.convert(step3, 'value', float)

# 4. Pivot for final analysis format; note that pivot takes a single
#    field each for row and column labels
final = etl.pivot(step3, 'location', 'measurement_type', 'value',
                  lambda vals: sum(vals) / len(vals))  # average
```