# Data Reshaping

Advanced reshaping operations including pivoting, melting, transposing, and data normalization. These are essential for converting between wide and long data formats and for restructuring data for analysis and reporting.

## Capabilities

### Wide/Long Format Conversion

Transform between wide and long data formats to suit different analysis needs.

```python { .api }
def melt(table, key=None, variables=None, variablefield='variable', valuefield='value') -> Table:
    """
    Transform a table from wide to long format.

    Parameters:
    - table: Input table
    - key: Fields to keep as identifier variables
    - variables: Fields to melt (default: all non-key fields)
    - variablefield: Name for the variable column
    - valuefield: Name for the value column

    Returns:
    Long-format table
    """

def recast(table, key=None, variablefield='variable', valuefield='value',
           samplesize=1000, missing=None, reducers=None, fill=None) -> Table:
    """
    Transform a table from long to wide format.

    Parameters:
    - table: Input table in long format
    - key: Fields that identify observations
    - variablefield: Field containing variable names
    - valuefield: Field containing values
    - samplesize: Number of rows to sample for structure detection
    - missing: Value for missing data
    - reducers: Functions to handle multiple values per cell
    - fill: Value to fill missing combinations

    Returns:
    Wide-format table
    """
```
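The mechanics of melting are simple: each input row fans out into one output row per melted field. The following standalone sketch illustrates this in plain Python (`melt_rows` is a hypothetical helper for illustration, not petl's implementation):

```python
def melt_rows(header, rows, key, variablefield='variable', valuefield='value'):
    """Sketch of wide-to-long melting: one output row per (row, melted field)."""
    key_idx = [header.index(k) for k in key]
    melt_idx = [i for i, f in enumerate(header) if f not in key]
    out = [key + [variablefield, valuefield]]
    for row in rows:
        for i in melt_idx:
            # Repeat the key values, then name the melted field and its value
            out.append([row[j] for j in key_idx] + [header[i], row[i]])
    return out

wide_rows = [[1, 'Alice', 100, 150], [2, 'Bob', 120, 180]]
long_rows = melt_rows(['id', 'name', 'jan', 'feb'], wide_rows,
                      key=['id', 'name'],
                      variablefield='month', valuefield='sales')
# long_rows[0] == ['id', 'name', 'month', 'sales']
# long_rows[1] == [1, 'Alice', 'jan', 100]
```

Two input data rows with two melted fields each yield four output data rows.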

### Pivoting and Cross-tabulation

Create pivot tables and cross-tabulations for data analysis.

```python { .api }
def pivot(table, f1, f2, f3, aggfun, missing=None, presorted=False,
          buffersize=None, tempdir=None, cache=True) -> Table:
    """
    Construct a pivot table.

    Parameters:
    - table: Input table
    - f1: Field for row labels
    - f2: Field for column labels
    - f3: Field containing values to aggregate
    - aggfun: Aggregation function applied to the list of values in each cell (e.g. sum, len)
    - missing: Value for missing data
    - presorted: If True, assume the table is already sorted
    - buffersize: Buffer size for sorting
    - tempdir: Directory for temporary files
    - cache: Whether to cache results

    Returns:
    Pivot table with f1 values as rows, f2 values as columns, and aggregated f3 values as cells
    """
```
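Conceptually, pivoting groups values by (row label, column label) pairs and aggregates each group. This plain-Python sketch shows the idea (`pivot_rows` is a hypothetical helper, not petl's implementation):

```python
from collections import defaultdict

def pivot_rows(rows, aggfun, missing=None):
    """Sketch of pivoting: rows are (row_label, col_label, value) triples."""
    cells = defaultdict(list)
    for r, c, v in rows:
        cells[(r, c)].append(v)
    row_labels = sorted({r for r, _, _ in rows})
    col_labels = sorted({c for _, c, _ in rows})
    table = [['key'] + col_labels]
    for r in row_labels:
        # Aggregate each cell's list of values; fill absent combinations
        table.append([r] + [aggfun(cells[(r, c)]) if (r, c) in cells else missing
                            for c in col_labels])
    return table

sales = [('north', 'Q1', 100), ('north', 'Q2', 150),
         ('south', 'Q1', 80), ('north', 'Q1', 50)]
result = pivot_rows(sales, sum, missing=0)
# result == [['key', 'Q1', 'Q2'], ['north', 150, 150], ['south', 80, 0]]
```

Note how the two north/Q1 values are summed into one cell, and the absent south/Q2 combination takes the missing value.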

### Table Structure Transformation

Fundamental operations for changing table structure and layout.

```python { .api }
def transpose(table) -> Table:
    """
    Transpose the table (swap rows and columns).

    Parameters:
    - table: Input table

    Returns:
    Transposed table where original columns become rows
    """

def flatten(table) -> Iterable:
    """
    Convert the table to a flat sequence of data values in row-major order.

    Parameters:
    - table: Input table

    Returns:
    Iterable of the table's data values, row by row
    """

def unflatten(*args, **kwargs) -> Table:
    """
    Convert a flat sequence of values (e.g. produced by flatten) back into a table.

    Parameters:
    - args: Either (input, size) or (table, field, size), where size is the number of columns
    - kwargs: Additional options

    Returns:
    Table reconstructed from the flat sequence
    """
```
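In plain Python, transposition is essentially `zip(*rows)` over the whole table, header row included. A minimal sketch (`transpose_rows` is a hypothetical helper for illustration, not petl code):

```python
def transpose_rows(table):
    """Sketch of transpose: columns become rows; the header row is included."""
    return [list(col) for col in zip(*table)]

table = [['id', 'colour'],
         [1, 'blue'],
         [2, 'red']]
flipped = transpose_rows(table)
# flipped == [['id', 1, 2], ['colour', 'blue', 'red']]
```

Transposing twice restores the original table, since the operation is its own inverse.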

### Field Unpacking

Extract nested data structures into separate fields.

```python { .api }
def unpack(table, field, newfields=None, include_original=False, missing=None) -> Table:
    """
    Unpack iterable field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing iterable values (lists, tuples)
    - newfields: Names for the unpacked fields
    - include_original: Whether to keep the original field
    - missing: Value for missing data

    Returns:
    Table with unpacked fields
    """

def unpackdict(table, field, keys=None, includeoriginal=False,
               samplesize=1000, missing=None) -> Table:
    """
    Unpack dictionary field values into separate fields.

    Parameters:
    - table: Input table
    - field: Field containing dictionary values
    - keys: Specific keys to unpack (default: all keys found)
    - includeoriginal: Whether to keep the original field
    - samplesize: Number of rows to sample for key detection
    - missing: Value for missing keys

    Returns:
    Table with dictionary keys as separate fields
    """
```
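The dictionary-unpacking behaviour can be sketched in plain Python: collect the union of keys (petl samples rows for this), then emit one column per key, filling gaps with the missing value. `unpack_dict_rows` below is a hypothetical helper for illustration, not petl's implementation:

```python
def unpack_dict_rows(header, rows, field, missing=None):
    """Sketch of unpackdict: dict values in `field` become separate columns."""
    idx = header.index(field)
    # Union of keys across all rows, sorted for a stable column order
    keys = sorted({k for row in rows for k in row[idx]})
    out = [[f for f in header if f != field] + keys]
    for row in rows:
        d = row[idx]
        out.append([v for i, v in enumerate(row) if i != idx] +
                   [d.get(k, missing) for k in keys])
    return out

rows = [['Alice', {'age': 30, 'city': 'NYC'}],
        ['Bob', {'age': 25}]]
result = unpack_dict_rows(['name', 'details'], rows, 'details')
# result == [['name', 'age', 'city'], ['Alice', 30, 'NYC'], ['Bob', 25, None]]
```

Bob's row lacks a 'city' key, so that cell receives the missing value.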

### Regular Expression Reshaping

Use regular expressions to extract and reshape data from text fields.

```python { .api }
def capture(table, field, pattern, newfields=None, include_original=False,
            flags=0, fill=None) -> Table:
    """
    Extract data using regular expression capture groups.

    Parameters:
    - table: Input table
    - field: Field to apply the regex pattern to
    - pattern: Regular expression with capture groups
    - newfields: Names for the captured groups
    - include_original: Whether to keep the original field
    - flags: Regular expression flags
    - fill: Values to use for non-matching rows

    Returns:
    Table with captured groups as new fields
    """

def split(table, field, pattern, newfields=None, include_original=False,
          maxsplit=0, flags=0) -> Table:
    """
    Split field values using a regular expression.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern to split on
    - newfields: Names for the split parts
    - include_original: Whether to keep the original field
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with split parts as separate fields
    """

def splitdown(table, field, pattern, maxsplit=0, flags=0) -> Table:
    """
    Split field values and create multiple rows, one per part.

    Parameters:
    - table: Input table
    - field: Field to split
    - pattern: Regular expression pattern
    - maxsplit: Maximum number of splits
    - flags: Regular expression flags

    Returns:
    Table with an additional row for each split value
    """
```
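To illustrate splitdown's row-expanding behaviour, here is a plain-Python sketch built on `re.split` (`split_down` is a hypothetical helper, not petl's implementation): each part of the split value becomes its own row, with the other fields repeated.

```python
import re

def split_down(rows, idx, pattern, maxsplit=0, flags=0):
    """Sketch of splitdown: one output row per split part of the field at idx."""
    out = []
    for row in rows:
        for part in re.split(pattern, row[idx], maxsplit=maxsplit, flags=flags):
            new = list(row)
            new[idx] = part
            out.append(new)
    return out

rows = [['Alice', 'red;blue'], ['Bob', 'green']]
result = split_down(rows, 1, r';')
# result == [['Alice', 'red'], ['Alice', 'blue'], ['Bob', 'green']]
```

Values that do not match the pattern pass through as a single row, as with Bob above.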

## Usage Examples

### Wide to Long Format (Melt)

```python
import petl as etl

# Wide format data
wide_data = [
    ['id', 'name', 'jan_sales', 'feb_sales', 'mar_sales'],
    [1, 'Alice', 100, 150, 200],
    [2, 'Bob', 120, 180, 160]
]
wide_table = etl.wrap(wide_data)

# Convert to long format
long_table = etl.melt(wide_table,
                      key=['id', 'name'],
                      variablefield='month',
                      valuefield='sales')
# Result: id, name, month, sales
# 1, Alice, jan_sales, 100
# 1, Alice, feb_sales, 150
# etc.
```

### Long to Wide Format (Recast)

```python
import petl as etl

long_data = [
    ['id', 'name', 'month', 'sales'],
    [1, 'Alice', 'jan', 100],
    [1, 'Alice', 'feb', 150],
    [2, 'Bob', 'jan', 120],
    [2, 'Bob', 'feb', 180]
]
long_table = etl.wrap(long_data)

# Convert to wide format
wide_table = etl.recast(long_table,
                        key=['id', 'name'],
                        variablefield='month',
                        valuefield='sales')
# Result: id, name, jan, feb
# 1, Alice, 100, 150
# 2, Bob, 120, 180
```
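Recasting is the inverse of melting: distinct values of the variable field become column names, and cells are looked up by (key, variable). A plain-Python sketch of the idea (`recast_rows` is a hypothetical helper, not petl's implementation):

```python
def recast_rows(rows, missing=None):
    """Sketch of long-to-wide recasting: rows are (key, variable, value) triples."""
    wide = {}
    variables = []
    for key, var, val in rows:
        wide.setdefault(key, {})[var] = val
        if var not in variables:
            variables.append(var)  # preserve first-seen column order
    return [['key'] + variables] + \
           [[k] + [wide[k].get(v, missing) for v in variables] for k in wide]

long_rows = [(1, 'jan', 100), (1, 'feb', 150), (2, 'jan', 120)]
result = recast_rows(long_rows, missing=0)
# result == [['key', 'jan', 'feb'], [1, 100, 150], [2, 120, 0]]
```

Key 2 has no 'feb' observation, so that cell takes the missing value.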

### Pivot Tables

```python
import petl as etl

sales = etl.fromcsv('sales.csv')  # fields: region, product, quarter, amount
sales = etl.convertnumbers(sales)  # fromcsv yields strings; amounts must be numeric

# Create pivot table
pivot_table = etl.pivot(sales, 'region', 'quarter', 'amount', sum)
# Regions as rows, quarters as columns, sum of amounts as values

# The aggregation function receives the list of values for each cell,
# so a custom function can return several statistics at once
pivot_with_count = etl.pivot(sales, 'product', 'region', 'amount',
                             lambda values: (sum(values), len(values)))
```

### Data Unpacking

```python
import petl as etl

# Unpack list/tuple fields
data_with_coords = [
    ['name', 'coordinates'],
    ['Location A', (40.7128, -74.0060)],
    ['Location B', (34.0522, -118.2437)]
]
table = etl.wrap(data_with_coords)

unpacked = etl.unpack(table, 'coordinates', ['latitude', 'longitude'])
# Result: name, latitude, longitude

# Unpack dictionary fields
data_with_dict = [
    ['name', 'details'],
    ['Alice', {'age': 30, 'city': 'NYC', 'salary': 75000}],
    ['Bob', {'age': 25, 'city': 'LA', 'salary': 65000}]
]
dict_table = etl.wrap(data_with_dict)

unpacked_dict = etl.unpackdict(dict_table, 'details')
# Result: name, age, city, salary
```

### Regular Expression Extraction

```python
import petl as etl

# Extract structured data from text
log_data = [
    ['timestamp', 'log_entry'],
    ['2023-01-01', '2023-01-01 10:30:45 ERROR user123 failed login'],
    ['2023-01-01', '2023-01-01 10:31:12 INFO user456 successful login']
]
log_table = etl.wrap(log_data)

# Extract components using regex
extracted = etl.capture(log_table, 'log_entry',
                        r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\w+) (\w+) (.+)',
                        ['datetime', 'level', 'user', 'message'])

# Split email addresses
email_data = [
    ['contact'],
    ['alice@company.com'],
    ['bob@organization.org']
]
email_table = etl.wrap(email_data)

split_emails = etl.split(email_table, 'contact', '@',
                         ['username', 'domain'])
```
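capture's per-row behaviour maps directly onto `re.search` with groups: matching rows yield their capture groups, non-matching rows yield the fill values. A stdlib sketch (`capture_value` is a hypothetical helper, not petl code):

```python
import re

def capture_value(value, pattern, fill=None, flags=0):
    """Sketch of capture's per-row logic: return the capture groups, or fills."""
    m = re.search(pattern, value, flags)
    if m is None:
        # One fill value per capture group in the pattern
        return (fill,) * re.compile(pattern, flags).groups
    return m.groups()

groups = capture_value('2023-01-01 ERROR boom',
                       r'(\d{4}-\d{2}-\d{2}) (\w+) (.+)')
# groups == ('2023-01-01', 'ERROR', 'boom')
```

A non-matching row such as `capture_value('no digits', r'(\d+)')` returns `(None,)` rather than raising.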

### Complex Reshaping Workflows

```python
import petl as etl

# Multi-step reshaping process
raw_data = etl.fromcsv('complex_data.csv')

# 1. Unpack nested JSON-like field
step1 = etl.unpackdict(raw_data, 'metadata')

# 2. Melt measurement columns
step2 = etl.melt(step1,
                 key=['id', 'timestamp', 'location'],
                 variables=['temp', 'humidity', 'pressure'],
                 variablefield='measurement_type',
                 valuefield='value')

# 3. Extract date components (capture replaces the original timestamp field)
step3 = etl.capture(step2, 'timestamp',
                    r'(\d{4})-(\d{2})-(\d{2})',
                    ['year', 'month', 'day'])

# 4. Pivot for final analysis format (pivot takes single row/column fields)
final = etl.pivot(step3, 'location', 'measurement_type', 'value',
                  lambda vals: sum(vals) / len(vals))  # average per cell
```