# Mathematical Transformations

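Transformers that apply mathematical functions to numerical variables: logarithm, logarithm plus a constant, power, reciprocal, Box-Cox, and Yeo-Johnson transformations. These transformations can reduce skewness and bring a variable's distribution closer to normal, which may improve the performance of models that are sensitive to skewed inputs, such as linear models.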

## Capabilities

### Logarithmic Transformation

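Applies the natural (base e) or base 10 logarithm to numerical variables. The logarithm is defined only for strictly positive values, so the variables must not contain zeros or negative numbers.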

```python { .api }
class LogTransformer:
    def __init__(self, variables=None, base='e'):
        """
        Initialize LogTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - base (str): 'e' for natural logarithm or '10' for base 10 logarithm
        """

    def fit(self, X, y=None):
        """
        Validate that all values are strictly positive (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply logarithm transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with log-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using the exponential (e**x or 10**x, matching base).

        Parameters:
        - X (pandas.DataFrame): Dataset with log-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import LogTransformer
import pandas as pd

# Sample data with strictly positive values
data = {'price': [100, 200, 500, 1000, 2000],
        'volume': [10, 25, 50, 100, 200]}
df = pd.DataFrame(data)

# Natural log transformation
transformer = LogTransformer(base='e')
df_transformed = transformer.fit_transform(df)

# Base 10 log transformation
transformer = LogTransformer(base='10')
df_transformed = transformer.fit_transform(df)

# Inverse transformation (10**x, because this transformer uses base 10)
df_original = transformer.inverse_transform(df_transformed)
```
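
To make the arithmetic concrete, the following plain-Python sketch mirrors what the docs above describe for LogTransformer: reject non-positive values, apply `math.log` or `math.log10`, and invert with `exp` or `10**x`. The helper names `log_transform` and `inverse_log_transform` are hypothetical and not part of feature_engine; the real transformer works on DataFrame columns rather than lists.

```python
import math


def log_transform(values, base="e"):
    """Hypothetical helper: log of each value, rejecting non-positive input."""
    if base not in ("e", "10"):
        raise ValueError("base must be 'e' or '10'")
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires strictly positive values")
    log = math.log if base == "e" else math.log10
    return [log(v) for v in values]


def inverse_log_transform(values, base="e"):
    """Hypothetical helper: undo log_transform with e**x or 10**x."""
    if base == "e":
        return [math.exp(v) for v in values]
    return [10 ** v for v in values]


price = [100, 200, 500, 1000, 2000]
logged = log_transform(price, base="10")
restored = inverse_log_transform(logged, base="10")
print([round(v, 3) for v in logged])  # [2.0, 2.301, 2.699, 3.0, 3.301]

# Zeros are rejected, as LogTransformer's fit validation requires
try:
    log_transform([0, 5])
except ValueError as err:
    print(f"rejected: {err}")
```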

### Log Plus Constant Transformation

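Applies the log(x + C) transformation, where C is a constant added to every value before taking the logarithm. Shifting the values this way makes the transformation usable for variables that contain zeros or negative values, as long as x + C is positive.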

```python { .api }
class LogCpTransformer:
    def __init__(self, variables=None, base='e', C='auto'):
        """
        Initialize LogCpTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - base (str): 'e' for natural logarithm or '10' for base 10 logarithm
        - C (int/float/str/dict): Constant to add before the logarithm. 'auto' sets C = abs(min(x)) + 1 for each variable
        """

    def fit(self, X, y=None):
        """
        Learn constant C if C='auto', otherwise validate input.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply log(x + C) transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with log(x + C) transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using exp(x) - C (10**x - C for base 10).

        Parameters:
        - X (pandas.DataFrame): Dataset with log-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import LogCpTransformer

# Auto-calculate C per variable as abs(min(x)) + 1, so that x + C >= 1 on the training data
transformer = LogCpTransformer(C='auto')
df_transformed = transformer.fit_transform(df)
print(transformer.C_)  # Shows the learned C value per variable

# Use the same constant C for all variables
transformer = LogCpTransformer(C=1)
df_transformed = transformer.fit_transform(df)

# Use a different C per variable (keys must be columns of df)
transformer = LogCpTransformer(C={'price': 1, 'volume': 5})
df_transformed = transformer.fit_transform(df)
```
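
The `C='auto'` rule can be checked by hand. This plain-Python sketch assumes the rule stated above, C = abs(min(x)) + 1 computed on the training values, and uses the natural logarithm; `fit_auto_c`, `log_cp_transform`, and `inverse_log_cp_transform` are hypothetical helpers, not feature_engine API. When the minimum is zero or negative, the smallest training value maps to log(1) = 0.

```python
import math


def fit_auto_c(values):
    """Hypothetical helper: C = abs(min(x)) + 1, as described for C='auto'."""
    return abs(min(values)) + 1


def log_cp_transform(values, c):
    """Hypothetical helper: natural log of x + C for each value."""
    return [math.log(v + c) for v in values]


def inverse_log_cp_transform(values, c):
    """Hypothetical helper: undo the transform with exp(y) - C."""
    return [math.exp(v) - c for v in values]


balance = [-50, 0, 25, 300]   # contains a zero and a negative value
c = fit_auto_c(balance)       # abs(-50) + 1 = 51
transformed = log_cp_transform(balance, c)
restored = inverse_log_cp_transform(transformed, c)
print(c, transformed[0])      # 51 0.0
```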

### Box-Cox Transformation

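Applies the Box-Cox transformation to numerical variables to bring their distribution closer to normal. Box-Cox is defined only for strictly positive values, and the transformer learns one lambda parameter per variable during fit.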

```python { .api }
class BoxCoxTransformer:
    def __init__(self, variables=None):
        """
        Initialize BoxCoxTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Learn optimal lambda parameter for Box-Cox transformation per variable.

        Parameters:
        - X (pandas.DataFrame): Training dataset (must contain positive values)
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply Box-Cox transformation using learned lambda values.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with Box-Cox transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using inverse Box-Cox.

        Parameters:
        - X (pandas.DataFrame): Dataset with Box-Cox transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import BoxCoxTransformer

# Box-Cox transformation (requires positive values)
transformer = BoxCoxTransformer()
df_transformed = transformer.fit_transform(df)

# Access learned lambda parameters
print(transformer.lambda_dict_)  # Shows optimal lambda per variable

# Inverse transformation
df_original = transformer.inverse_transform(df_transformed)
```
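
For reference, once lambda is known the Box-Cox formula is y = (x**lambda - 1) / lambda for lambda != 0 and y = log(x) for lambda = 0. The plain-Python sketch below implements that formula and its inverse. It assumes lambda is already given (here picked by hand); the transformer instead estimates one lambda per variable during fit, and feature_engine relies on SciPy's `scipy.stats.boxcox` for that estimate. The helpers `box_cox` and `inverse_box_cox` are hypothetical.

```python
import math


def box_cox(x, lmbda):
    """Hypothetical helper: Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


def inverse_box_cox(y, lmbda):
    """Hypothetical helper: invert box_cox for the same lambda."""
    if lmbda == 0:
        return math.exp(y)
    return (lmbda * y + 1) ** (1 / lmbda)


# lambda = 0.5 is a hand-picked value for illustration, not a fitted one
print(box_cox(4, 0.5))            # 2.0
print(inverse_box_cox(2.0, 0.5))  # 4.0
```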

### Yeo-Johnson Transformation

Applies the Yeo-Johnson transformation to numerical variables. Like Box-Cox, it learns one lambda parameter per variable, but unlike Box-Cox it also accepts zero and negative values.

```python { .api }
class YeoJohnsonTransformer:
    def __init__(self, variables=None):
        """
        Initialize YeoJohnsonTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Learn optimal lambda parameter for Yeo-Johnson transformation per variable.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply Yeo-Johnson transformation using learned lambda values.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with Yeo-Johnson transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using inverse Yeo-Johnson.

        Parameters:
        - X (pandas.DataFrame): Dataset with Yeo-Johnson transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import YeoJohnsonTransformer

# Yeo-Johnson transformation (works with positive and negative values)
transformer = YeoJohnsonTransformer()
df_transformed = transformer.fit_transform(df)

# Access learned lambda parameters
print(transformer.lambda_dict_)  # Shows optimal lambda per variable

# Inverse transformation
df_original = transformer.inverse_transform(df_transformed)
```
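
Yeo-Johnson accepts zeros and negative values because it applies a different branch of its formula on each side of zero. The plain-Python sketch below implements the standard piecewise definition and its inverse for a given lambda. Lambda is supplied by hand here rather than estimated as the transformer does during fit, and `yeo_johnson` and `inverse_yeo_johnson` are hypothetical helpers.

```python
import math


def yeo_johnson(x, lmbda):
    """Hypothetical helper: Yeo-Johnson transform of one value."""
    if x >= 0:
        if lmbda == 0:
            return math.log(x + 1)
        return ((x + 1) ** lmbda - 1) / lmbda
    # Negative branch uses 2 - lambda
    if lmbda == 2:
        return -math.log(1 - x)
    return -((1 - x) ** (2 - lmbda) - 1) / (2 - lmbda)


def inverse_yeo_johnson(y, lmbda):
    """Hypothetical helper: invert yeo_johnson; y has the same sign as x."""
    if y >= 0:
        if lmbda == 0:
            return math.exp(y) - 1
        return (lmbda * y + 1) ** (1 / lmbda) - 1
    if lmbda == 2:
        return 1 - math.exp(-y)
    return 1 - (1 - (2 - lmbda) * y) ** (1 / (2 - lmbda))


# Hand-picked lambdas for illustration
print(yeo_johnson(3, 0.5))  # 2.0
print(yeo_johnson(-3, 1))   # -3.0 (lambda = 1 leaves values unchanged)
```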

### Power Transformation

Applies the power transformation x^exp to numerical variables, where exp is a fixed exponent chosen by the user; no parameters are learned during fit.

```python { .api }
class PowerTransformer:
    def __init__(self, variables=None, exp=0.5):
        """
        Initialize PowerTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - exp (int/float): Exponent for the power transformation. The default, 0.5, is a square root
        """

    def fit(self, X, y=None):
        """
        Validate input data (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply power transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with power-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation by raising values to the power 1/exp.

        Parameters:
        - X (pandas.DataFrame): Dataset with power-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import PowerTransformer

# Square root transformation (exp=0.5 is the default)
transformer = PowerTransformer(exp=0.5)
df_transformed = transformer.fit_transform(df)

# Square transformation
transformer = PowerTransformer(exp=2)
df_transformed = transformer.fit_transform(df)

# Inverse transformation (raises values to the power 1/exp)
df_original = transformer.inverse_transform(df_transformed)

# Different exponents for different variables: one transformer per variable subset
cube_transformer = PowerTransformer(variables=['volume'], exp=3)
df_cubed = cube_transformer.fit_transform(df)
```
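
Raising to the power 1/exp only undoes x**exp when the forward map is one-to-one on the data. The plain-Python sketch below shows a clean round trip for non-negative values and how an even exponent loses the sign of a negative value. It also rejects negative values with fractional exponents, which have no real-valued result. `power_transform`, `inverse_power_transform`, and `_check_real` are hypothetical helpers, not feature_engine API, and the sketch does not reproduce how the library itself handles such inputs.

```python
def _check_real(values, exponent):
    """Hypothetical helper: fractional powers of negative numbers are not real."""
    if not float(exponent).is_integer() and any(v < 0 for v in values):
        raise ValueError("negative values need an integer exponent")


def power_transform(values, exp=0.5):
    """Hypothetical helper: raise each value to the power exp."""
    _check_real(values, exp)
    return [v ** exp for v in values]


def inverse_power_transform(values, exp=0.5):
    """Hypothetical helper: undo power_transform with the 1/exp root."""
    _check_real(values, 1 / exp)
    return [v ** (1 / exp) for v in values]


print(power_transform([4, 9], exp=0.5))              # [2.0, 3.0]
print(inverse_power_transform([2.0, 3.0], exp=0.5))  # [4.0, 9.0]

# An even exponent maps -3 and 3 to the same value, so the sign is lost
squared = power_transform([-3, 3], exp=2)            # [9, 9]
print(inverse_power_transform(squared, exp=2))       # [3.0, 3.0]
```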

### Reciprocal Transformation

Applies the reciprocal transformation (1/x) to numerical variables. Because 1/0 is undefined, the variables must not contain zeros.

```python { .api }
class ReciprocalTransformer:
    def __init__(self, variables=None):
        """
        Initialize ReciprocalTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Validate that variables don't contain zeros (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply reciprocal transformation (1/x) to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with reciprocal-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using reciprocal (1/x).

        Parameters:
        - X (pandas.DataFrame): Dataset with reciprocal-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:

```python
from feature_engine.transformation import ReciprocalTransformer

# Reciprocal transformation (1/x)
transformer = ReciprocalTransformer()
df_transformed = transformer.fit_transform(df)

# Inverse transformation (also 1/x)
df_original = transformer.inverse_transform(df_transformed)
```
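
Because 1/(1/x) = x, the same function serves as both the transformation and its inverse, which is why the inverse is also 1/x. A minimal plain-Python sketch with the zero check described above; the `reciprocal` helper is hypothetical and not part of feature_engine.

```python
def reciprocal(values):
    """Hypothetical helper: 1/x for each value; also its own inverse."""
    if any(v == 0 for v in values):
        raise ValueError("reciprocal transformation is undefined for zero")
    return [1 / v for v in values]


values = [2, 4, -5]
transformed = reciprocal(values)
print(transformed)              # [0.5, 0.25, -0.2]
print(reciprocal(transformed))  # [2.0, 4.0, -5.0]
```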

## Usage Patterns

### Selecting Appropriate Transformations

```python
import matplotlib.pyplot as plt
from scipy import stats
from feature_engine.transformation import LogTransformer, BoxCoxTransformer

# Assess a variable's distribution before choosing a transformation
def assess_normality(data, variable):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Histogram
    ax1.hist(data[variable], bins=30)
    ax1.set_title(f'{variable} Distribution')

    # Q-Q plot against a normal distribution
    stats.probplot(data[variable], dist="norm", plot=ax2)
    ax2.set_title(f'{variable} Q-Q Plot')

    plt.tight_layout()
    plt.show()

    # Shapiro-Wilk test: a small p-value (e.g. < 0.05) suggests the data are not normal
    stat, p_value = stats.shapiro(data[variable].dropna())
    print(f"Shapiro-Wilk test p-value: {p_value}")

assess_normality(df, 'price')

# Test candidate transformations; both fail on zero or negative values
transformers = {
    'log': LogTransformer(),
    'boxcox': BoxCoxTransformer()
}

for name, transformer in transformers.items():
    try:
        df_transformed = transformer.fit_transform(df)
        print(f"{name} transformation successful")
    except Exception as e:
        print(f"{name} transformation failed: {e}")
```
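
Where plotting is not convenient, sample skewness gives a quick numeric check: values near 0 suggest a roughly symmetric distribution, while large positive values indicate a long right tail that a log or Box-Cox transformation may reduce. The sketch below computes the biased Fisher-Pearson skewness coefficient with only the standard library, using the `price` values from the LogTransformer example; the `skewness` helper is hypothetical (`scipy.stats.skew` with its default `bias=True` computes the same quantity).

```python
import math
from statistics import fmean


def skewness(values):
    """Hypothetical helper: biased Fisher-Pearson skewness, m3 / m2**1.5."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


price = [100, 200, 500, 1000, 2000]
raw_skew = skewness(price)
log_skew = skewness([math.log(v) for v in price])
print(f"raw skewness: {raw_skew:.2f}, after log: {log_skew:.2f}")
```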

### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from feature_engine.imputation import MeanMedianImputer
from feature_engine.transformation import LogCpTransformer

# Impute missing values, apply log(x + C), then standardize
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),  # median imputation by default
    ('transformer', LogCpTransformer(C='auto')),
    ('scaler', StandardScaler())
])

# StandardScaler returns a NumPy array unless scikit-learn's set_output API is configured
df_processed = pipeline.fit_transform(df)
```
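
One consequence of `C='auto'` inside a pipeline: C is learned from the training data, so a new value that is sufficiently smaller than anything seen during fit can give x + C <= 0, where the logarithm is undefined. The plain-Python sketch below illustrates this with a hypothetical `log_cp` helper that raises in that case; it does not reproduce how feature_engine itself reports the problem, so check the library's behavior before relying on it.

```python
import math


def log_cp(x, c):
    """Hypothetical helper: log(x + C), raising when x + C is not positive."""
    shifted = x + c
    if shifted <= 0:
        raise ValueError(f"x + C = {shifted} is not positive; log is undefined")
    return math.log(shifted)


train = [-10, 0, 5, 40]
c = abs(min(train)) + 1   # C='auto' rule applied to training data: 11

print(log_cp(-10, c))     # 0.0, the training minimum maps to log(1)
try:
    log_cp(-15, c)        # unseen value below the training minimum
except ValueError as err:
    print(err)            # x + C = -4 is not positive; log is undefined
```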

## Common Attributes

All transformers in this module share these fitted attributes:

- `variables_` (list): Variables that will be transformed
- `n_features_in_` (int): Number of features in the training set

Transformer-specific attributes:

- `C_` (int/float or dict): Constant C added before the logarithm (LogCpTransformer); a dictionary with one value per variable when `C='auto'` or a dictionary is passed
- `lambda_dict_` (dict): Lambda parameters per variable (BoxCoxTransformer, YeoJohnsonTransformer)

PowerTransformer learns no parameters; its exponent is the `exp` value passed at initialization.