Tessl Tile for pypi/feature-engine@1.2.0

# Mathematical Transformations

Transformers that apply mathematical functions to numerical variables, including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations. They are typically used to reduce skew and bring variables closer to a normal distribution, which can improve model performance.

## Capabilities

### Logarithmic Transformation

Applies the natural or base-10 logarithm to numerical variables. All values must be strictly positive.

```python { .api }
class LogTransformer:
    def __init__(self, variables=None, base='e'):
        """
        Initialize LogTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - base (str): 'e' for natural logarithm or '10' for base 10 logarithm
        """

    def fit(self, X, y=None):
        """
        Validate that variables are positive (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply logarithm transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with log-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using exponential.

        Parameters:
        - X (pandas.DataFrame): Dataset with log-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import LogTransformer
import pandas as pd

# Sample data with strictly positive values
data = {'price': [100, 200, 500, 1000, 2000],
        'volume': [10, 25, 50, 100, 200]}
df = pd.DataFrame(data)

# Natural log transformation
transformer = LogTransformer(base='e')
df_transformed = transformer.fit_transform(df)

# Base 10 log transformation
transformer = LogTransformer(base='10')
df_transformed = transformer.fit_transform(df)

# Inverse transformation (undoes the base 10 transform fitted above)
df_original = transformer.inverse_transform(df_transformed)
```
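
To make the arithmetic concrete, the following minimal sketch reproduces the log transform and its inverse with the standard library only. It illustrates the math, not feature-engine's implementation; the helper names `log_transform` and `log_inverse` are hypothetical.

```python
import math


def log_transform(values, base="e"):
    """Return the log of each value; `base` mirrors the 'e' / '10' option."""
    if any(v <= 0 for v in values):
        raise ValueError("log is undefined for zero or negative values")
    fn = math.log if base == "e" else math.log10
    return [fn(v) for v in values]


def log_inverse(values, base="e"):
    """Undo log_transform with exp (base 'e') or 10**x (base '10')."""
    if base == "e":
        return [math.exp(v) for v in values]
    return [10.0 ** v for v in values]


price = [100, 200, 500, 1000, 2000]
logged = log_transform(price, base="10")   # first element is log10(100)
restored = log_inverse(logged, base="10")  # back on the original scale
```

The round trip recovers the original values up to floating-point error, which is why the positivity check matters: a zero or negative input has no logarithm to invert.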

### Log Plus Constant Transformation

Applies a log(x + C) transformation, where C is a constant chosen so that x + C is strictly positive. Useful for variables that contain zeros or negative values.

```python { .api }
class LogCpTransformer:
    def __init__(self, variables=None, base='e', C='auto'):
        """
        Initialize LogCpTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - base (str): 'e' for natural logarithm or '10' for base 10 logarithm
        - C (int/float/str/dict): Constant to add before log. 'auto' sets C = abs(min(x)) + 1 per variable; a dict maps each variable to its own C
        """

    def fit(self, X, y=None):
        """
        Learn constant C if C='auto', otherwise validate input.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply log(x + C) transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with log(x + C) transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using exp(x) - C
        (10**x - C when base='10').

        Parameters:
        - X (pandas.DataFrame): Dataset with log-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import LogCpTransformer

# Auto-calculate C per variable so that x + C is strictly positive
transformer = LogCpTransformer(C='auto')
df_transformed = transformer.fit_transform(df)

# Specify constant C
transformer = LogCpTransformer(C=1)
df_transformed = transformer.fit_transform(df)

# Different C per variable (keys must be column names in df)
transformer = LogCpTransformer(C={'price': 1, 'volume': 5})
df_transformed = transformer.fit_transform(df)

# Access the C values in use
print(transformer.C_)  # Shows C value per variable
```
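
The `C='auto'` option can be illustrated by hand. The sketch below assumes the rule `C = abs(min(x)) + 1`, which guarantees `x + C >= 1` for every training value. It uses only the standard library; the helper names and the `balance` data are hypothetical and serve only as illustration.

```python
import math


def auto_c(values):
    """Constant assumed for C='auto': abs(min(x)) + 1."""
    return abs(min(values)) + 1


def log_cp(values, c):
    """Natural log of (x + C) for each value."""
    return [math.log(v + c) for v in values]


def log_cp_inverse(values, c):
    """Undo log_cp: exp(x) - C."""
    return [math.exp(v) - c for v in values]


balance = [-3.0, 0.0, 2.5, 10.0]   # made-up data containing zero and a negative
c = auto_c(balance)                 # abs(-3.0) + 1
transformed = log_cp(balance, c)    # the minimum is shifted to 1 before the log
restored = log_cp_inverse(transformed, c)
```

Because C is learned from the training data, values seen later that fall below the training minimum by 1 or more would still produce a non-positive argument to the logarithm.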

### Box-Cox Transformation

Applies the Box-Cox transformation to numerical variables to bring their distributions closer to normal. All values must be strictly positive.

```python { .api }
class BoxCoxTransformer:
    def __init__(self, variables=None):
        """
        Initialize BoxCoxTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Learn optimal lambda parameter for Box-Cox transformation per variable.

        Parameters:
        - X (pandas.DataFrame): Training dataset (must contain positive values)
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply Box-Cox transformation using learned lambda values.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with Box-Cox transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using inverse Box-Cox.

        Parameters:
        - X (pandas.DataFrame): Dataset with Box-Cox transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import BoxCoxTransformer

# Box-Cox transformation (requires positive values)
transformer = BoxCoxTransformer()
df_transformed = transformer.fit_transform(df)

# Access learned lambda parameters
print(transformer.lambda_dict_)  # Shows optimal lambda per variable

# Inverse transformation
df_original = transformer.inverse_transform(df_transformed)
```
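
For reference, the standard Box-Cox formula is `(x**lambda - 1) / lambda` for `lambda != 0` and `log(x)` for `lambda == 0`. The sketch below applies it with a fixed, illustrative lambda, whereas the transformer estimates lambda from the data. The helper names are hypothetical and only the standard library is used.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def box_cox_inverse(y, lam):
    """Invert box_cox for the same lambda."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)


volume = [10, 25, 50, 100, 200]
lam = 0.5  # illustrative value only, not a fitted lambda
transformed = [box_cox(v, lam) for v in volume]
restored = [box_cox_inverse(t, lam) for t in transformed]
```

With `lam = 0.5` the transform is a shifted and scaled square root; as lambda approaches 0 it approaches the natural logarithm, which is why the two branches join smoothly.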

### Yeo-Johnson Transformation

Applies the Yeo-Johnson transformation to numerical variables. Unlike Box-Cox, it accepts zero and negative values as well as positive ones.

```python { .api }
class YeoJohnsonTransformer:
    def __init__(self, variables=None):
        """
        Initialize YeoJohnsonTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Learn optimal lambda parameter for Yeo-Johnson transformation per variable.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply Yeo-Johnson transformation using learned lambda values.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with Yeo-Johnson transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using inverse Yeo-Johnson.

        Parameters:
        - X (pandas.DataFrame): Dataset with Yeo-Johnson transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import YeoJohnsonTransformer

# Yeo-Johnson transformation (works with positive and negative values)
transformer = YeoJohnsonTransformer()
df_transformed = transformer.fit_transform(df)

# Access learned lambda parameters
print(transformer.lambda_dict_)  # Shows optimal lambda per variable

# Inverse transformation
df_original = transformer.inverse_transform(df_transformed)
```
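
The piecewise Yeo-Johnson definition explains why negative values are accepted. The sketch below implements the standard formula for a single value with a fixed, illustrative lambda. The function name and sample data are hypothetical, and the fitted transformer chooses lambda itself.

```python
import math


def yeo_johnson(x, lam):
    """Standard Yeo-Johnson transform of one value for a given lambda."""
    if x >= 0:
        if lam == 0:
            return math.log(x + 1.0)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch mirrors the positive one with exponent (2 - lambda)
    if lam == 2:
        return -math.log(-x + 1.0)
    return -(((-x + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


sample = [-2.0, -0.5, 0.0, 1.5, 4.0]   # made-up values on both sides of zero
lam = 0.5                               # illustrative value only
transformed = [yeo_johnson(v, lam) for v in sample]
identity = [yeo_johnson(v, 1.0) for v in sample]  # lambda = 1 leaves values unchanged
```

The transform is monotonic, so the ordering of observations is preserved, and zero always maps to zero.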

### Power Transformation

Applies a power transformation (x^exp) to numerical variables, where `exp` is a single exponent supplied at construction.

```python { .api }
class PowerTransformer:
    def __init__(self, variables=None, exp=0.5):
        """
        Initialize PowerTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        - exp (int/float): Exponent for power transformation. Defaults to 0.5 (square root)
        """

    def fit(self, X, y=None):
        """
        Validate input data (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply power transformation to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with power-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using the inverse exponent (1/exp).

        Parameters:
        - X (pandas.DataFrame): Dataset with power-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import PowerTransformer

# Square root transformation (exp=0.5 is the default)
transformer = PowerTransformer(exp=0.5)
df_transformed = transformer.fit_transform(df)

# Square transformation
transformer = PowerTransformer(exp=2)
df_transformed = transformer.fit_transform(df)

# Different exponents per variable: exp takes a single number,
# so use one transformer per variable
squarer = PowerTransformer(variables=['price'], exp=2)
cuber = PowerTransformer(variables=['volume'], exp=3)
df_transformed = cuber.fit_transform(squarer.fit_transform(df))

# Inverse transformation (undo in reverse order)
df_original = squarer.inverse_transform(cuber.inverse_transform(df_transformed))
```
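
The inverse of a power transformation raises values to `1 / exp`. The standard-library sketch below shows the round trip, along with one caveat: with an even exponent the sign of negative inputs cannot be recovered. The helper names are hypothetical and this is an illustration of the arithmetic, not the library's code.

```python
def power(values, exp):
    """Raise each value to `exp`."""
    return [v ** exp for v in values]


def power_inverse(values, exp):
    """Undo power() by raising each value to 1 / exp."""
    return [v ** (1.0 / exp) for v in values]


volume = [10, 25, 50, 100, 200]
roots = power(volume, 0.5)              # square roots
restored = power_inverse(roots, 0.5)    # squares the roots again

# Caveat: squaring discards the sign, so the inverse returns the magnitude only
squared = power([-3.0], 2)
not_restored = power_inverse(squared, 2)
```

For strictly positive variables such as `price` and `volume` the round trip is exact up to floating-point error.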

### Reciprocal Transformation

Applies the reciprocal transformation (1/x) to numerical variables. Variables must not contain zeros.

```python { .api }
class ReciprocalTransformer:
    def __init__(self, variables=None):
        """
        Initialize ReciprocalTransformer.

        Parameters:
        - variables (list): List of numerical variables to transform. If None, selects all numerical variables
        """

    def fit(self, X, y=None):
        """
        Validate that variables don't contain zeros (no parameters learned).

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (not used)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Apply reciprocal transformation (1/x) to variables.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with reciprocal-transformed variables
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Convert back to original representation using reciprocal (1/x).

        Parameters:
        - X (pandas.DataFrame): Dataset with reciprocal-transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Example**:
```python
from feature_engine.transformation import ReciprocalTransformer

# Reciprocal transformation (1/x)
transformer = ReciprocalTransformer()
df_transformed = transformer.fit_transform(df)

# Inverse transformation (also 1/x)
df_original = transformer.inverse_transform(df_transformed)
```

## Usage Patterns

### Selecting Appropriate Transformations

```python
import matplotlib.pyplot as plt
from scipy import stats

# Assess data distribution before transformation
def assess_normality(data, variable):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

    # Histogram
    ax1.hist(data[variable], bins=30)
    ax1.set_title(f'{variable} Distribution')

    # Q-Q plot
    stats.probplot(data[variable], dist="norm", plot=ax2)
    ax2.set_title(f'{variable} Q-Q Plot')

    plt.tight_layout()
    plt.show()

    # Shapiro-Wilk test (a small p-value suggests non-normal data)
    stat, p_value = stats.shapiro(data[variable].dropna())
    print(f"Shapiro-Wilk test p-value: {p_value}")

# `df` is the sample DataFrame from the Logarithmic Transformation example
assess_normality(df, 'price')

# Test different transformations
from feature_engine.transformation import LogTransformer, BoxCoxTransformer

transformers = {
    'log': LogTransformer(),
    'boxcox': BoxCoxTransformer()
}

for name, transformer in transformers.items():
    try:
        df_transformed = transformer.fit_transform(df)
        print(f"{name} transformation successful")
    except Exception as e:
        # For example, non-positive values make both transformers fail
        print(f"{name} transformation failed: {e}")
```
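
As a numeric complement to the plots, skewness can be compared before and after a candidate transformation. The sketch below computes population skewness (third central moment divided by the cubed standard deviation) with the standard library. The `skewness` helper is hypothetical, and the `price` values are the sample data from the first usage example.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


price = [100, 200, 500, 1000, 2000]
skew_raw = skewness(price)                          # right-skewed, so positive
skew_log = skewness([math.log(v) for v in price])   # after a natural log
print(f"raw: {skew_raw:.3f}, log: {skew_log:.3f}")
```

A skewness closer to zero after the transformation suggests a more symmetric distribution, although with only a handful of observations the estimate is very noisy.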

### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from feature_engine.imputation import MeanMedianImputer
from feature_engine.transformation import LogCpTransformer
from sklearn.preprocessing import StandardScaler

# Preprocessing pipeline with transformation
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),            # fill missing values first
    ('transformer', LogCpTransformer(C='auto')),
    ('scaler', StandardScaler())
])

# StandardScaler returns a NumPy array by default, not a DataFrame
df_processed = pipeline.fit_transform(df)
```

## Common Attributes

All transformation transformers share these fitted attributes:

- `variables_` (list): Variables that will be transformed
- `n_features_in_` (int): Number of features in the training set

Transformer-specific attributes:
- `C_` (dict or number): Constant C added to each variable; a dict when `C='auto'` or a dict was passed, otherwise the number supplied (LogCpTransformer)
- `lambda_dict_` (dict): Lambda parameters per variable (BoxCoxTransformer, YeoJohnsonTransformer)
- `exp` (int/float): Exponent supplied at construction; nothing is learned during fit (PowerTransformer)
