Python library with 44+ transformers for feature engineering and selection, following the scikit-learn API
# Feature-Engine

A Python library with multiple transformers to engineer and select features for machine learning. All transformers follow the scikit-learn `fit`/`transform` API, so they can be used directly inside existing scikit-learn pipelines.

## Package Information

- **Package Name**: feature-engine
- **Package Type**: library
- **Language**: Python
- **Installation**: `pip install feature-engine`

## Core Imports

```python
import feature_engine
```

Common import patterns for specific modules:

```python
from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import OneHotEncoder, OrdinalEncoder
from feature_engine.transformation import LogTransformer, BoxCoxTransformer
from feature_engine.selection import DropFeatures, DropConstantFeatures, DropHighPSIFeatures, SelectByTargetMeanPerformance
from feature_engine.outliers import Winsorizer
```

## Basic Usage

```python
import pandas as pd
from feature_engine.imputation import MeanMedianImputer
from feature_engine.encoding import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Create sample data
data = {
    'numeric_var1': [1.0, 2.0, None, 4.0, 5.0],
    'numeric_var2': [10, 20, 30, None, 50],
    'categorical_var': ['A', 'B', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
y = [0, 1, 0, 1, 0]

# Create transformers; with variables=None each one picks the variables it
# applies to (numerical for the imputer, categorical for the encoder)
imputer = MeanMedianImputer(imputation_method='median')
encoder = OrdinalEncoder(encoding_method='arbitrary')

# Fit and transform the data step by step
X_imputed = imputer.fit_transform(df)
X_encoded = encoder.fit_transform(X_imputed)

# Or chain the same steps with a model in a scikit-learn Pipeline
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('encoder', OrdinalEncoder(encoding_method='arbitrary')),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)
```

## Architecture

Feature-Engine follows the scikit-learn API design pattern, with consistent interfaces across all transformers:

- **fit(X, y=None)**: Learn transformation parameters from training data
- **transform(X)**: Apply the learned transformation to new data
- **fit_transform(X, y=None)**: Combine the fit and transform operations
- **inverse_transform(X)**: Reverse the transformation (where applicable)

All transformers inherit from base classes that provide:

- Automatic variable selection (numerical or categorical) when `variables=None`
- Input validation and type checking
- Consistent storage of learned parameters in attributes ending with `_`
- Integration with pandas DataFrames (transformers return DataFrames)

## Capabilities

### Missing Data Imputation

Handle missing values in numerical and categorical variables using statistical methods, arbitrary values, or advanced techniques like random sampling.

```python { .api }
class MeanMedianImputer:
    def __init__(self, imputation_method='median', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CategoricalImputer:
    def __init__(self, imputation_method='missing', fill_value='Missing', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryNumberImputer:
    def __init__(self, arbitrary_number=999, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Missing Data Imputation](./imputation.md)

### Categorical Variable Encoding

Transform categorical variables into numerical representations using various encoding methods, including one-hot, ordinal, target-based, and frequency-based encoders.

```python { .api }
class OneHotEncoder:
    def __init__(self, top_categories=None, drop_last=False, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OrdinalEncoder:
    # encoding_method='ordered' ranks categories by target mean, so y must be passed to fit
    def __init__(self, encoding_method='ordered', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class MeanEncoder:
    def __init__(self, variables=None, ignore_format=False): ...
    def fit(self, X, y): ...
    def transform(self, X): ...
```

[Categorical Variable Encoding](./encoding.md)

### Variable Discretisation

Convert continuous variables into discrete intervals using equal width, equal frequency, decision tree-based, or user-defined boundaries.

```python { .api }
class EqualWidthDiscretiser:
    def __init__(self, variables=None, bins=10, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class EqualFrequencyDiscretiser:
    def __init__(self, variables=None, q=10, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryDiscretiser:
    def __init__(self, binning_dict, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Variable Discretisation](./discretisation.md)

### Mathematical Transformations

Apply mathematical functions to numerical variables, including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations.

```python { .api }
class LogTransformer:
    def __init__(self, variables=None, base='e'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class BoxCoxTransformer:
    def __init__(self, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class PowerTransformer:
    def __init__(self, variables=None, exp=0.5): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Mathematical Transformations](./transformation.md)

### Feature Selection

Remove or select features based on various criteria including variance, correlation, performance metrics, and statistical tests.

```python { .api }
class DropFeatures:
    def __init__(self, features_to_drop): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropConstantFeatures:
    def __init__(self, variables=None, tol=1, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropCorrelatedFeatures:
    def __init__(self, variables=None, method='pearson', threshold=0.8): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Selection](./selection.md)

### Outlier Detection and Handling

Identify and handle outliers using statistical methods, including Winsorization, capping, and trimming techniques.

```python { .api }
class Winsorizer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryOutlierCapper:
    # the variables to cap are taken from the keys of the capping dictionaries
    def __init__(self, max_capping_dict=None, min_capping_dict=None, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OutlierTrimmer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Outlier Detection and Handling](./outliers.md)

### Feature Creation

Generate new features through mathematical combinations, cyclical transformations, and reference feature combinations.

```python { .api }
class MathematicalCombination:
    def __init__(self, variables_to_combine, math_operations=None, new_variables_names=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CyclicalTransformer:
    def __init__(self, variables=None, max_values=None, drop_original=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CombineWithReferenceFeature:
    def __init__(self, variables_to_combine, reference_variables, operations=['sub']): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Creation](./creation.md)

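The idea behind cyclical encoding can be sketched in plain pandas and NumPy, assuming the usual sine/cosine scheme: each value is mapped onto the unit circle via the sine and cosine of `2 * pi * x / max_value`, so the end and the start of a cycle sit next to each other. The helper `cyclical_encode` and its column suffixes are hypothetical and only illustrate the technique; they are not `CyclicalTransformer`'s own code.

```python
import numpy as np
import pandas as pd


def cyclical_encode(X: pd.DataFrame, variable: str, max_value: float) -> pd.DataFrame:
    """Hypothetical helper: add sine and cosine components of a periodic variable."""
    X = X.copy()  # leave the caller's DataFrame unchanged
    angle = 2 * np.pi * X[variable] / max_value
    X[f"{variable}_sin"] = np.sin(angle)
    X[f"{variable}_cos"] = np.cos(angle)
    return X


X = pd.DataFrame({"hour": [0, 6, 12, 18]})
X_cyc = cyclical_encode(X, "hour", max_value=24)
print(X_cyc.round(3))
```

With this encoding, hour 23 lands at roughly (sin, cos) = (-0.26, 0.97), right next to hour 0 at (0, 1), whereas the raw values 23 and 0 are as far apart as the scale allows.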
### Datetime Feature Extraction

Extract meaningful features from datetime variables, including time components, periods, and date-related boolean flags.

```python { .api }
class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Datetime Feature Extraction](./datetime.md)

### Scikit-learn Wrappers

Apply scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names.

```python { .api }
class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...
```

[Scikit-learn Wrappers](./wrappers.md)

### Preprocessing Utilities

General preprocessing functions for data preparation and variable matching between datasets.

```python { .api }
class MatchVariables:
    def __init__(self, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Preprocessing Utilities](./preprocessing.md)