Tessl Tile for pypi/feature-engine@1.2.0

# Feature-Engine

A Python library of transformers for engineering and selecting features for machine learning. All transformers follow the scikit-learn API pattern, so they can be used directly inside existing scikit-learn pipelines.

## Package Information

- **Package Name**: feature-engine
- **Package Type**: library
- **Language**: Python
- **Installation**: `pip install feature-engine`

## Core Imports

```python
import feature_engine
```

Common import patterns for specific modules:

```python
from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import OneHotEncoder, OrdinalEncoder
from feature_engine.transformation import LogTransformer, BoxCoxTransformer
from feature_engine.selection import DropFeatures, DropConstantFeatures, DropHighPSIFeatures, SelectByTargetMeanPerformance
from feature_engine.outliers import Winsorizer
```

## Basic Usage

```python
import pandas as pd
from feature_engine.imputation import MeanMedianImputer
from feature_engine.encoding import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Create sample data
data = {
    'numeric_var1': [1.0, 2.0, None, 4.0, 5.0],
    'numeric_var2': [10, 20, 30, None, 50],
    'categorical_var': ['A', 'B', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
y = [0, 1, 0, 1, 0]

# Create transformers
imputer = MeanMedianImputer(imputation_method='median')
encoder = OrdinalEncoder(encoding_method='arbitrary')

# Fit and transform data
X_imputed = imputer.fit_transform(df)
X_encoded = encoder.fit_transform(X_imputed)

# Or use in pipeline
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('encoder', OrdinalEncoder(encoding_method='arbitrary')),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)
```

## Architecture

Feature-Engine follows the scikit-learn API design pattern with consistent interfaces across all transformers:

- **fit(X, y=None)**: Learn transformation parameters from training data
- **transform(X)**: Apply learned transformation to new data
- **fit_transform(X, y=None)**: Combine fit and transform operations
- **inverse_transform(X)**: Reverse transformation (where applicable)

All transformers inherit from base classes that provide:
- Automatic variable selection (numerical or categorical)
- Input validation and type checking
- Consistent parameter storage in attributes ending with `_`
- Integration with pandas DataFrames
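
The fit/transform convention and the trailing-underscore attributes can be illustrated with a toy transformer. This is a minimal sketch in plain Python, not Feature-Engine code: the class name `MedianFiller`, the attribute `medians_`, and the use of a dict of lists in place of a DataFrame are all assumptions made for illustration.

```python
from statistics import median


class MedianFiller:
    """Toy transformer (hypothetical) that mimics the fit/transform convention."""

    def __init__(self, variables=None):
        # Constructor arguments are stored unchanged; nothing is learned here.
        self.variables = variables

    def fit(self, X, y=None):
        # X is a dict of column name -> list of values; None marks a missing value.
        cols = self.variables if self.variables is not None else list(X)
        # State learned from the training data lives in attributes ending with "_".
        self.medians_ = {
            c: median(v for v in X[c] if v is not None) for c in cols
        }
        return self

    def transform(self, X):
        # Apply the learned medians to new data without modifying the input.
        out = {c: list(vals) for c, vals in X.items()}
        for c, m in self.medians_.items():
            out[c] = [m if v is None else v for v in out[c]]
        return out

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


train = {"age": [20, None, 40], "fare": [7.5, 8.0, None]}
filler = MedianFiller().fit(train)
filled = filler.transform({"age": [None, 33], "fare": [None, 9.0]})
```

Because the medians are learned in `fit` and only applied in `transform`, values from the training data are reused on new data, which is the property that makes this pattern safe to use inside a pipeline.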

## Capabilities

### Missing Data Imputation

Handle missing values in numerical and categorical variables using statistical methods, arbitrary values, or advanced techniques like random sampling.

```python { .api }
class MeanMedianImputer:
    def __init__(self, imputation_method='median', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CategoricalImputer:
    def __init__(self, imputation_method='missing', fill_value='Missing', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryNumberImputer:
    def __init__(self, arbitrary_number=999, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Missing Data Imputation](./imputation.md)
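
To make the two categorical strategies concrete, here is a minimal plain-Python sketch of what "label as missing" and "most frequent category" imputation do to a single column. The function name `impute_categorical` is hypothetical and the code is only an illustration of the idea, not the library's implementation.

```python
from collections import Counter


def impute_categorical(values, method="missing", fill_value="Missing"):
    """Replace None entries in a list of categories (illustrative helper)."""
    if method == "missing":
        # Treat missingness as its own category.
        replacement = fill_value
    elif method == "frequent":
        # Use the most common observed category.
        observed = Counter(v for v in values if v is not None)
        replacement = observed.most_common(1)[0][0]
    else:
        raise ValueError("method must be 'missing' or 'frequent'")
    return [replacement if v is None else v for v in values]


colours = ["red", None, "blue", "red", None]
labelled = impute_categorical(colours)
frequent = impute_categorical(colours, method="frequent")
```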

### Categorical Variable Encoding

Transform categorical variables into numerical representations using various encoding methods including one-hot, ordinal, target-based, and frequency-based encoders.

```python { .api }
class OneHotEncoder:
    def __init__(self, top_categories=None, drop_last=False, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OrdinalEncoder:
    def __init__(self, encoding_method='ordered', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class MeanEncoder:
    def __init__(self, variables=None, ignore_format=False): ...
    def fit(self, X, y): ...
    def transform(self, X): ...
```

[Categorical Variable Encoding](./encoding.md)
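
Target-based encoders need `y` because the mapping is derived from the target. The sketch below, in plain Python, shows the idea behind mean encoding and behind an ordinal encoding ordered by target mean. The helper names are hypothetical, and the assumption that categories are ranked in ascending order of target mean is stated here for illustration rather than taken from the signatures above.

```python
from collections import defaultdict


def target_means(categories, target):
    """Mean of the target per category (the idea behind mean encoding)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, target):
        sums[c] += t
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def ordered_ordinal_mapping(categories, target):
    """Integers assigned by ascending target mean (assumed ordering)."""
    means = target_means(categories, target)
    ranked = sorted(means, key=means.get)
    return {c: i for i, c in enumerate(ranked)}


cats = ["A", "B", "A", "C", "B", "C"]
y = [0, 1, 0, 1, 0, 1]

means = target_means(cats, y)
mapping = ordered_ordinal_mapping(cats, y)
encoded = [mapping[c] for c in cats]
```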

### Variable Discretisation

Convert continuous variables into discrete intervals using equal width, equal frequency, decision tree-based, or user-defined boundaries.

```python { .api }
class EqualWidthDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class EqualFrequencyDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryDiscretiser:
    def __init__(self, binning_dict, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Variable Discretisation](./discretisation.md)
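
As a rough illustration of equal-width binning, the sketch below computes the interval edges from the observed range and assigns each value to a bin. It is plain Python with hypothetical helper names; the right-inclusive interval convention is an assumption of this sketch.

```python
def equal_width_edges(values, bins):
    """Split the observed range into `bins` intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def assign_bin(value, edges):
    """Index of the first interval whose right edge is >= value."""
    for i in range(1, len(edges)):
        if value <= edges[i]:
            return i - 1
    return len(edges) - 2  # values above the last edge fall in the last bin


values = [0, 2, 4, 6, 8, 10, 100]
edges = equal_width_edges(values, bins=4)
binned = [assign_bin(v, edges) for v in values]
```

With skewed data such as this, a single extreme value stretches the range so that almost every observation lands in the first bin, which is the usual reason to consider equal-frequency binning instead.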

### Mathematical Transformations

Apply mathematical functions to numerical variables including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations.

```python { .api }
class LogTransformer:
    def __init__(self, variables=None, base='e'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class BoxCoxTransformer:
    def __init__(self, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class PowerTransformer:
    def __init__(self, variables=None, exp=0.5): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Mathematical Transformations](./transformation.md)
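
The following plain-Python sketch shows the arithmetic behind a log transform, its inverse, and the Box-Cox formula. The function names are hypothetical, and passing `lmbda` by hand is an assumption of this sketch: the `BoxCoxTransformer` signature above takes no lambda argument.

```python
import math


def log_transform(values, base="e"):
    """Natural or base-10 logarithm; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log is undefined for zero or negative values")
    log = math.log if base == "e" else math.log10
    return [log(v) for v in values]


def inverse_log_transform(values, base="e"):
    """Undo log_transform by exponentiating with the same base."""
    return [math.exp(v) if base == "e" else 10 ** v for v in values]


def box_cox(values, lmbda):
    """Box-Cox formula: log(x) when lambda is 0, else (x**lambda - 1) / lambda."""
    if lmbda == 0:
        return [math.log(v) for v in values]
    return [(v ** lmbda - 1) / lmbda for v in values]


incomes = [10.0, 100.0, 1000.0]
logged = log_transform(incomes, base="10")
restored = inverse_log_transform(logged, base="10")
```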

### Feature Selection

Remove or select features based on various criteria including variance, correlation, performance metrics, and statistical tests.

```python { .api }
class DropFeatures:
    def __init__(self, features_to_drop): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropConstantFeatures:
    def __init__(self, variables=None, tol=1, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropCorrelatedFeatures:
    def __init__(self, variables=None, method='pearson', threshold=0.8): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Selection](./selection.md)
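
One way to read the `tol` parameter of a constant-feature filter is as the share of rows taken up by a column's most frequent value. The plain-Python sketch below flags columns on that basis. The helper name is hypothetical, and the use of `>=` for the comparison is an assumption of this sketch, not a statement about the library's exact rule.

```python
from collections import Counter


def quasi_constant_columns(X, tol=1.0):
    """Names of columns whose most frequent value covers at least `tol` of the rows."""
    flagged = []
    for name, values in X.items():
        top_count = Counter(values).most_common(1)[0][1]
        if top_count / len(values) >= tol:
            flagged.append(name)
    return flagged


X = {
    "country": ["UK", "UK", "UK", "UK", "UK"],  # constant
    "flag": [0, 0, 0, 0, 1],                    # same value in 80% of rows
    "age": [21, 35, 35, 40, 58],
}

constant = quasi_constant_columns(X)
quasi_constant = quasi_constant_columns(X, tol=0.8)
```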

### Outlier Detection and Handling

Identify and handle outliers using statistical methods including Winsorization, capping, and trimming techniques.

```python { .api }
class Winsorizer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryOutlierCapper:
    def __init__(self, max_capping_dict=None, min_capping_dict=None, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OutlierTrimmer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Outlier Detection and Handling](./outliers.md)
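
The difference between capping and trimming can be sketched in plain Python. The example assumes a Gaussian rule in which the right-tail limit is the mean plus `fold` sample standard deviations, learned on training data and then applied to new data. The helper names are hypothetical and the choice of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def gaussian_upper_limit(values, fold=3):
    """Right-tail limit: mean + fold * sample standard deviation."""
    return mean(values) + fold * stdev(values)


def cap_right(values, limit):
    """Winsorization-style capping: replace values above the limit with the limit."""
    return [min(v, limit) for v in values]


def trim_right(values, limit):
    """Trimming: drop observations above the limit entirely."""
    return [v for v in values if v <= limit]


train = [10, 12, 11, 13, 9, 11, 12, 10]
limit = gaussian_upper_limit(train, fold=3)

new = [11, 14, 40]
capped = cap_right(new, limit)    # same number of rows, extreme value pulled in
trimmed = trim_right(new, limit)  # extreme row removed
```

Capping keeps the number of rows unchanged, whereas trimming changes it, which matters when the target vector has to stay aligned with the features.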

### Feature Creation

Generate new features through mathematical combinations, cyclical transformations, and reference feature combinations.

```python { .api }
class MathematicalCombination:
    def __init__(self, variables_to_combine, math_operations=None, new_variables_names=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CyclicalTransformer:
    def __init__(self, variables=None, max_values=None, drop_original=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CombineWithReferenceFeature:
    def __init__(self, variables_to_combine, reference_variables, operations=['sub']): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Creation](./creation.md)
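
A cyclical transformation maps a periodic variable, such as hour of day, onto a circle so that the end of one cycle sits next to the start of the next. The plain-Python sketch below uses the common sine/cosine formulation with a hypothetical helper name; it illustrates the idea and is not the library's code.

```python
import math


def cyclical_features(values, max_value):
    """Return (sin, cos) lists for values that wrap around at max_value."""
    angles = [2 * math.pi * v / max_value for v in values]
    return [math.sin(a) for a in angles], [math.cos(a) for a in angles]


hours = [0, 6, 12, 18, 24]
hour_sin, hour_cos = cyclical_features(hours, max_value=24)
```

Hours 0 and 24 end up at (numerically) the same point, which a raw integer encoding would place at opposite ends of the scale.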

### Datetime Feature Extraction

Extract meaningful features from datetime variables including time components, periods, and date-related boolean flags.

```python { .api }
class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Datetime Feature Extraction](./datetime.md)
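
The kinds of features described here (time components and boolean flags) can be sketched with the standard library alone. The helper name and the dictionary keys below are illustrative choices for this sketch, not a list of the names the library emits.

```python
from datetime import datetime


def datetime_features(ts):
    """Derive a few numeric and boolean features from one timestamp."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": ts.weekday(),   # Monday is 0, Sunday is 6
        "hour": ts.hour,
        "weekend": ts.weekday() >= 5,  # Saturday or Sunday
    }


stamp = datetime(2022, 1, 1, 14, 30)  # a Saturday
features = datetime_features(stamp)
```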

### Scikit-learn Wrappers

Apply scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names.

```python { .api }
class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...
```

[Scikit-learn Wrappers](./wrappers.md)
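
The idea of applying a transformation to selected columns only, while leaving the rest of the table and its column names intact, can be sketched without any dependencies. Both helper names are hypothetical, and a dict of lists stands in for a DataFrame.

```python
def min_max_scale(values):
    """Scale a numeric column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def apply_to_columns(X, func, variables):
    """Apply func to the named columns; copy every other column unchanged."""
    out = {name: list(vals) for name, vals in X.items()}
    for name in variables:
        out[name] = func(out[name])
    return out


X = {"age": [10, 20, 30], "city": ["a", "b", "c"]}
scaled = apply_to_columns(X, min_max_scale, variables=["age"])
```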

### Preprocessing Utilities

General preprocessing functions for data preparation and variable matching between datasets.

```python { .api }
class MatchVariables:
    def __init__(self, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Preprocessing Utilities](./preprocessing.md)
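
Variable matching between datasets can be pictured as aligning a new table to the columns seen during training: extra columns are dropped, absent ones are added with a fill value, and the training order is restored. The plain-Python sketch below uses a hypothetical helper and `None` as the fill value, both assumptions made for illustration.

```python
def match_variables(train_columns, X, fill_value=None):
    """Return X restricted to, and ordered like, the training columns."""
    n_rows = len(next(iter(X.values()))) if X else 0
    return {
        col: list(X[col]) if col in X else [fill_value] * n_rows
        for col in train_columns
    }


train_columns = ["a", "b", "c"]
new_data = {"c": [1, 2], "a": [3, 4], "extra": [5, 6]}
matched = match_variables(train_columns, new_data)
```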
