Tessl Tile for pypi/feature-engine@1.2.0

# Scikit-learn Wrappers

Transformers for applying scikit-learn transformers to selected subsets of variables while preserving the DataFrame structure and column names, so that scikit-learn functionality can be used inside feature-engine workflows.

## Capabilities

### Scikit-learn Transformer Wrapper

Wrapper that applies a scikit-learn transformer to a selected group of variables while preserving the DataFrame structure.

```python { .api }
class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None):
        """
        Initialize SklearnTransformerWrapper.

        Parameters:
        - transformer: Instance of a scikit-learn transformer (must have fit, transform methods)
        - variables (list): List of variables to be transformed. If None, transforms all numerical variables
        """

    def fit(self, X, y=None):
        """
        Fit the scikit-learn transformer on selected variables.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (passed to transformer if needed)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Transform data using the fitted scikit-learn transformer.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with transformed variables, maintaining DataFrame structure
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Inverse transform using the scikit-learn transformer (if supported).

        Parameters:
        - X (pandas.DataFrame): Dataset with transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Examples**:

### Standard Scaling
```python
from feature_engine.wrappers import SklearnTransformerWrapper
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

# Sample numerical data
data = {
    'feature1': np.random.normal(100, 20, 1000),
    'feature2': np.random.normal(50, 10, 1000),
    'feature3': np.random.normal(200, 50, 1000),
    'categorical': np.random.choice(['A', 'B', 'C'], 1000)
}
df = pd.DataFrame(data)

# Apply StandardScaler to specific numerical variables
scaler_wrapper = SklearnTransformerWrapper(
    transformer=StandardScaler(),
    variables=['feature1', 'feature2']
)
df_scaled = scaler_wrapper.fit_transform(df)

# feature3 and categorical remain unchanged
# feature1 and feature2 are standardized
print(df_scaled.describe())
print(df_scaled.dtypes)  # DataFrame structure preserved
```

### Principal Component Analysis
```python
from sklearn.decomposition import PCA

# Apply PCA to selected variables
pca_wrapper = SklearnTransformerWrapper(
    transformer=PCA(n_components=2),
    variables=['feature1', 'feature2', 'feature3']
)
df_pca = pca_wrapper.fit_transform(df)

# Note: PCA outputs new features (principal components) in place of the
# original variables. Whether the wrapper supports transformers that change
# the number of columns may depend on the feature-engine version, so check
# the resulting columns before relying on them.
print("PCA explained variance ratio:",
      pca_wrapper.transformer_.explained_variance_ratio_)
```

### Robust Scaling
```python
from sklearn.preprocessing import RobustScaler

# Apply RobustScaler (less sensitive to outliers)
robust_wrapper = SklearnTransformerWrapper(
    transformer=RobustScaler(),
    variables=['feature1', 'feature3']
)
df_robust = robust_wrapper.fit_transform(df)

# Inverse transformation
df_original = robust_wrapper.inverse_transform(df_robust)
```

### Polynomial Features
```python
from sklearn.preprocessing import PolynomialFeatures

# Generate polynomial features
poly_wrapper = SklearnTransformerWrapper(
    transformer=PolynomialFeatures(degree=2, include_bias=False),
    variables=['feature1', 'feature2']
)
df_poly = poly_wrapper.fit_transform(df)

# Creates additional polynomial combination features
print(f"Original features: {len(df.columns)}")
print(f"With polynomial features: {len(df_poly.columns)}")
```

### Quantile Transformation
```python
from sklearn.preprocessing import QuantileTransformer

# Apply a quantile transformation that maps each variable to a normal distribution
quantile_wrapper = SklearnTransformerWrapper(
    transformer=QuantileTransformer(output_distribution='normal'),
    variables=['feature1', 'feature2', 'feature3']
)
df_quantile = quantile_wrapper.fit_transform(df)

# The selected variables now follow an approximately normal distribution
```

## Advanced Usage Patterns

### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from feature_engine.imputation import MeanMedianImputer
from feature_engine.wrappers import SklearnTransformerWrapper
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.ensemble import RandomForestClassifier

# Complex preprocessing pipeline
preprocessing_pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('polynomial', SklearnTransformerWrapper(
        transformer=PolynomialFeatures(degree=2),
        variables=['feature1', 'feature2']
    )),
    ('scaler', SklearnTransformerWrapper(
        transformer=StandardScaler(),
        variables=None  # Scale all numerical variables
    )),
    ('classifier', RandomForestClassifier())
])

# Fit and predict
# X_train, y_train and X_test are assumed to come from an earlier
# train/test split and are not defined in this snippet
preprocessing_pipeline.fit(X_train, y_train)
predictions = preprocessing_pipeline.predict(X_test)
```

### Multiple Transformer Application

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Apply different scalers to different variable groups
standard_scaler_wrapper = SklearnTransformerWrapper(
    transformer=StandardScaler(),
    variables=['feature1', 'feature2']
)

minmax_scaler_wrapper = SklearnTransformerWrapper(
    transformer=MinMaxScaler(),
    variables=['feature3']
)

# Sequential application
df_multi_scaled = standard_scaler_wrapper.fit_transform(df)
df_multi_scaled = minmax_scaler_wrapper.fit_transform(df_multi_scaled)
```

### Custom Scikit-learn Transformer

```python
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

# Custom transformer (stateless: fit learns nothing from the data)
class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # log(1 + x); only defined for values greater than -1
        return np.log1p(X)

    def inverse_transform(self, X):
        return np.expm1(X)  # exp(x) - 1

# Use with wrapper
log_wrapper = SklearnTransformerWrapper(
    transformer=LogTransformer(),
    variables=['feature1', 'feature2']
)
df_log = log_wrapper.fit_transform(df)
df_original = log_wrapper.inverse_transform(df_log)
```

### Handling Categorical Variables

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# For categorical variables with sklearn transformers
categorical_data = {
    'category1': ['A', 'B', 'C', 'A', 'B'],
    'category2': ['X', 'Y', 'Z', 'X', 'Y'],
    'numerical': [1, 2, 3, 4, 5]
}
df_cat = pd.DataFrame(categorical_data)

# Use OrdinalEncoder for multiple categorical variables
ordinal_wrapper = SklearnTransformerWrapper(
    transformer=OrdinalEncoder(),
    variables=['category1', 'category2']
)
df_encoded = ordinal_wrapper.fit_transform(df_cat)
```

### Cross-Validation with Wrapper

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from feature_engine.wrappers import SklearnTransformerWrapper

# Create pipeline with wrapper
pipeline_with_wrapper = Pipeline([
    ('scaler', SklearnTransformerWrapper(
        transformer=StandardScaler(),
        variables=['feature1', 'feature2', 'feature3']
    )),
    ('regressor', RandomForestRegressor())
])

# Cross-validation
# X_train and y_train are assumed to be defined elsewhere
cv_scores = cross_val_score(
    pipeline_with_wrapper,
    X_train,
    y_train,
    cv=5,
    scoring='neg_mean_squared_error'
)
print(f"CV RMSE: {np.sqrt(-cv_scores.mean()):.3f}")
```

## Benefits of Using SklearnTransformerWrapper

### Maintains DataFrame Structure
- Preserves column names and indices
- Keeps non-transformed columns unchanged
- Returns pandas DataFrame instead of numpy array
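
To illustrate why this matters, the following minimal sketch uses only pandas and scikit-learn (not the wrapper itself) to contrast the NumPy array returned by a plain `StandardScaler` with a hand-written equivalent that writes the scaled values back into a copy of the DataFrame. The sample data and the `scale_columns` helper are hypothetical, and this is not feature-engine's actual implementation.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical sample data: two numerical columns and one categorical column
df = pd.DataFrame(
    {
        "feature1": [10.0, 20.0, 30.0, 40.0],
        "feature2": [1.0, 2.0, 3.0, 4.0],
        "categorical": ["A", "B", "A", "C"],
    },
    index=["r1", "r2", "r3", "r4"],
)

# Plain scikit-learn: the result is a NumPy array without column names or index
scaled_array = StandardScaler().fit_transform(df[["feature1", "feature2"]])
print(type(scaled_array))


def scale_columns(frame, columns):
    """Hypothetical helper: scale `columns` and leave every other column as-is."""
    result = frame.copy()
    result[columns] = StandardScaler().fit_transform(frame[columns])
    return result


# Manual equivalent of the structure-preserving behaviour described above
df_scaled = scale_columns(df, ["feature1", "feature2"])
print(df_scaled)
```

With the wrapper, the bookkeeping done by the hypothetical `scale_columns` helper is handled for you, and the fitted transformer is kept for later calls to `transform`.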

### Variable Selection
- Apply transformers to specific subsets of variables
- Leave categorical or irrelevant variables untouched
- Select variables explicitly by name, or pass `variables=None` to use all numerical variables

### Pipeline Compatibility
- Works seamlessly with feature-engine transformers
- Integrates with scikit-learn pipelines
- Maintains consistent API across transformers

### Inverse Transformation Support
- Provides inverse transformation when the underlying transformer supports it
- Allows recovery of the original scale
- Useful for interpretability and debugging

## Common Use Cases

1. **Preprocessing specific variable types**: Apply StandardScaler only to continuous variables
2. **Dimensionality reduction**: Use PCA on high-dimensional feature subsets
3. **Distribution transformation**: Apply QuantileTransformer to skewed variables
4. **Feature generation**: Create polynomial features from selected variables
5. **Robust scaling**: Use RobustScaler for variables with outliers

## Common Attributes

SklearnTransformerWrapper has these fitted attributes:

- `transformer_` (sklearn transformer): Fitted scikit-learn transformer instance
- `variables_` (list): Variables that were transformed
- `n_features_in_` (int): Number of features in training set

The underlying transformer's learned parameters, such as feature names or explained variance, are available through the `transformer_` attribute.
