# Scikit-learn Wrappers

Transformers for applying scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names, enabling seamless integration of scikit-learn functionality within feature-engine workflows.

## Capabilities

### Scikit-learn Transformer Wrapper

Wrapper to apply any Scikit-learn transformer to a selected group of variables while preserving DataFrame structure.

```python { .api }
class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None):
        """
        Initialize SklearnTransformerWrapper.

        Parameters:
        - transformer: Instance of a scikit-learn transformer (must have fit, transform methods)
        - variables (list): List of variables to be transformed. If None, transforms all numerical variables
        """

    def fit(self, X, y=None):
        """
        Fit the scikit-learn transformer on selected variables.

        Parameters:
        - X (pandas.DataFrame): Training dataset
        - y (pandas.Series, optional): Target variable (passed to transformer if needed)

        Returns:
        - self
        """

    def transform(self, X):
        """
        Transform data using the fitted scikit-learn transformer.

        Parameters:
        - X (pandas.DataFrame): Dataset to transform

        Returns:
        - pandas.DataFrame: Dataset with transformed variables, maintaining DataFrame structure
        """

    def fit_transform(self, X, y=None):
        """Fit to data, then transform it."""

    def inverse_transform(self, X):
        """
        Inverse transform using the scikit-learn transformer (if supported).

        Parameters:
        - X (pandas.DataFrame): Dataset with transformed values

        Returns:
        - pandas.DataFrame: Dataset with original scale restored
        """
```

**Usage Examples**:

### Standard Scaling

```python
from feature_engine.wrappers import SklearnTransformerWrapper
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

# Sample numerical data
data = {
    'feature1': np.random.normal(100, 20, 1000),
    'feature2': np.random.normal(50, 10, 1000),
    'feature3': np.random.normal(200, 50, 1000),
    'categorical': np.random.choice(['A', 'B', 'C'], 1000)
}
df = pd.DataFrame(data)

# Apply StandardScaler to specific numerical variables
scaler_wrapper = SklearnTransformerWrapper(
    transformer=StandardScaler(),
    variables=['feature1', 'feature2']
)
df_scaled = scaler_wrapper.fit_transform(df)

# feature3 and categorical remain unchanged
# feature1 and feature2 are standardized
print(df_scaled.describe())
print(df_scaled.dtypes)  # DataFrame structure preserved
```

### Principal Component Analysis

```python
from sklearn.decomposition import PCA

# Apply PCA to selected variables
pca_wrapper = SklearnTransformerWrapper(
    transformer=PCA(n_components=2),
    variables=['feature1', 'feature2', 'feature3']
)
df_pca = pca_wrapper.fit_transform(df)

# Note: PCA creates new features, original variables are replaced
# with principal components (PC1, PC2, etc.)
print("PCA explained variance ratio:",
      pca_wrapper.transformer_.explained_variance_ratio_)
```

### Robust Scaling

```python
from sklearn.preprocessing import RobustScaler

# Apply RobustScaler (less sensitive to outliers)
robust_wrapper = SklearnTransformerWrapper(
    transformer=RobustScaler(),
    variables=['feature1', 'feature3']
)
df_robust = robust_wrapper.fit_transform(df)

# Inverse transformation
df_original = robust_wrapper.inverse_transform(df_robust)
```

### Polynomial Features

```python
from sklearn.preprocessing import PolynomialFeatures

# Generate polynomial features
poly_wrapper = SklearnTransformerWrapper(
    transformer=PolynomialFeatures(degree=2, include_bias=False),
    variables=['feature1', 'feature2']
)
df_poly = poly_wrapper.fit_transform(df)

# Creates additional polynomial combination features
print(f"Original features: {len(df.columns)}")
print(f"With polynomial features: {len(df_poly.columns)}")
```

### Quantile Transformation

```python
from sklearn.preprocessing import QuantileTransformer

# Apply quantile transformation for normalization
quantile_wrapper = SklearnTransformerWrapper(
    transformer=QuantileTransformer(output_distribution='normal'),
    variables=['feature1', 'feature2', 'feature3']
)
df_quantile = quantile_wrapper.fit_transform(df)

# Transforms the selected variables to an approximately normal distribution
```

## Advanced Usage Patterns

### Pipeline Integration

```python
from sklearn.pipeline import Pipeline
from feature_engine.imputation import MeanMedianImputer
from feature_engine.wrappers import SklearnTransformerWrapper
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.ensemble import RandomForestClassifier

# Complex preprocessing pipeline
preprocessing_pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('polynomial', SklearnTransformerWrapper(
        transformer=PolynomialFeatures(degree=2),
        variables=['feature1', 'feature2']
    )),
    ('scaler', SklearnTransformerWrapper(
        transformer=StandardScaler(),
        variables=None  # Scale all numerical variables
    )),
    ('classifier', RandomForestClassifier())
])

# Fit and predict
preprocessing_pipeline.fit(X_train, y_train)
predictions = preprocessing_pipeline.predict(X_test)
```

### Multiple Transformer Application

```python
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Apply different scalers to different variable groups
standard_scaler_wrapper = SklearnTransformerWrapper(
    transformer=StandardScaler(),
    variables=['feature1', 'feature2']
)

minmax_scaler_wrapper = SklearnTransformerWrapper(
    transformer=MinMaxScaler(),
    variables=['feature3']
)

# Sequential application
df_multi_scaled = standard_scaler_wrapper.fit_transform(df)
df_multi_scaled = minmax_scaler_wrapper.fit_transform(df_multi_scaled)
```

### Custom Scikit-learn Transformer

```python
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

# Custom transformer
class LogTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.log1p(X)  # log(1 + x)

    def inverse_transform(self, X):
        return np.expm1(X)  # exp(x) - 1

# Use with wrapper
log_wrapper = SklearnTransformerWrapper(
    transformer=LogTransformer(),
    variables=['feature1', 'feature2']
)
df_log = log_wrapper.fit_transform(df)
df_original = log_wrapper.inverse_transform(df_log)
```

### Handling Categorical Variables

```python
from sklearn.preprocessing import OrdinalEncoder

# For categorical variables with sklearn transformers
categorical_data = {
    'category1': ['A', 'B', 'C', 'A', 'B'],
    'category2': ['X', 'Y', 'Z', 'X', 'Y'],
    'numerical': [1, 2, 3, 4, 5]
}
df_cat = pd.DataFrame(categorical_data)

# Use OrdinalEncoder for multiple categorical variables
ordinal_wrapper = SklearnTransformerWrapper(
    transformer=OrdinalEncoder(),
    variables=['category1', 'category2']
)
df_encoded = ordinal_wrapper.fit_transform(df_cat)
```

### Cross-Validation with Wrapper

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor

# Create pipeline with wrapper
pipeline_with_wrapper = Pipeline([
    ('scaler', SklearnTransformerWrapper(
        transformer=StandardScaler(),
        variables=['feature1', 'feature2', 'feature3']
    )),
    ('regressor', RandomForestRegressor())
])

# Cross-validation
cv_scores = cross_val_score(
    pipeline_with_wrapper,
    X_train,
    y_train,
    cv=5,
    scoring='neg_mean_squared_error'
)
print(f"CV RMSE: {np.sqrt(-cv_scores.mean()):.3f}")
```

## Benefits of Using SklearnTransformerWrapper

### Maintains DataFrame Structure

- Preserves column names and indices
- Keeps non-transformed columns unchanged
- Returns a pandas DataFrame instead of a numpy array (see the quick check below)
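
The snippet below is an illustrative quick check of these properties, reusing the `df` and `scaler_wrapper` objects from the Standard Scaling example above; it is not part of the library API.

```python
df_scaled = scaler_wrapper.fit_transform(df)

# Output is a pandas DataFrame, not a numpy array
assert isinstance(df_scaled, pd.DataFrame)

# Column names and row index are preserved
assert list(df_scaled.columns) == list(df.columns)
assert df_scaled.index.equals(df.index)

# Variables outside the selected subset are left untouched
assert df_scaled['feature3'].equals(df['feature3'])
assert df_scaled['categorical'].equals(df['categorical'])
```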

### Variable Selection

- Apply transformers to specific subsets of variables
- Leave categorical or irrelevant variables untouched
- Flexible variable selection strategies, including `variables=None` to target all numerical variables (see the sketch below)
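
As a short sketch of the `variables=None` behaviour (reusing `df` from the Standard Scaling example; `MinMaxScaler` is just an illustrative choice), the wrapper selects every numerical variable and records the selection in `variables_`:

```python
from sklearn.preprocessing import MinMaxScaler

# variables=None: the transformer is applied to all numerical variables
all_numeric_wrapper = SklearnTransformerWrapper(
    transformer=MinMaxScaler(),
    variables=None
)
df_all_scaled = all_numeric_wrapper.fit_transform(df)

# The fitted attribute lists the variables that were actually transformed
print(all_numeric_wrapper.variables_)
# expected: ['feature1', 'feature2', 'feature3'] -- 'categorical' is not numerical
```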

### Pipeline Compatibility

- Works seamlessly with feature-engine transformers
- Integrates with scikit-learn pipelines
- Maintains a consistent API across transformers

### Inverse Transformation Support

- Provides inverse transformation when available
- Maintains original scale recovery capability
- Useful for interpretability and debugging (a round-trip check follows this list)
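
A minimal round-trip sketch, building on the Robust Scaling example above; it assumes the wrapped transformer implements `inverse_transform`:

```python
# Recover the original scale and verify the round trip numerically
df_restored = robust_wrapper.inverse_transform(df_robust)

print(np.allclose(df_restored['feature1'], df['feature1']))  # True, up to floating point error
print(np.allclose(df_restored['feature3'], df['feature3']))  # True, up to floating point error
```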

## Common Use Cases

1. **Preprocessing specific variable types**: Apply StandardScaler only to continuous variables
2. **Dimensionality reduction**: Use PCA on high-dimensional feature subsets
3. **Distribution transformation**: Apply QuantileTransformer to skewed variables
4. **Feature generation**: Create polynomial features from selected variables
5. **Robust scaling**: Use RobustScaler for variables with outliers

## Common Attributes

SklearnTransformerWrapper has these fitted attributes:

- `transformer_` (sklearn transformer): Fitted scikit-learn transformer instance
- `variables_` (list): Variables that were transformed
- `n_features_in_` (int): Number of features in the training set

The wrapper provides access to the underlying transformer's attributes through the `transformer_` attribute, enabling access to learned parameters like feature names, explained variance, etc.
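
For example, the parameters learned by a wrapped `StandardScaler` can be read through `transformer_`. This is a sketch reusing the fitted `scaler_wrapper` from the Standard Scaling example; `mean_` and `scale_` are attributes of scikit-learn's StandardScaler, not of the wrapper itself.

```python
# Learned parameters live on the underlying scikit-learn transformer
fitted_scaler = scaler_wrapper.transformer_
print("Means:", fitted_scaler.mean_)    # per-variable means of feature1, feature2
print("Scales:", fitted_scaler.scale_)  # per-variable standard deviations

# Fitted wrapper attributes
print("Transformed variables:", scaler_wrapper.variables_)
print("Features seen during fit:", scaler_wrapper.n_features_in_)
```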