# Feature-Engine

A Python library with multiple transformers to engineer and select features for machine learning. All transformers follow the scikit-learn API pattern, enabling seamless integration with existing machine learning pipelines.

## Package Information

- **Package Name**: feature-engine
- **Package Type**: library
- **Language**: Python
- **Installation**: `pip install feature-engine`

## Core Imports

```python
import feature_engine
```

Common import patterns for specific modules:

```python
from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import OneHotEncoder, OrdinalEncoder
from feature_engine.transformation import LogTransformer, BoxCoxTransformer
from feature_engine.selection import DropFeatures, DropConstantFeatures, DropHighPSIFeatures, SelectByTargetMeanPerformance
from feature_engine.outliers import Winsorizer
```

## Basic Usage

```python
import pandas as pd
from feature_engine.imputation import MeanMedianImputer
from feature_engine.encoding import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Create sample data
data = {
    'numeric_var1': [1.0, 2.0, None, 4.0, 5.0],
    'numeric_var2': [10, 20, 30, None, 50],
    'categorical_var': ['A', 'B', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
y = [0, 1, 0, 1, 0]

# Create transformers
imputer = MeanMedianImputer(imputation_method='median')
encoder = OrdinalEncoder(encoding_method='arbitrary')

# Fit and transform data
X_imputed = imputer.fit_transform(df)
X_encoded = encoder.fit_transform(X_imputed)

# Or use in pipeline
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('encoder', OrdinalEncoder(encoding_method='arbitrary')),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)
```

## Architecture

Feature-Engine follows the scikit-learn API design pattern with consistent interfaces across all transformers:

- **fit(X, y=None)**: Learn transformation parameters from training data
- **transform(X)**: Apply learned transformation to new data
- **fit_transform(X, y=None)**: Combine fit and transform operations
- **inverse_transform(X)**: Reverse transformation (where applicable)

All transformers inherit from base classes that provide:
- Automatic variable selection (numerical or categorical)
- Input validation and type checking
- Consistent parameter storage in attributes ending with `_`
- Integration with pandas DataFrames
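
The contract described above can be illustrated with a toy transformer written in plain Python. This is a minimal sketch, not feature-engine code: the class `MeanCenterer` and its `mean_` attribute are hypothetical names chosen for illustration, and it works on a list of floats rather than a DataFrame.

```python
from statistics import mean


class MeanCenterer:
    """Toy transformer (hypothetical) that follows the fit/transform contract."""

    def fit(self, X, y=None):
        # Learn the parameter from training data; the trailing underscore
        # marks it as a fitted attribute.
        self.mean_ = mean(X)
        return self  # returning self allows chaining: fit(X).transform(X)

    def transform(self, X):
        # Apply the learned parameter to any data, seen or unseen.
        return [x - self.mean_ for x in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def inverse_transform(self, X):
        # Undo the transformation using the same learned parameter.
        return [x + self.mean_ for x in X]


train = [1.0, 2.0, 3.0, 6.0]
centerer = MeanCenterer()
centered = centerer.fit_transform(train)
restored = centerer.inverse_transform(centered)
new_data = centerer.transform([3.0])
```

Because the mean is learned once in `fit` and stored, `transform` applies the same shift to `new_data` that it applied to the training values.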

## Capabilities

### Missing Data Imputation

Handle missing values in numerical and categorical variables using statistical methods, arbitrary values, or advanced techniques like random sampling.

```python { .api }
class MeanMedianImputer:
    def __init__(self, imputation_method='median', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CategoricalImputer:
    def __init__(self, imputation_method='missing', fill_value='Missing', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryNumberImputer:
    def __init__(self, arbitrary_number=999, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Missing Data Imputation](./imputation.md)
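
To make the statistical approach concrete, the idea behind median imputation can be sketched in plain Python. This is an illustration of the concept only, not the library's implementation; the helper names `fit_median` and `impute` are hypothetical, and missing values are represented as `None`.

```python
from statistics import median


def fit_median(values):
    """Learn the median from the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    return median(observed)


def impute(values, fill):
    """Replace missing entries (None) with the learned fill value."""
    return [fill if v is None else v for v in values]


train = [1.0, 2.0, None, 4.0, 5.0]
test = [None, 10.0]

learned_median = fit_median(train)            # learned from training data only
train_imputed = impute(train, learned_median)
test_imputed = impute(test, learned_median)   # same value reused on new data
```

The fill value comes from the training data alone, so new data is imputed with the training median rather than its own.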

### Categorical Variable Encoding

Transform categorical variables into numerical representations using various encoding methods including one-hot, ordinal, target-based, and frequency-based encoders.

```python { .api }
class OneHotEncoder:
    def __init__(self, top_categories=None, drop_last=False, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OrdinalEncoder:
    def __init__(self, encoding_method='ordered', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class MeanEncoder:
    def __init__(self, variables=None, ignore_format=False): ...
    def fit(self, X, y): ...
    def transform(self, X): ...
```

[Categorical Variable Encoding](./encoding.md)
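
Target-based encoding is why `MeanEncoder.fit` requires `y`. The plain-Python sketch below shows the underlying arithmetic: a per-category target mean, and an ordinal mapping ranked by that mean. It assumes the `'ordered'` ordinal method numbers categories by ascending target mean; the helper names are hypothetical and this is not the library's code.

```python
from collections import defaultdict


def target_mean_per_category(categories, target):
    """Average the target within each category (the idea behind mean encoding)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, t in zip(categories, target):
        sums[cat] += t
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


def ordered_ordinal_mapping(means):
    """Number categories 0..k-1 in ascending order of their target mean."""
    ranked = sorted(means, key=means.get)
    return {cat: i for i, cat in enumerate(ranked)}


categories = ['A', 'B', 'A', 'C', 'B']
target = [0, 1, 0, 1, 0]

means = target_mean_per_category(categories, target)
mapping = ordered_ordinal_mapping(means)
encoded = [mapping[c] for c in categories]
```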

### Variable Discretisation

Convert continuous variables into discrete intervals using equal width, equal frequency, decision tree-based, or user-defined boundaries.

```python { .api }
class EqualWidthDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class EqualFrequencyDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryDiscretiser:
    def __init__(self, binning_dict, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Variable Discretisation](./discretisation.md)
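
The difference between equal-width and equal-frequency binning shows up most clearly on skewed data. The following plain-Python sketch computes bin edges both ways; the helper names are hypothetical, and the equal-frequency version uses a simplified quantile rule (no interpolation), so real implementations may place edges slightly differently.

```python
def equal_width_edges(values, bins):
    """Split the range [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


def equal_frequency_edges(values, bins):
    """Pick edges so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Simplification: take the observation at each quantile position
    # instead of interpolating between neighbours.
    return [ordered[min(n - 1, (i * n) // bins)] for i in range(bins + 1)]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_edges = equal_width_edges(values, bins=3)
freq_edges = equal_frequency_edges(values, bins=3)
```

With the single large value 100, the equal-width edges leave nine of the ten observations in the first interval, while the equal-frequency edges spread them across all three.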

### Mathematical Transformations

Apply mathematical functions to numerical variables including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations.

```python { .api }
class LogTransformer:
    def __init__(self, variables=None, base='e'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class BoxCoxTransformer:
    def __init__(self, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class PowerTransformer:
    def __init__(self, variables=None, exp=0.5): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Mathematical Transformations](./transformation.md)
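
As a worked example of a transformation with an inverse, the Box-Cox formula can be written out in plain Python. This sketch fixes lambda by hand, which is an assumption made for illustration; in practice lambda is estimated from the data during fitting. The function names are hypothetical.

```python
import math


def box_cox(x, lmbda):
    """Box-Cox transform of a positive value for a given lambda."""
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1) / lmbda


def inverse_box_cox(y, lmbda):
    """Invert the Box-Cox transform for the same lambda."""
    if lmbda == 0:
        return math.exp(y)
    return (lmbda * y + 1) ** (1 / lmbda)


values = [1.0, 4.0, 9.0, 16.0]
lmbda = 0.5  # fixed by hand for illustration; normally estimated from the data

transformed = [box_cox(v, lmbda) for v in values]
recovered = [inverse_box_cox(t, lmbda) for t in transformed]
```

The round trip only works when the inverse uses the same lambda as the forward transform, which is why the fitted parameter has to be stored on the transformer.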

### Feature Selection

Remove or select features based on various criteria including variance, correlation, performance metrics, and statistical tests.

```python { .api }
class DropFeatures:
    def __init__(self, features_to_drop): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropConstantFeatures:
    def __init__(self, variables=None, tol=1, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropCorrelatedFeatures:
    def __init__(self, variables=None, method='pearson', threshold=0.8): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Selection](./selection.md)
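
The variance and correlation criteria can be sketched in plain Python as follows. This is a simplified illustration, not the library's algorithm: it assumes a column is constant when it has one distinct value, and that of two correlated columns the one seen first is kept. The helper names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def select_features(columns, threshold=0.8):
    """Return names kept after dropping constant and correlated columns."""
    kept = []
    for name, values in columns.items():
        if len(set(values)) == 1:
            continue  # constant column: a single distinct value
        # Drop the column if it correlates strongly with one already kept.
        if any(abs(pearson(values, columns[k])) > threshold for k in kept):
            continue
        kept.append(name)
    return kept


columns = {
    'x1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'x2': [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated with x1
    'x3': [5.0, 1.0, 4.0, 2.0, 3.0],
    'x4': [7.0, 7.0, 7.0, 7.0, 7.0],    # constant
}
kept = select_features(columns)
```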

### Outlier Detection and Handling

Identify and handle outliers using statistical methods including Winsorization, capping, and trimming techniques.

```python { .api }
class Winsorizer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryOutlierCapper:
    def __init__(self, max_capping_dict=None, min_capping_dict=None, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OutlierTrimmer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Outlier Detection and Handling](./outliers.md)
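
Capping and trimming share the same limit and differ only in what happens to values beyond it. The plain-Python sketch below assumes that the `'gaussian'` method with `tail='right'` and `fold=3` places the limit at the mean plus three standard deviations, and that the sample standard deviation is used; both are assumptions for illustration, and the helper names are hypothetical.

```python
from statistics import mean, stdev


def gaussian_upper_limit(values, fold=3):
    """Right-tail limit: mean plus `fold` sample standard deviations."""
    return mean(values) + fold * stdev(values)


def cap_right(values, limit):
    """Winsorize: pull values above the limit down to the limit."""
    return [min(v, limit) for v in values]


def trim_right(values, limit):
    """Trim: remove observations above the limit."""
    return [v for v in values if v <= limit]


train = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0]
limit = gaussian_upper_limit(train, fold=3)

new_data = [11.0, 50.0]
capped = cap_right(new_data, limit)    # same length, 50.0 replaced by limit
trimmed = trim_right(new_data, limit)  # 50.0 removed
```

Capping preserves the number of rows, whereas trimming shortens the dataset, which matters when the target has to stay aligned with the features.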

### Feature Creation

Generate new features through mathematical combinations, cyclical transformations, and reference feature combinations.

```python { .api }
class MathematicalCombination:
    def __init__(self, variables_to_combine, math_operations=None, new_variables_names=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CyclicalTransformer:
    def __init__(self, variables=None, max_values=None, drop_original=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CombineWithReferenceFeature:
    def __init__(self, variables_to_combine, reference_variables, operations_list): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Feature Creation](./creation.md)
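
A cyclical transformation maps a periodic value, such as the hour of the day, onto a circle so that the end of one cycle sits next to the start of the next. The sketch below shows the usual sine and cosine formulation in plain Python. The period of 24 is chosen by hand here, and the function names are hypothetical; this is not the library's code.

```python
import math


def cyclical_features(value, max_value):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / max_value
    return math.sin(angle), math.cos(angle)


def circle_distance(a, b):
    """Euclidean distance between two (sin, cos) points."""
    return math.dist(a, b)


hours = {h: cyclical_features(h, max_value=24) for h in (0, 1, 12, 23)}

# On the raw scale 23 and 0 are 23 units apart; on the circle they are
# as close as 0 and 1.
gap_23_to_0 = circle_distance(hours[23], hours[0])
gap_0_to_1 = circle_distance(hours[0], hours[1])
gap_0_to_12 = circle_distance(hours[0], hours[12])
```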

### Datetime Feature Extraction

Extract meaningful features from datetime variables including time components, periods, and date-related boolean flags.

```python { .api }
class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Datetime Feature Extraction](./datetime.md)
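
The kinds of features meant here (time components and boolean flags) can be illustrated with the standard library alone. The feature names in this sketch are hypothetical and are not the identifiers accepted by `features_to_extract`.

```python
from datetime import datetime


def extract_datetime_features(ts):
    """Derive a few numeric features and a boolean flag from a timestamp."""
    return {
        'month': ts.month,
        'day_of_week': ts.weekday(),       # Monday = 0 ... Sunday = 6
        'hour': ts.hour,
        'is_weekend': ts.weekday() >= 5,   # Saturday or Sunday
    }


features = extract_datetime_features(datetime(2024, 3, 16, 14, 30))
```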

### Scikit-learn Wrappers

Apply scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names.

```python { .api }
class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...
```

[Scikit-learn Wrappers](./wrappers.md)
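
The wrapping idea, transforming only selected columns and passing the rest through with names and order intact, can be sketched in plain Python. Here a table is modelled as a dict of column lists and the transformer as a simple function; both are simplifications, and `apply_to_columns` and `min_max_scale` are hypothetical helpers rather than library calls.

```python
def apply_to_columns(table, transform, variables):
    """Apply `transform` to the named columns; pass the others through."""
    return {
        name: transform(values) if name in variables else list(values)
        for name, values in table.items()
    }


def min_max_scale(values):
    """Rescale a column to the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


table = {
    'age': [20, 30, 40],
    'income': [1000, 2000, 4000],
    'city': ['Oslo', 'Lima', 'Pune'],
}
scaled = apply_to_columns(table, min_max_scale, variables=['age'])
```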

### Preprocessing Utilities

General preprocessing functions for data preparation and variable matching between datasets.

```python { .api }
class MatchVariables:
    def __init__(self, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
```

[Preprocessing Utilities](./preprocessing.md)
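
Variable matching means aligning a new dataset to the columns seen during training. A minimal plain-Python sketch of that idea follows. It models one row as a dict, and it assumes that columns absent from the new data are added with a fill value while unseen columns are dropped. The function `match_variables` is a hypothetical helper, not the library's implementation.

```python
def match_variables(train_columns, row, fill_value=None):
    """Align a row (dict) to the training columns, in training order."""
    # Columns missing from the row are added with fill_value;
    # columns not seen in training are dropped.
    return {col: row.get(col, fill_value) for col in train_columns}


train_columns = ['age', 'income', 'city']
new_row = {'city': 'Lima', 'age': 30, 'signup_channel': 'web'}

aligned = match_variables(train_columns, new_row)
```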