or run

npx @tessl/cli init
Log in

Version

Tile

Overview

Evals

Files

docs

creation.mddatetime.mddiscretisation.mdencoding.mdimputation.mdindex.mdoutliers.mdpreprocessing.mdselection.mdtransformation.mdwrappers.md
tile.json

tessl/pypi-feature-engine

Python library with 44+ transformers for feature engineering and selection following scikit-learn API

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/feature-engine@1.2.x

To install, run

npx @tessl/cli install tessl/pypi-feature-engine@1.2.0

index.mddocs/

Feature-Engine

A Python library with multiple transformers to engineer and select features for machine learning. All transformers follow the scikit-learn API pattern, enabling seamless integration with existing machine learning pipelines.

Package Information

  • Package Name: feature-engine
  • Package Type: library
  • Language: Python
  • Installation: pip install feature-engine

Core Imports

import feature_engine

Common import patterns for specific modules:

from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import OneHotEncoder, OrdinalEncoder
from feature_engine.transformation import LogTransformer, BoxCoxTransformer
from feature_engine.selection import DropFeatures, DropConstantFeatures, DropHighPSIFeatures, SelectByTargetMeanPerformance
from feature_engine.outliers import Winsorizer

Basic Usage

import pandas as pd
from feature_engine.imputation import MeanMedianImputer
from feature_engine.encoding import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Create sample data
data = {
    'numeric_var1': [1.0, 2.0, None, 4.0, 5.0],
    'numeric_var2': [10, 20, 30, None, 50],
    'categorical_var': ['A', 'B', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
y = [0, 1, 0, 1, 0]

# Create transformers
imputer = MeanMedianImputer(imputation_method='median')
encoder = OrdinalEncoder(encoding_method='arbitrary')

# Fit and transform data
X_imputed = imputer.fit_transform(df)
X_encoded = encoder.fit_transform(X_imputed)

# Or use in pipeline
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('encoder', OrdinalEncoder(encoding_method='arbitrary')),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)

Architecture

Feature-Engine follows the scikit-learn API design pattern with consistent interfaces across all transformers:

  • fit(X, y=None): Learn transformation parameters from training data
  • transform(X): Apply learned transformation to new data
  • fit_transform(X, y=None): Combine fit and transform operations
  • inverse_transform(X): Reverse transformation (where applicable)

All transformers inherit from base classes that provide:

  • Automatic variable selection (numerical or categorical)
  • Input validation and type checking
  • Consistent parameter storage in attributes ending with _
  • Integration with pandas DataFrames

Capabilities

Missing Data Imputation

Handle missing values in numerical and categorical variables using statistical methods, arbitrary values, or advanced techniques like random sampling.

class MeanMedianImputer:
    def __init__(self, imputation_method='median', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CategoricalImputer:
    def __init__(self, imputation_method='missing', fill_value='Missing', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryNumberImputer:
    def __init__(self, arbitrary_number=999, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Missing Data Imputation

Categorical Variable Encoding

Transform categorical variables into numerical representations using various encoding methods including one-hot, ordinal, target-based, and frequency-based encoders.

class OneHotEncoder:
    def __init__(self, top_categories=None, drop_last=False, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OrdinalEncoder:
    def __init__(self, encoding_method='ordered', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class MeanEncoder:
    def __init__(self, variables=None, ignore_format=False): ...
    def fit(self, X, y): ...
    def transform(self, X): ...

Categorical Variable Encoding

Variable Discretisation

Convert continuous variables into discrete intervals using equal width, equal frequency, decision tree-based, or user-defined boundaries.

class EqualWidthDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class EqualFrequencyDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryDiscretiser:
    def __init__(self, binning_dict, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Variable Discretisation

Mathematical Transformations

Apply mathematical functions to numerical variables including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations.

class LogTransformer:
    def __init__(self, variables=None, base='e'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class BoxCoxTransformer:
    def __init__(self, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class PowerTransformer:
    def __init__(self, variables=None, exp=2): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Mathematical Transformations

Feature Selection

Remove or select features based on various criteria including variance, correlation, performance metrics, and statistical tests.

class DropFeatures:
    def __init__(self, features_to_drop): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropConstantFeatures:
    def __init__(self, variables=None, tol=1, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropCorrelatedFeatures:
    def __init__(self, variables=None, method='pearson', threshold=0.8): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Feature Selection

Outlier Detection and Handling

Identify and handle outliers using statistical methods including Winsorization, capping, and trimming techniques.

class Winsorizer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryOutlierCapper:
    def __init__(self, max_capping_dict=None, min_capping_dict=None, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OutlierTrimmer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Outlier Detection and Handling

Feature Creation

Generate new features through mathematical combinations, cyclical transformations, and reference feature combinations.

class MathematicalCombination:
    def __init__(self, variables_to_combine, math_operations=None, new_variables_names=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CyclicalTransformer:
    def __init__(self, variables=None, max_values=None, drop_original=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CombineWithReferenceFeature:
    def __init__(self, variables_to_combine, reference_variables, operations_list): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Feature Creation

Datetime Feature Extraction

Extract meaningful features from datetime variables including time components, periods, and date-related boolean flags.

class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Datetime Feature Extraction

Scikit-learn Wrappers

Apply scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names.

class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...

Scikit-learn Wrappers

Preprocessing Utilities

General preprocessing functions for data preparation and variable matching between datasets.

class MatchVariables:
    def __init__(self, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Preprocessing Utilities