CtrlK
BlogDocsLog inGet started
Tessl Logo

tessl/pypi-feature-engine

Python library with 44+ transformers for feature engineering and selection following scikit-learn API

Pending
Quality

Pending

Does it follow best practices?

Impact

Pending

No eval scenarios have been run

SecuritybySnyk

Pending

The risk profile of this skill

Overview
Eval results
Files

Feature-Engine

A Python library with multiple transformers to engineer and select features for machine learning. All transformers follow the scikit-learn API pattern, enabling seamless integration with existing machine learning pipelines.

Package Information

  • Package Name: feature-engine
  • Package Type: library
  • Language: Python
  • Installation: pip install feature-engine

Core Imports

import feature_engine

Common import patterns for specific modules:

from feature_engine.imputation import MeanMedianImputer, CategoricalImputer
from feature_engine.encoding import OneHotEncoder, OrdinalEncoder
from feature_engine.transformation import LogTransformer, BoxCoxTransformer
from feature_engine.selection import DropFeatures, DropConstantFeatures, DropHighPSIFeatures, SelectByTargetMeanPerformance
from feature_engine.outliers import Winsorizer

Basic Usage

import pandas as pd
from feature_engine.imputation import MeanMedianImputer
from feature_engine.encoding import OrdinalEncoder
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

# Create sample data
data = {
    'numeric_var1': [1.0, 2.0, None, 4.0, 5.0],
    'numeric_var2': [10, 20, 30, None, 50],
    'categorical_var': ['A', 'B', 'A', 'C', 'B']
}
df = pd.DataFrame(data)
y = [0, 1, 0, 1, 0]

# Create transformers
imputer = MeanMedianImputer(imputation_method='median')
encoder = OrdinalEncoder(encoding_method='arbitrary')

# Fit and transform data
X_imputed = imputer.fit_transform(df)
X_encoded = encoder.fit_transform(X_imputed)

# Or use in pipeline
pipeline = Pipeline([
    ('imputer', MeanMedianImputer()),
    ('encoder', OrdinalEncoder(encoding_method='arbitrary')),
    ('classifier', RandomForestClassifier())
])

pipeline.fit(df, y)
predictions = pipeline.predict(df)

Architecture

Feature-Engine follows the scikit-learn API design pattern with consistent interfaces across all transformers:

  • fit(X, y=None): Learn transformation parameters from training data
  • transform(X): Apply learned transformation to new data
  • fit_transform(X, y=None): Combine fit and transform operations
  • inverse_transform(X): Reverse transformation (where applicable)

All transformers inherit from base classes that provide:

  • Automatic variable selection (numerical or categorical)
  • Input validation and type checking
  • Consistent parameter storage in attributes ending with _
  • Integration with pandas DataFrames

Capabilities

Missing Data Imputation

Handle missing values in numerical and categorical variables using statistical methods, arbitrary values, or advanced techniques like random sampling.

class MeanMedianImputer:
    def __init__(self, imputation_method='median', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CategoricalImputer:
    def __init__(self, imputation_method='missing', fill_value='Missing', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryNumberImputer:
    def __init__(self, arbitrary_number=999, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Missing Data Imputation

Categorical Variable Encoding

Transform categorical variables into numerical representations using various encoding methods including one-hot, ordinal, target-based, and frequency-based encoders.

class OneHotEncoder:
    def __init__(self, top_categories=None, drop_last=False, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OrdinalEncoder:
    def __init__(self, encoding_method='ordered', variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class MeanEncoder:
    def __init__(self, variables=None, ignore_format=False): ...
    def fit(self, X, y): ...
    def transform(self, X): ...

Categorical Variable Encoding

Variable Discretisation

Convert continuous variables into discrete intervals using equal width, equal frequency, decision tree-based, or user-defined boundaries.

class EqualWidthDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class EqualFrequencyDiscretiser:
    def __init__(self, variables=None, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryDiscretiser:
    def __init__(self, binning_dict, return_object=False, return_boundaries=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Variable Discretisation

Mathematical Transformations

Apply mathematical functions to numerical variables including logarithmic, power, reciprocal, Box-Cox, and Yeo-Johnson transformations.

class LogTransformer:
    def __init__(self, variables=None, base='e'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class BoxCoxTransformer:
    def __init__(self, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def inverse_transform(self, X): ...

class PowerTransformer:
    def __init__(self, variables=None, exp=2): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Mathematical Transformations

Feature Selection

Remove or select features based on various criteria including variance, correlation, performance metrics, and statistical tests.

class DropFeatures:
    def __init__(self, features_to_drop): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropConstantFeatures:
    def __init__(self, variables=None, tol=1, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class DropCorrelatedFeatures:
    def __init__(self, variables=None, method='pearson', threshold=0.8): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Feature Selection

Outlier Detection and Handling

Identify and handle outliers using statistical methods including Winsorization, capping, and trimming techniques.

class Winsorizer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class ArbitraryOutlierCapper:
    def __init__(self, max_capping_dict=None, min_capping_dict=None, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class OutlierTrimmer:
    def __init__(self, capping_method='gaussian', tail='right', fold=3, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Outlier Detection and Handling

Feature Creation

Generate new features through mathematical combinations, cyclical transformations, and reference feature combinations.

class MathematicalCombination:
    def __init__(self, variables_to_combine, math_operations=None, new_variables_names=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CyclicalTransformer:
    def __init__(self, variables=None, max_values=None, drop_original=False): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

class CombineWithReferenceFeature:
    def __init__(self, variables_to_combine, reference_variables, operations_list): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Feature Creation

Datetime Feature Extraction

Extract meaningful features from datetime variables including time components, periods, and date-related boolean flags.

class DatetimeFeatures:
    def __init__(self, variables=None, features_to_extract=None, drop_original=True): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Datetime Feature Extraction

Scikit-learn Wrappers

Apply scikit-learn transformers to specific subsets of variables while maintaining DataFrame structure and column names.

class SklearnTransformerWrapper:
    def __init__(self, transformer, variables=None): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...
    def fit_transform(self, X, y=None): ...

Scikit-learn Wrappers

Preprocessing Utilities

General preprocessing functions for data preparation and variable matching between datasets.

class MatchVariables:
    def __init__(self, missing_values='raise'): ...
    def fit(self, X, y=None): ...
    def transform(self, X): ...

Preprocessing Utilities

Workspace
tessl
Visibility
Public
Created
Last updated
Describes
pypipkg:pypi/feature-engine@1.2.x
Publish Source
CLI
Badge
tessl/pypi-feature-engine badge