Toolbox for imbalanced dataset in machine learning
—
Specialized metrics for evaluating classification and regression performance on imbalanced datasets, extending scikit-learn's standard metrics with measures designed for class imbalance scenarios.
Imbalanced-learn provides specialized metrics that are particularly relevant for evaluating model performance on imbalanced datasets. These metrics complement scikit-learn's standard metrics by focusing on measures that better capture performance across minority and majority classes.
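As a quick illustration of why these measures matter, the sketch below (a made-up 1:9 imbalanced toy vector) shows a classifier that always predicts the majority class: plain accuracy looks strong while sensitivity and the geometric mean expose the missed minority class.

# Minimal sketch: accuracy vs. imbalance-aware metrics on a skewed toy example
from sklearn.metrics import accuracy_score
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

y_true = [0] * 9 + [1] * 1                  # 9 majority samples, 1 minority sample
y_pred = [0] * 10                           # always predict the majority class

print(accuracy_score(y_true, y_pred))       # 0.9 - looks good
print(sensitivity_score(y_true, y_pred))    # 0.0 - minority class never detected
print(specificity_score(y_true, y_pred))    # 1.0
print(geometric_mean_score(y_true, y_pred)) # 0.0 - balanced view reveals the failure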
Individual Classification Metrics
sensitivity_score: True positive rate (recall)
specificity_score: True negative rate
geometric_mean_score: Balanced accuracy measure

Composite Classification Metrics
sensitivity_specificity_support: Combined sensitivity, specificity, and support
classification_report_imbalanced: Comprehensive imbalanced classification report

Regression Metrics
macro_averaged_mean_absolute_error: Class-balanced MAE for ordinal classification

Meta-Functions
make_index_balanced_accuracy: Decorator for correcting metrics with dominance factor

{ .api }
def sensitivity_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="binary",
sample_weight=None
) -> float | ndarray

Compute the sensitivity (true positive rate).
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="binary"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
sample_weight (array-like, optional): Sample weights

Returns:
float or ndarray: Sensitivity score(s)

Mathematical Definition: Sensitivity = TP / (TP + FN)
Where TP is true positives and FN is false negatives. Sensitivity quantifies the ability to avoid false negatives.
Example:
from imblearn.metrics import sensitivity_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Macro-averaged sensitivity
sensitivity_score(y_true, y_pred, average='macro')
# 0.33...
# Per-class sensitivity
sensitivity_score(y_true, y_pred, average=None)
# array([1., 0., 0.])

{ .api }
def specificity_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="binary",
sample_weight=None
) -> float | ndarray

Compute the specificity (true negative rate).
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="binary"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
sample_weight (array-like, optional): Sample weights

Returns:
float or ndarray: Specificity score(s)

Mathematical Definition: Specificity = TN / (TN + FP)
Where TN is true negatives and FP is false positives. Specificity quantifies the ability to avoid false positives.
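For a quick sanity check of both definitions, the sketch below (a minimal example assuming binary labels {0, 1}) recomputes sensitivity and specificity from a scikit-learn confusion matrix and compares them with the library scores.

# Minimal sketch: derive sensitivity and specificity from the confusion matrix
from sklearn.metrics import confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fn), sensitivity_score(y_true, y_pred))  # 0.5 0.5
print(tn / (tn + fp), specificity_score(y_true, y_pred))  # 0.75 0.75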
Example:
from imblearn.metrics import specificity_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Macro-averaged specificity
specificity_score(y_true, y_pred, average='macro')
# 0.66...
# Per-class specificity
specificity_score(y_true, y_pred, average=None)
# array([0.75, 0.5, 0.75])

{ .api }
def geometric_mean_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="multiclass",
sample_weight=None,
correction=0.0
) -> float

Compute the geometric mean of class-wise sensitivities.
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="multiclass"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'multiclass', 'samples', or None
sample_weight (array-like, optional): Sample weights
correction (float, default=0.0): Substitute for zero sensitivities to avoid a zero G-mean

Returns:
float: Geometric mean score

Mathematical Definition:
With the default 'multiclass' averaging, G-mean = (sensitivity_1 × sensitivity_2 × ... × sensitivity_n)^(1/n), the geometric mean of the per-class sensitivities; with 'binary' averaging it is sqrt(sensitivity × specificity). The geometric mean tries to maximize accuracy on each class while keeping accuracies balanced. If any class has zero sensitivity, the G-mean becomes zero unless corrected.
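As a cross-check of the definition above, this minimal sketch (assuming the default 'multiclass' averaging and no correction) recomputes the G-mean directly from the per-class recalls.

# Minimal sketch: G-mean as the geometric mean of per-class sensitivities (recalls)
import numpy as np
from sklearn.metrics import recall_score
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2, 2]
y_pred = [0, 1, 1, 0, 1, 1, 2]

recalls = recall_score(y_true, y_pred, average=None)  # per-class sensitivities
manual = np.prod(recalls) ** (1 / len(recalls))       # geometric mean of recalls
print(manual, geometric_mean_score(y_true, y_pred))   # the two values should agree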
Example:
from imblearn.metrics import geometric_mean_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Multi-class geometric mean
geometric_mean_score(y_true, y_pred)
# 0.0
# With correction for unrecognized classes
geometric_mean_score(y_true, y_pred, correction=0.001)
# 0.010...
# Macro-averaged (one-vs-rest)
geometric_mean_score(y_true, y_pred, average='macro')
# 0.471...

{ .api }
def sensitivity_specificity_support(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average=None,
warn_for=("sensitivity", "specificity"),
sample_weight=None
) -> tuple[float | ndarray, float | ndarray, int | ndarray | None]

Compute sensitivity, specificity, and support for each class.
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, optional): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
warn_for (tuple, default=("sensitivity", "specificity")): Metrics to warn about
sample_weight (array-like, optional): Sample weights

Returns:
sensitivity (float or ndarray): Sensitivity metric(s)
specificity (float or ndarray): Specificity metric(s)
support (int, ndarray, or None): Number of occurrences of each label

Example:
from imblearn.metrics import sensitivity_specificity_support
y_true = ['cat', 'dog', 'pig', 'cat', 'dog', 'pig']
y_pred = ['cat', 'pig', 'dog', 'cat', 'cat', 'dog']
# Macro-averaged metrics
sensitivity_specificity_support(y_true, y_pred, average='macro')
# (0.33..., 0.66..., None)
# Per-class metrics
sen, spe, sup = sensitivity_specificity_support(y_true, y_pred, average=None)
print(f"Sensitivity: {sen}")
print(f"Specificity: {spe}")
print(f"Support: {sup}"){ .api }
def classification_report_imbalanced(
y_true,
y_pred,
*,
labels=None,
target_names=None,
sample_weight=None,
digits=2,
alpha=0.1,
output_dict=False,
zero_division="warn"
) -> str | dict

Build a comprehensive classification report for imbalanced datasets.
Parameters:
y_true (array-like): Ground truth target values
y_pred (array-like): Estimated targets from classifier
labels (array-like, optional): Label indices to include in report
target_names (list of str, optional): Display names for labels
sample_weight (array-like, optional): Sample weights
digits (int, default=2): Number of digits for formatting floating point values
alpha (float, default=0.1): Weighting factor for index balanced accuracy
output_dict (bool, default=False): Return output as dictionary
zero_division ("warn" or {0, 1}, default="warn"): Value for zero division cases

Returns:
str or dict: Classification report with precision, recall, specificity, f1, geometric mean, and index balanced accuracy

Metrics Included: precision (pre), recall/sensitivity (rec), specificity (spe), F1 score (f1), geometric mean (geo), index balanced accuracy (iba), and support (sup)
Example:
from imblearn.metrics import classification_report_imbalanced
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report_imbalanced(
y_true, y_pred, target_names=target_names
))
# pre rec spe f1 geo iba sup
#
# class 0 0.50 1.00 0.75 0.67 0.87 0.77 1
# class 1 0.00 0.00 0.75 0.00 0.00 0.00 1
# class 2 1.00 0.67 1.00 0.80 0.82 0.64 3
#
# avg / total 0.70 0.60 0.90 0.61 0.66 0.54 5

{ .api }
def macro_averaged_mean_absolute_error(
y_true,
y_pred,
*,
sample_weight=None
) -> float

Compute Macro-Averaged MAE for imbalanced ordinal classification.
Parameters:
y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)): Ground truth target values
y_pred (array-like of shape (n_samples,) or (n_samples, n_outputs)): Estimated targets
sample_weight (array-like, optional): Sample weights

Returns:
float: Macro-averaged MAE (lower is better)

Description: Computes MAE for each class separately and averages them, giving equal weight to each class regardless of class frequency. This provides a more balanced evaluation for imbalanced ordinal classification problems compared to standard MAE.
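To make the per-class averaging concrete, the minimal sketch below (a hand computation, not the library internals) averages the per-class MAEs with NumPy and compares the result with the library function.

# Minimal sketch: macro-averaged MAE as the mean of per-class MAEs
import numpy as np
from imblearn.metrics import macro_averaged_mean_absolute_error

y_true = np.array([1, 2, 2, 2])
y_pred = np.array([1, 2, 1, 2])

per_class_mae = [np.mean(np.abs(y_true[y_true == c] - y_pred[y_true == c]))
                 for c in np.unique(y_true)]
print(np.mean(per_class_mae))                              # 0.1666...
print(macro_averaged_mean_absolute_error(y_true, y_pred))  # should match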
Example:
from sklearn.metrics import mean_absolute_error
from imblearn.metrics import macro_averaged_mean_absolute_error
y_true_balanced = [1, 1, 2, 2]
y_true_imbalanced = [1, 2, 2, 2]
y_pred = [1, 2, 1, 2]
# Standard MAE
mean_absolute_error(y_true_balanced, y_pred) # 0.5
mean_absolute_error(y_true_imbalanced, y_pred) # 0.25
# Macro-averaged MAE
macro_averaged_mean_absolute_error(y_true_balanced, y_pred) # 0.5
macro_averaged_mean_absolute_error(y_true_imbalanced, y_pred) # 0.16...

{ .api }
def make_index_balanced_accuracy(
*,
alpha=0.1,
squared=True
) -> callable

Factory function to create Index Balanced Accuracy (IBA) corrected metrics.
Parameters:
alpha (float, default=0.1): Weighting factor for dominance correction
squared (bool, default=True): Whether to square the metric before weighting

Returns:
callable: Decorator function that applies IBA correction to any scoring metric

Description: The Index Balanced Accuracy corrects standard metrics by accounting for the dominance relationship between sensitivity and specificity.

Mathematical Definition: IBA_α(metric) = (1 + α × dominance) × metric², where dominance = sensitivity - specificity (the metric is squared only when squared=True).
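The minimal sketch below (a hand computation for the binary case, with alpha=0.1 and squared=True assumed) applies the formula directly and compares it with the IBA-corrected geometric mean.

# Minimal sketch: verify the IBA formula against the library output
from imblearn.metrics import (geometric_mean_score, make_index_balanced_accuracy,
                              sensitivity_score, specificity_score)

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

sen = sensitivity_score(y_true, y_pred)          # sensitivity of the positive class
spe = specificity_score(y_true, y_pred)          # specificity of the positive class
gmean = geometric_mean_score(y_true, y_pred, average='binary')
manual_iba = (1 + 0.1 * (sen - spe)) * gmean**2  # formula with alpha=0.1, squared=True
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)
print(manual_iba, iba_gmean(y_true, y_pred, average='binary'))  # values should agree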
Example:
from imblearn.metrics import geometric_mean_score, make_index_balanced_accuracy
# Create IBA-corrected geometric mean
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]
# Apply IBA correction
iba_scores = iba_gmean(y_true, y_pred, average=None)
print(iba_scores)
# [0.44..., 0.44...]

Relationship to scikit-learn: sensitivity_score is equivalent to scikit-learn's recall_score, and geometric_mean_score complements balanced_accuracy_score (which averages per-class recalls arithmetically rather than geometrically). classification_report_imbalanced extends sklearn's classification_report with: specificity, geometric mean, and index balanced accuracy columns alongside precision, recall, F1, and support.
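The minimal sketch below (arbitrary toy labels) prints both reports side by side so the extra columns are easy to spot.

# Minimal sketch: compare scikit-learn's report with the imbalanced-learn report
from sklearn.metrics import classification_report
from imblearn.metrics import classification_report_imbalanced

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

print(classification_report(y_true, y_pred))             # pre / rec / f1 / sup
print(classification_report_imbalanced(y_true, y_pred))  # adds spe, geo, and iba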
Binary Classification:
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score
# Individual metrics
sensitivity = sensitivity_score(y_true, y_pred, pos_label=1)
specificity = specificity_score(y_true, y_pred, pos_label=1)
gmean = geometric_mean_score(y_true, y_pred, average='binary')

Multi-class Classification:
# Per-class metrics
sen_per_class = sensitivity_score(y_true, y_pred, average=None)
spe_per_class = specificity_score(y_true, y_pred, average=None)
# Averaged metrics
sen_macro = sensitivity_score(y_true, y_pred, average='macro')
gmean_multiclass = geometric_mean_score(y_true, y_pred, average='multiclass')

Comprehensive Evaluation:
# Complete imbalanced classification report
report = classification_report_imbalanced(y_true, y_pred, target_names=class_names)
print(report)
# Or as dictionary for programmatic access
report_dict = classification_report_imbalanced(
y_true, y_pred, target_names=class_names, output_dict=True
)

Usage Tips
Use 'macro' for equal class importance, 'weighted' for frequency-weighted importance
Use the correction parameter in geometric_mean_score for highly imbalanced datasets
Use classification_report_imbalanced for comprehensive evaluation
Use make_index_balanced_accuracy to correct for class dominance effects
Use macro_averaged_mean_absolute_error for imbalanced ordinal classification

Install with Tessl CLI
npx tessl i tessl/pypi-imbalanced-learn