Toolbox for imbalanced dataset in machine learning
—
Specialized metrics for evaluating classification and regression performance on imbalanced datasets, extending scikit-learn's standard metrics with measures designed for class imbalance scenarios.
Imbalanced-learn provides specialized metrics that are particularly relevant for evaluating model performance on imbalanced datasets. These metrics complement scikit-learn's standard metrics by focusing on measures that better capture performance across minority and majority classes.
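As a quick illustration of why these measures matter, the sketch below (a made-up 1:9 imbalanced toy vector) shows a classifier that always predicts the majority class: plain accuracy looks strong while sensitivity and the geometric mean expose the missed minority class.

# Minimal sketch: accuracy vs. imbalance-aware metrics on a skewed toy example
from sklearn.metrics import accuracy_score
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

y_true = [0] * 9 + [1] * 1                  # 9 majority samples, 1 minority sample
y_pred = [0] * 10                           # always predict the majority class

print(accuracy_score(y_true, y_pred))       # 0.9 - looks good
print(sensitivity_score(y_true, y_pred))    # 0.0 - minority class never detected
print(specificity_score(y_true, y_pred))    # 1.0
print(geometric_mean_score(y_true, y_pred)) # 0.0 - balanced view reveals the failure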
Individual Classification Metrics
sensitivity_score: True positive rate (recall)
specificity_score: True negative rate
geometric_mean_score: Balanced accuracy measure

Composite Classification Metrics
sensitivity_specificity_support: Combined sensitivity, specificity, and support
classification_report_imbalanced: Comprehensive imbalanced classification report

Regression Metrics
macro_averaged_mean_absolute_error: Class-balanced MAE for ordinal classification

Meta-Functions
make_index_balanced_accuracy: Decorator for correcting metrics with dominance factor

{ .api }
def sensitivity_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="binary",
sample_weight=None
) -> float | ndarray

Compute the sensitivity (true positive rate).
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="binary"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
sample_weight (array-like, optional): Sample weights

Returns:
float or ndarray: Sensitivity score(s)

Mathematical Definition: Sensitivity = TP / (TP + FN)
Where TP is true positives and FN is false negatives. Sensitivity quantifies the ability to avoid false negatives.
Example:
from imblearn.metrics import sensitivity_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Macro-averaged sensitivity
sensitivity_score(y_true, y_pred, average='macro')
# 0.33...
# Per-class sensitivity
sensitivity_score(y_true, y_pred, average=None)
# array([1., 0., 0.])

{ .api }
def specificity_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="binary",
sample_weight=None
) -> float | ndarray

Compute the specificity (true negative rate).
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="binary"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
sample_weight (array-like, optional): Sample weights

Returns:
float or ndarray: Specificity score(s)

Mathematical Definition: Specificity = TN / (TN + FP)
Where TN is true negatives and FP is false positives. Specificity quantifies the ability to avoid false positives.
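For a quick sanity check of both definitions, the sketch below (a minimal example assuming binary labels {0, 1}) recomputes sensitivity and specificity from a scikit-learn confusion matrix and compares them with the library scores.

# Minimal sketch: derive sensitivity and specificity from the confusion matrix
from sklearn.metrics import confusion_matrix
from imblearn.metrics import sensitivity_score, specificity_score

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fn), sensitivity_score(y_true, y_pred))  # 0.5 0.5
print(tn / (tn + fp), specificity_score(y_true, y_pred))  # 0.75 0.75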
Example:
from imblearn.metrics import specificity_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Macro-averaged specificity
specificity_score(y_true, y_pred, average='macro')
# 0.66...
# Per-class specificity
specificity_score(y_true, y_pred, average=None)
# array([0.75, 0.5, 0.75])

{ .api }
def geometric_mean_score(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average="multiclass",
sample_weight=None,
correction=0.0
) -> float

Compute the geometric mean of class-wise sensitivities.
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, default="multiclass"): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'multiclass', 'samples', or None
sample_weight (array-like, optional): Sample weights
correction (float, default=0.0): Substitute for zero sensitivities to avoid a zero G-mean

Returns:
float: Geometric mean score

Mathematical Definition:
With the default 'multiclass' averaging, G-mean = (sensitivity_1 × sensitivity_2 × ... × sensitivity_n)^(1/n), the geometric mean of the per-class sensitivities; with 'binary' averaging it is sqrt(sensitivity × specificity). The geometric mean tries to maximize accuracy on each class while keeping accuracies balanced. If any class has zero sensitivity, the G-mean becomes zero unless corrected.
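As a cross-check of the definition above, this minimal sketch (assuming the default 'multiclass' averaging and no correction) recomputes the G-mean directly from the per-class recalls.

# Minimal sketch: G-mean as the geometric mean of per-class sensitivities (recalls)
import numpy as np
from sklearn.metrics import recall_score
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2, 2]
y_pred = [0, 1, 1, 0, 1, 1, 2]

recalls = recall_score(y_true, y_pred, average=None)  # per-class sensitivities
manual = np.prod(recalls) ** (1 / len(recalls))       # geometric mean of recalls
print(manual, geometric_mean_score(y_true, y_pred))   # the two values should agree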
Example:
from imblearn.metrics import geometric_mean_score
y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]
# Multi-class geometric mean
geometric_mean_score(y_true, y_pred)
# 0.0
# With correction for unrecognized classes
geometric_mean_score(y_true, y_pred, correction=0.001)
# 0.010...
# Macro-averaged (one-vs-rest)
geometric_mean_score(y_true, y_pred, average='macro')
# 0.471...

{ .api }
def sensitivity_specificity_support(
y_true,
y_pred,
*,
labels=None,
pos_label=1,
average=None,
warn_for=("sensitivity", "specificity"),
sample_weight=None
) -> tuple[float | ndarray, float | ndarray, int | ndarray | None]

Compute sensitivity, specificity, and support for each class.
Parameters:
y_true (array-like of shape (n_samples,)): Ground truth target values
y_pred (array-like of shape (n_samples,)): Estimated targets from classifier
labels (array-like, optional): Set of labels to include when average != 'binary'
pos_label (str, int, or None, default=1): Class to report for binary classification
average (str, optional): Averaging strategy - 'binary', 'micro', 'macro', 'weighted', 'samples', or None
warn_for (tuple, default=("sensitivity", "specificity")): Metrics to warn about
sample_weight (array-like, optional): Sample weights

Returns:
sensitivity (float or ndarray): Sensitivity metric(s)
specificity (float or ndarray): Specificity metric(s)
support (int, ndarray, or None): Number of occurrences of each label

Example:
from imblearn.metrics import sensitivity_specificity_support
y_true = ['cat', 'dog', 'pig', 'cat', 'dog', 'pig']
y_pred = ['cat', 'pig', 'dog', 'cat', 'cat', 'dog']
# Macro-averaged metrics
sensitivity_specificity_support(y_true, y_pred, average='macro')
# (0.33..., 0.66..., None)
# Per-class metrics
sen, spe, sup = sensitivity_specificity_support(y_true, y_pred, average=None)
print(f"Sensitivity: {sen}")
print(f"Specificity: {spe}")
print(f"Support: {sup}"){ .api }
def classification_report_imbalanced(
y_true,
y_pred,
*,
labels=None,
target_names=None,
sample_weight=None,
digits=2,
alpha=0.1,
output_dict=False,
zero_division="warn"
) -> str | dict

Build a comprehensive classification report for imbalanced datasets.
Parameters:
y_true (array-like): Ground truth target values
y_pred (array-like): Estimated targets from classifier
labels (array-like, optional): Label indices to include in report
target_names (list of str, optional): Display names for labels
sample_weight (array-like, optional): Sample weights
digits (int, default=2): Number of digits for formatting floating point values
alpha (float, default=0.1): Weighting factor for index balanced accuracy
output_dict (bool, default=False): Return output as dictionary
zero_division ("warn" or {0, 1}, default="warn"): Value for zero division cases

Returns:
str or dict: Classification report with precision, recall, specificity, f1, geometric mean, and index balanced accuracy

Metrics Included: precision (pre), recall/sensitivity (rec), specificity (spe), F1 score (f1), geometric mean (geo), index balanced accuracy (iba), and support (sup)
Example:
from imblearn.metrics import classification_report_imbalanced
y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report_imbalanced(
y_true, y_pred, target_names=target_names
))
# pre rec spe f1 geo iba sup
#
# class 0 0.50 1.00 0.75 0.67 0.87 0.77 1
# class 1 0.00 0.00 0.75 0.00 0.00 0.00 1
# class 2 1.00 0.67 1.00 0.80 0.82 0.64 3
#
# avg / total 0.70 0.60 0.90 0.61 0.66 0.54 5

{ .api }
def macro_averaged_mean_absolute_error(
y_true,
y_pred,
*,
sample_weight=None
) -> float

Compute Macro-Averaged MAE for imbalanced ordinal classification.
Parameters:
y_true (array-like of shape (n_samples,) or (n_samples, n_outputs)): Ground truth target values
y_pred (array-like of shape (n_samples,) or (n_samples, n_outputs)): Estimated targets
sample_weight (array-like, optional): Sample weights

Returns:
float: Macro-averaged MAE (lower is better)

Description: Computes MAE for each class separately and averages them, giving equal weight to each class regardless of class frequency. This provides a more balanced evaluation for imbalanced ordinal classification problems compared to standard MAE.
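To make the per-class averaging concrete, the minimal sketch below (a hand computation, not the library internals) averages the per-class MAEs with NumPy and compares the result with the library function.

# Minimal sketch: macro-averaged MAE as the mean of per-class MAEs
import numpy as np
from imblearn.metrics import macro_averaged_mean_absolute_error

y_true = np.array([1, 2, 2, 2])
y_pred = np.array([1, 2, 1, 2])

per_class_mae = [np.mean(np.abs(y_true[y_true == c] - y_pred[y_true == c]))
                 for c in np.unique(y_true)]
print(np.mean(per_class_mae))                              # 0.1666...
print(macro_averaged_mean_absolute_error(y_true, y_pred))  # should match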
Example:
from sklearn.metrics import mean_absolute_error
from imblearn.metrics import macro_averaged_mean_absolute_error
y_true_balanced = [1, 1, 2, 2]
y_true_imbalanced = [1, 2, 2, 2]
y_pred = [1, 2, 1, 2]
# Standard MAE
mean_absolute_error(y_true_balanced, y_pred) # 0.5
mean_absolute_error(y_true_imbalanced, y_pred) # 0.25
# Macro-averaged MAE
macro_averaged_mean_absolute_error(y_true_balanced, y_pred) # 0.5
macro_averaged_mean_absolute_error(y_true_imbalanced, y_pred) # 0.16...

{ .api }
def make_index_balanced_accuracy(
*,
alpha=0.1,
squared=True
) -> callable

Factory function to create Index Balanced Accuracy (IBA) corrected metrics.
Parameters:
alpha (float, default=0.1): Weighting factor for dominance correction
squared (bool, default=True): Whether to square the metric before weighting

Returns:
callable: Decorator function that applies IBA correction to any scoring metric

Description: The Index Balanced Accuracy corrects standard metrics by accounting for the dominance relationship between sensitivity and specificity.

Mathematical Definition: IBA_α(metric) = (1 + α × dominance) × metric², where dominance = sensitivity - specificity (the metric is squared only when squared=True).
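The minimal sketch below (a hand computation for the binary case, with alpha=0.1 and squared=True assumed) applies the formula directly and compares it with the IBA-corrected geometric mean.

# Minimal sketch: verify the IBA formula against the library output
from imblearn.metrics import (geometric_mean_score, make_index_balanced_accuracy,
                              sensitivity_score, specificity_score)

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

sen = sensitivity_score(y_true, y_pred)          # sensitivity of the positive class
spe = specificity_score(y_true, y_pred)          # specificity of the positive class
gmean = geometric_mean_score(y_true, y_pred, average='binary')
manual_iba = (1 + 0.1 * (sen - spe)) * gmean**2  # formula with alpha=0.1, squared=True
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)
print(manual_iba, iba_gmean(y_true, y_pred, average='binary'))  # values should agree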
Example:
from imblearn.metrics import geometric_mean_score, make_index_balanced_accuracy
# Create IBA-corrected geometric mean
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]
# Apply IBA correction
iba_scores = iba_gmean(y_true, y_pred, average=None)
print(iba_scores)
# [0.44..., 0.44...]

Relationship to scikit-learn: sensitivity_score is equivalent to scikit-learn's recall_score, and geometric_mean_score complements balanced_accuracy_score (which averages per-class recalls arithmetically rather than geometrically). classification_report_imbalanced extends sklearn's classification_report with: specificity, geometric mean, and index balanced accuracy columns alongside precision, recall, F1, and support.
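The minimal sketch below (arbitrary toy labels) prints both reports side by side so the extra columns are easy to spot.

# Minimal sketch: compare scikit-learn's report with the imbalanced-learn report
from sklearn.metrics import classification_report
from imblearn.metrics import classification_report_imbalanced

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0]

print(classification_report(y_true, y_pred))             # pre / rec / f1 / sup
print(classification_report_imbalanced(y_true, y_pred))  # adds spe, geo, and iba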
Binary Classification:
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score
# Individual metrics
sensitivity = sensitivity_score(y_true, y_pred, pos_label=1)
specificity = specificity_score(y_true, y_pred, pos_label=1)
gmean = geometric_mean_score(y_true, y_pred, average='binary')

Multi-class Classification:
# Per-class metrics
sen_per_class = sensitivity_score(y_true, y_pred, average=None)
spe_per_class = specificity_score(y_true, y_pred, average=None)
# Averaged metrics
sen_macro = sensitivity_score(y_true, y_pred, average='macro')
gmean_multiclass = geometric_mean_score(y_true, y_pred, average='multiclass')

Comprehensive Evaluation:
# Complete imbalanced classification report
report = classification_report_imbalanced(y_true, y_pred, target_names=class_names)
print(report)
# Or as dictionary for programmatic access
report_dict = classification_report_imbalanced(
y_true, y_pred, target_names=class_names, output_dict=True
)

Usage Tips
Use 'macro' for equal class importance, 'weighted' for frequency-weighted importance
Use the correction parameter in geometric_mean_score for highly imbalanced datasets
Use classification_report_imbalanced for comprehensive evaluation
Use make_index_balanced_accuracy to correct for class dominance effects
Use macro_averaged_mean_absolute_error for imbalanced ordinal classification

Install with Tessl CLI
npx tessl i tessl/pypi-imbalanced-learn