# Metrics

Specialized metrics for evaluating classification and regression performance on imbalanced datasets, extending scikit-learn's standard metrics with measures designed for class-imbalance scenarios.

## Overview

Imbalanced-learn provides specialized metrics that are particularly relevant for evaluating model performance on imbalanced datasets. These metrics complement scikit-learn's standard metrics by focusing on measures that better capture performance across minority and majority classes. A minimal end-to-end sketch follows the category list below.

### Key Features

- **Class-balanced evaluation**: Metrics that give equal importance to all classes regardless of their frequency
- **Sensitivity and specificity**: Medical/diagnostic-inspired metrics for binary classification
- **Geometric mean**: The n-th root of the product of the class-wise sensitivities
- **Index balanced accuracy**: Corrected metrics accounting for class dominance
- **Comprehensive reporting**: Extended classification reports with imbalanced-specific metrics

### Metric Categories

**Individual Classification Metrics**
- `sensitivity_score`: True positive rate (recall)
- `specificity_score`: True negative rate
- `geometric_mean_score`: Balanced accuracy measure

**Composite Classification Metrics**
- `sensitivity_specificity_support`: Combined sensitivity, specificity, and support
- `classification_report_imbalanced`: Comprehensive imbalanced classification report

**Regression Metrics**
- `macro_averaged_mean_absolute_error`: Class-balanced MAE for ordinal classification

**Meta-Functions**
- `make_index_balanced_accuracy`: Decorator for correcting metrics with a dominance factor
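
The following is a minimal, illustrative sketch of how these metrics fit together. The synthetic dataset, classifier, and 90/10 class ratio are arbitrary assumptions for the example, not requirements of the API.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from imblearn.metrics import (
    classification_report_imbalanced,
    geometric_mean_score,
    sensitivity_score,
    specificity_score,
)

# Synthetic binary problem with a 90/10 class split (assumed setup)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Imbalance-aware metrics for the positive (minority) class
print("sensitivity:", sensitivity_score(y_test, y_pred))
print("specificity:", specificity_score(y_test, y_pred))
print("G-mean:", geometric_mean_score(y_test, y_pred))

# Full per-class report
print(classification_report_imbalanced(y_test, y_pred))
```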

## Classification Metrics

### Individual Metrics

#### sensitivity_score

```python { .api }
def sensitivity_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="binary",
    sample_weight=None
) -> float | ndarray
```

Compute the sensitivity (true positive rate).

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"binary"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **sensitivity** (`float` or `ndarray`): Sensitivity score(s)

**Mathematical Definition:**

Sensitivity = TP / (TP + FN)

Where TP is true positives and FN is false negatives. Sensitivity quantifies the ability to avoid false negatives.

**Example:**
```python
from imblearn.metrics import sensitivity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Macro-averaged sensitivity
sensitivity_score(y_true, y_pred, average='macro')
# 0.33...

# Per-class sensitivity
sensitivity_score(y_true, y_pred, average=None)
# array([1., 0., 0.])
```

#### specificity_score

```python { .api }
def specificity_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="binary",
    sample_weight=None
) -> float | ndarray
```

Compute the specificity (true negative rate).

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"binary"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **specificity** (`float` or `ndarray`): Specificity score(s)

**Mathematical Definition:**

Specificity = TN / (TN + FP)

Where TN is true negatives and FP is false positives. Specificity quantifies the ability to avoid false positives.

**Example:**
```python
from imblearn.metrics import specificity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Macro-averaged specificity
specificity_score(y_true, y_pred, average='macro')
# 0.66...

# Per-class specificity
specificity_score(y_true, y_pred, average=None)
# array([0.75, 0.5, 0.75])
```

#### geometric_mean_score

```python { .api }
def geometric_mean_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="multiclass",
    sample_weight=None,
    correction=0.0
) -> float
```

Compute the geometric mean of class-wise sensitivities.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"multiclass"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'multiclass'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights
- **correction** (`float`, default=`0.0`): Substitute for zero sensitivities to avoid a zero G-mean

**Returns:**
- **geometric_mean** (`float`): Geometric mean score

**Mathematical Definition:**
- **Binary classification**: G-mean = √(Sensitivity × Specificity)
- **Multi-class classification**: G-mean = ⁿ√(∏ᵢ₌₁ⁿ Sensitivityᵢ)

The geometric mean tries to maximize accuracy on each class while keeping accuracies balanced. If any class has zero sensitivity, the G-mean becomes zero unless corrected.

**Example:**
```python
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Multi-class geometric mean
geometric_mean_score(y_true, y_pred)
# 0.0

# With correction for classes with zero sensitivity
geometric_mean_score(y_true, y_pred, correction=0.001)
# 0.010...

# Macro-averaged (one-vs-rest)
geometric_mean_score(y_true, y_pred, average='macro')
# 0.471...
```
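
For the binary case, the relation G-mean = √(Sensitivity × Specificity) can be checked directly. The labels below are an arbitrary toy example chosen for this sketch, not taken from the library's documentation.

```python
from math import sqrt

from imblearn.metrics import (
    geometric_mean_score,
    sensitivity_score,
    specificity_score,
)

# Toy binary labels (assumed example data)
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

sen = sensitivity_score(y_true, y_pred)  # TP / (TP + FN)
spe = specificity_score(y_true, y_pred)  # TN / (TN + FP)

# geometric_mean_score with average='binary' should match sqrt(sen * spe)
print(sqrt(sen * spe))
print(geometric_mean_score(y_true, y_pred, average='binary'))
```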

### Composite Metrics

#### sensitivity_specificity_support

```python { .api }
def sensitivity_specificity_support(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average=None,
    warn_for=("sensitivity", "specificity"),
    sample_weight=None
) -> tuple[float | ndarray, float | ndarray, int | ndarray | None]
```

Compute sensitivity, specificity, and support for each class.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, optional): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **warn_for** (`tuple`, default=`("sensitivity", "specificity")`): Metrics to warn about
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **sensitivity** (`float` or `ndarray`): Sensitivity metric(s)
- **specificity** (`float` or `ndarray`): Specificity metric(s)
- **support** (`int`, `ndarray`, or `None`): Number of occurrences of each label

**Example:**
```python
from imblearn.metrics import sensitivity_specificity_support

y_true = ['cat', 'dog', 'pig', 'cat', 'dog', 'pig']
y_pred = ['cat', 'pig', 'dog', 'cat', 'cat', 'dog']

# Macro-averaged metrics
sensitivity_specificity_support(y_true, y_pred, average='macro')
# (0.33..., 0.66..., None)

# Per-class metrics (labels sorted as cat, dog, pig)
sen, spe, sup = sensitivity_specificity_support(y_true, y_pred, average=None)
print(f"Sensitivity: {sen}")  # [1.   0.   0.  ]
print(f"Specificity: {spe}")  # [0.75 0.5  0.75]
print(f"Support: {sup}")      # [2 2 2]
```

#### classification_report_imbalanced

```python { .api }
def classification_report_imbalanced(
    y_true,
    y_pred,
    *,
    labels=None,
    target_names=None,
    sample_weight=None,
    digits=2,
    alpha=0.1,
    output_dict=False,
    zero_division="warn"
) -> str | dict
```

Build a comprehensive classification report for imbalanced datasets.

**Parameters:**
- **y_true** (`array-like`): Ground truth target values
- **y_pred** (`array-like`): Estimated targets from classifier
- **labels** (`array-like`, optional): Label indices to include in report
- **target_names** (`list` of `str`, optional): Display names for labels
- **sample_weight** (`array-like`, optional): Sample weights
- **digits** (`int`, default=`2`): Number of digits for formatting floating point values
- **alpha** (`float`, default=`0.1`): Weighting factor for index balanced accuracy
- **output_dict** (`bool`, default=`False`): Return output as dictionary
- **zero_division** (`"warn"` or `{0, 1}`, default=`"warn"`): Value for zero division cases

**Returns:**
- **report** (`str` or `dict`): Classification report with precision, recall, specificity, f1, geometric mean, and index balanced accuracy

**Metrics Included:**
- **pre**: Precision
- **rec**: Recall (Sensitivity)
- **spe**: Specificity
- **f1**: F1-score
- **geo**: Geometric mean
- **iba**: Index balanced accuracy
- **sup**: Support

**Example:**
```python
from imblearn.metrics import classification_report_imbalanced

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

print(classification_report_imbalanced(
    y_true, y_pred, target_names=target_names
))
#                    pre       rec       spe        f1       geo       iba       sup
#
#     class 0       0.50      1.00      0.75      0.67      0.87      0.77         1
#     class 1       0.00      0.00      0.75      0.00      0.00      0.00         1
#     class 2       1.00      0.67      1.00      0.80      0.82      0.64         3
#
# avg / total       0.70      0.60      0.90      0.61      0.66      0.54         5
```
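
When the per-class numbers are needed programmatically, the same call can return a nested dictionary via `output_dict=True`. The exact key layout is easiest to discover by printing the result, as sketched below; the loop makes no assumption about specific key names.

```python
from imblearn.metrics import classification_report_imbalanced

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]

# Same report as above, returned as a nested dictionary
report = classification_report_imbalanced(y_true, y_pred, output_dict=True)

# Inspect the structure rather than assuming exact key names
for label, scores in report.items():
    print(label, scores)
```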

## Regression Metrics

#### macro_averaged_mean_absolute_error

```python { .api }
def macro_averaged_mean_absolute_error(
    y_true,
    y_pred,
    *,
    sample_weight=None
) -> float
```

Compute the macro-averaged MAE for imbalanced ordinal classification.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)` or `(n_samples, n_outputs)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)` or `(n_samples, n_outputs)`): Estimated targets
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **loss** (`float` or `ndarray`): Macro-averaged MAE (lower is better)

**Description:**
Computes MAE for each class separately and averages them, giving equal weight to each class regardless of class frequency. This provides a more balanced evaluation for imbalanced ordinal classification problems compared to standard MAE.

**Example:**
```python
from sklearn.metrics import mean_absolute_error
from imblearn.metrics import macro_averaged_mean_absolute_error

y_true_balanced = [1, 1, 2, 2]
y_true_imbalanced = [1, 2, 2, 2]
y_pred = [1, 2, 1, 2]

# Standard MAE
mean_absolute_error(y_true_balanced, y_pred)    # 0.5
mean_absolute_error(y_true_imbalanced, y_pred)  # 0.25

# Macro-averaged MAE
macro_averaged_mean_absolute_error(y_true_balanced, y_pred)    # 0.5
macro_averaged_mean_absolute_error(y_true_imbalanced, y_pred)  # 0.16...
```
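
The per-class averaging behind the 0.16... value can be reproduced by hand. The NumPy re-computation below only illustrates the definition; it is not the library's implementation.

```python
import numpy as np

y_true = np.array([1, 2, 2, 2])
y_pred = np.array([1, 2, 1, 2])

# MAE computed separately for each class present in y_true
per_class_mae = [
    np.mean(np.abs(y_true[y_true == cls] - y_pred[y_true == cls]))
    for cls in np.unique(y_true)
]
print(per_class_mae)           # class 1 -> 0.0, class 2 -> 0.333...
print(np.mean(per_class_mae))  # 0.1666..., matching macro_averaged_mean_absolute_error
```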

## Meta-Functions

#### make_index_balanced_accuracy

```python { .api }
def make_index_balanced_accuracy(
    *,
    alpha=0.1,
    squared=True
) -> callable
```

Factory function to create Index Balanced Accuracy (IBA) corrected metrics.

**Parameters:**
- **alpha** (`float`, default=`0.1`): Weighting factor for dominance correction
- **squared** (`bool`, default=`True`): Whether to square the metric before weighting

**Returns:**
- **iba_scoring_func** (`callable`): Decorator function that applies IBA correction to any scoring metric

**Description:**
The Index Balanced Accuracy corrects standard metrics by accounting for the dominance relationship between sensitivity and specificity. With the default `squared=True`, the corrected score is calculated as:

IBA_α(M) = (1 + α × dominance) × M²

where dominance = sensitivity - specificity and M is the underlying metric (left unsquared when `squared=False`).

**Mathematical Definition:**
- **Dominance**: D = Sensitivity - Specificity
- **IBA correction**: IBA_α(M) = (1 + α × D) × M² (if `squared=True`)

**Example:**
```python
from imblearn.metrics import geometric_mean_score, make_index_balanced_accuracy

# Create IBA-corrected geometric mean
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

# Apply IBA correction
iba_scores = iba_gmean(y_true, y_pred, average=None)
print(iba_scores)
# [0.44..., 0.44...]
```
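
As a sanity check on the formula, the same numbers can be reproduced by applying the correction by hand to the per-class geometric mean. This is a verification sketch, not part of the library's API.

```python
from imblearn.metrics import (
    geometric_mean_score,
    make_index_balanced_accuracy,
    sensitivity_score,
    specificity_score,
)

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]
alpha = 0.1

sen = sensitivity_score(y_true, y_pred, average=None)   # per-class sensitivity
spe = specificity_score(y_true, y_pred, average=None)   # per-class specificity
gmean = geometric_mean_score(y_true, y_pred, average=None)

# Manual IBA: (1 + alpha * (sen - spe)) * gmean**2
print((1 + alpha * (sen - spe)) * gmean ** 2)

# Library result for comparison
iba_gmean = make_index_balanced_accuracy(alpha=alpha, squared=True)(geometric_mean_score)
print(iba_gmean(y_true, y_pred, average=None))
```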

## Relationship to scikit-learn Metrics

### Complementary Metrics
- **Sensitivity** is equivalent to sklearn's `recall_score`
- **Specificity** has no direct sklearn equivalent; it is the complement of the false positive rate (specificity = 1 - FPR)
- **Geometric mean** provides a class-balanced alternative to sklearn's `balanced_accuracy_score` (see the sketch below)
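
These relationships are easy to verify on any label vector; the toy labels below are an arbitrary assumption for the check.

```python
from sklearn.metrics import balanced_accuracy_score, recall_score
from imblearn.metrics import geometric_mean_score, sensitivity_score, specificity_score

# Arbitrary toy labels for the comparison
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 1]

# Sensitivity is recall of the positive class
print(sensitivity_score(y_true, y_pred), recall_score(y_true, y_pred))

# Specificity is recall of the negative class, i.e. 1 - FPR
print(specificity_score(y_true, y_pred), recall_score(y_true, y_pred, pos_label=0))

# balanced_accuracy_score is the arithmetic mean of the two rates;
# the G-mean is their geometric mean, so the values differ but are related
print(balanced_accuracy_score(y_true, y_pred))
print(geometric_mean_score(y_true, y_pred))
```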

### Enhanced Reports
- `classification_report_imbalanced` extends sklearn's `classification_report` with:
  - Specificity scores
  - Geometric mean scores
  - Index balanced accuracy scores
  - Better handling of imbalanced data

### Usage Patterns

**Binary Classification:**
```python
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

# Individual metrics
sensitivity = sensitivity_score(y_true, y_pred, pos_label=1)
specificity = specificity_score(y_true, y_pred, pos_label=1)
gmean = geometric_mean_score(y_true, y_pred, average='binary')
```

**Multi-class Classification:**
```python
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

# Per-class metrics
sen_per_class = sensitivity_score(y_true, y_pred, average=None)
spe_per_class = specificity_score(y_true, y_pred, average=None)

# Averaged metrics
sen_macro = sensitivity_score(y_true, y_pred, average='macro')
gmean_multiclass = geometric_mean_score(y_true, y_pred, average='multiclass')
```

**Comprehensive Evaluation:**
```python
from imblearn.metrics import classification_report_imbalanced

# Complete imbalanced classification report
report = classification_report_imbalanced(y_true, y_pred, target_names=class_names)
print(report)

# Or as a dictionary for programmatic access
report_dict = classification_report_imbalanced(
    y_true, y_pred, target_names=class_names, output_dict=True
)
```

## Best Practices

1. **Choose appropriate averaging**: Use `'macro'` for equal class importance and `'weighted'` for frequency-weighted importance (see the sketch after this list)
2. **Handle zero-sensitivity classes**: Use the `correction` parameter of `geometric_mean_score` for highly imbalanced datasets where a class may never be predicted correctly
3. **Combine metrics**: Use `classification_report_imbalanced` for comprehensive evaluation
4. **Apply IBA correction**: Use `make_index_balanced_accuracy` to correct for class dominance effects
5. **Consider ordinal data**: Use `macro_averaged_mean_absolute_error` for imbalanced ordinal classification
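
The effect of the averaging choice is easiest to see on a small skewed example; the label vectors below are an arbitrary illustration.

```python
from imblearn.metrics import sensitivity_score

# Skewed toy problem: class 0 dominates
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# 'macro' weights both classes equally: (1.0 + 0.5) / 2 = 0.75
print(sensitivity_score(y_true, y_pred, average='macro'))

# 'weighted' follows the class frequencies: 0.8 * 1.0 + 0.2 * 0.5 = 0.9
print(sensitivity_score(y_true, y_pred, average='weighted'))
```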