# Metrics

Specialized metrics for evaluating classification and regression performance on imbalanced datasets, extending scikit-learn's standard metrics with measures designed for class imbalance scenarios.

## Overview

Imbalanced-learn provides specialized metrics that are particularly relevant for evaluating model performance on imbalanced datasets. These metrics complement scikit-learn's standard metrics by focusing on measures that better capture performance across minority and majority classes.

### Key Features

- **Class-balanced evaluation**: Metrics that give equal importance to all classes regardless of their frequency
- **Sensitivity and specificity**: Medical/diagnostic-inspired metrics for binary classification
- **Geometric mean**: Root of the product of class-wise sensitivities
- **Index balanced accuracy**: Corrected metrics accounting for class dominance
- **Comprehensive reporting**: Extended classification reports with imbalanced-specific metrics

### Metric Categories

**Individual Classification Metrics**
- `sensitivity_score`: True positive rate (recall)
- `specificity_score`: True negative rate
- `geometric_mean_score`: Geometric mean of class-wise sensitivities (a balanced accuracy measure)

**Composite Classification Metrics**
- `sensitivity_specificity_support`: Combined sensitivity, specificity, and support
- `classification_report_imbalanced`: Comprehensive imbalanced classification report

**Regression Metrics**
- `macro_averaged_mean_absolute_error`: Class-balanced MAE for ordinal classification

**Meta-Functions**
- `make_index_balanced_accuracy`: Decorator for correcting metrics with a dominance factor
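
All of the functions above live in `imblearn.metrics`. As a quick orientation, here is a minimal, illustrative sketch (the toy labels are placeholders chosen for demonstration):

```python
from imblearn.metrics import (
    sensitivity_score,
    specificity_score,
    geometric_mean_score,
    classification_report_imbalanced,
)

# Toy imbalanced labels (illustrative only)
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

print(sensitivity_score(y_true, y_pred))       # true positive rate of the positive class
print(specificity_score(y_true, y_pred))       # true negative rate
print(geometric_mean_score(y_true, y_pred))    # geometric mean of class-wise sensitivities
print(classification_report_imbalanced(y_true, y_pred))
```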

## Classification Metrics

### Individual Metrics

#### sensitivity_score

```python
{ .api }
def sensitivity_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="binary",
    sample_weight=None
) -> float | ndarray
```

Compute the sensitivity (true positive rate).

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"binary"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **sensitivity** (`float` or `ndarray`): Sensitivity score(s)

**Mathematical Definition:**
Sensitivity = TP / (TP + FN)

Where TP is true positives and FN is false negatives. Sensitivity quantifies the ability to avoid false negatives.

**Example:**
```python
from imblearn.metrics import sensitivity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Macro-averaged sensitivity
sensitivity_score(y_true, y_pred, average='macro')
# 0.33...

# Per-class sensitivity
sensitivity_score(y_true, y_pred, average=None)
# array([1., 0., 0.])
```
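
As noted under "Relationship to scikit-learn Metrics" below, sensitivity is the same quantity scikit-learn exposes as recall. A small sanity-check sketch reusing the labels above (illustrative; the printed values follow from the per-class recalls `[1, 0, 0]`):

```python
from sklearn.metrics import recall_score
from imblearn.metrics import sensitivity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Both compute TP / (TP + FN), macro-averaged over classes
print(sensitivity_score(y_true, y_pred, average='macro'))  # 0.33...
print(recall_score(y_true, y_pred, average='macro'))       # 0.33...
```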

#### specificity_score

```python
{ .api }
def specificity_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="binary",
    sample_weight=None
) -> float | ndarray
```

Compute the specificity (true negative rate).

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"binary"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **specificity** (`float` or `ndarray`): Specificity score(s)

**Mathematical Definition:**
Specificity = TN / (TN + FP)

Where TN is true negatives and FP is false positives. Specificity quantifies the ability to avoid false positives.

**Example:**
```python
from imblearn.metrics import specificity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Macro-averaged specificity
specificity_score(y_true, y_pred, average='macro')
# 0.66...

# Per-class specificity
specificity_score(y_true, y_pred, average=None)
# array([0.75, 0.5, 0.75])
```
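
Because scikit-learn offers no dedicated specificity scorer, the TN / (TN + FP) definition can be checked against per-class confusion matrices. A sketch assuming `sklearn.metrics.multilabel_confusion_matrix` is available (illustrative, not part of imbalanced-learn's API):

```python
from sklearn.metrics import multilabel_confusion_matrix
from imblearn.metrics import specificity_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# One 2x2 matrix per class, laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_true, y_pred)
tn, fp = mcm[:, 0, 0], mcm[:, 0, 1]

print(tn / (tn + fp))                                    # [0.75 0.5  0.75]
print(specificity_score(y_true, y_pred, average=None))   # array([0.75, 0.5, 0.75])
```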

#### geometric_mean_score

```python
{ .api }
def geometric_mean_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average="multiclass",
    sample_weight=None,
    correction=0.0
) -> float
```

Compute the geometric mean of class-wise sensitivities.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, default=`"multiclass"`): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'multiclass'`, `'samples'`, or `None`
- **sample_weight** (`array-like`, optional): Sample weights
- **correction** (`float`, default=`0.0`): Substitute for zero sensitivities to avoid zero G-mean

**Returns:**
- **geometric_mean** (`float`): Geometric mean score

**Mathematical Definition:**
- **Binary classification**: G-mean = √(Sensitivity × Specificity)
- **Multi-class classification**: G-mean = ⁿ√(∏ᵢ₌₁ⁿ Sensitivityᵢ)

The geometric mean tries to maximize accuracy on each class while keeping accuracies balanced. If any class has zero sensitivity, G-mean becomes zero unless corrected.

**Example:**
```python
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Multi-class geometric mean
geometric_mean_score(y_true, y_pred)
# 0.0

# With correction for unrecognized classes
geometric_mean_score(y_true, y_pred, correction=0.001)
# 0.010...

# Macro-averaged (one-vs-rest)
geometric_mean_score(y_true, y_pred, average='macro')
# 0.471...
```
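
To make the multi-class formula and the `correction` parameter concrete, the following sketch recomputes the corrected value by hand from the per-class sensitivities (a reimplementation of the formula above, not the library's internal code path):

```python
import numpy as np
from imblearn.metrics import sensitivity_score, geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Per-class sensitivities: array([1., 0., 0.])
sens = sensitivity_score(y_true, y_pred, average=None)

# n-th root of the product, with zero sensitivities replaced by the correction value
corrected = np.where(sens == 0, 0.001, sens)
print(np.prod(corrected) ** (1 / len(corrected)))              # ~0.01
print(geometric_mean_score(y_true, y_pred, correction=0.001))  # 0.010...
```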

### Composite Metrics

#### sensitivity_specificity_support

```python
{ .api }
def sensitivity_specificity_support(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,
    average=None,
    warn_for=("sensitivity", "specificity"),
    sample_weight=None
) -> tuple[float | ndarray, float | ndarray, int | ndarray | None]
```

Compute sensitivity, specificity, and support for each class.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)`): Estimated targets from classifier
- **labels** (`array-like`, optional): Set of labels to include when `average != 'binary'`
- **pos_label** (`str`, `int`, or `None`, default=`1`): Class to report for binary classification
- **average** (`str`, optional): Averaging strategy - `'binary'`, `'micro'`, `'macro'`, `'weighted'`, `'samples'`, or `None`
- **warn_for** (`tuple`, default=`("sensitivity", "specificity")`): Metrics to warn about
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **sensitivity** (`float` or `ndarray`): Sensitivity metric(s)
- **specificity** (`float` or `ndarray`): Specificity metric(s)
- **support** (`int`, `ndarray`, or `None`): Number of occurrences of each label

**Example:**
```python
from imblearn.metrics import sensitivity_specificity_support

y_true = ['cat', 'dog', 'pig', 'cat', 'dog', 'pig']
y_pred = ['cat', 'pig', 'dog', 'cat', 'cat', 'dog']

# Macro-averaged metrics
sensitivity_specificity_support(y_true, y_pred, average='macro')
# (0.33..., 0.66..., None)

# Per-class metrics
sen, spe, sup = sensitivity_specificity_support(y_true, y_pred, average=None)
print(f"Sensitivity: {sen}")
print(f"Specificity: {spe}")
print(f"Support: {sup}")
```

#### classification_report_imbalanced

```python
{ .api }
def classification_report_imbalanced(
    y_true,
    y_pred,
    *,
    labels=None,
    target_names=None,
    sample_weight=None,
    digits=2,
    alpha=0.1,
    output_dict=False,
    zero_division="warn"
) -> str | dict
```

Build a comprehensive classification report for imbalanced datasets.

**Parameters:**
- **y_true** (`array-like`): Ground truth target values
- **y_pred** (`array-like`): Estimated targets from classifier
- **labels** (`array-like`, optional): Label indices to include in report
- **target_names** (`list` of `str`, optional): Display names for labels
- **sample_weight** (`array-like`, optional): Sample weights
- **digits** (`int`, default=`2`): Number of digits for formatting floating point values
- **alpha** (`float`, default=`0.1`): Weighting factor for index balanced accuracy
- **output_dict** (`bool`, default=`False`): Return output as dictionary
- **zero_division** (`"warn"` or `{0, 1}`, default=`"warn"`): Value for zero division cases

**Returns:**
- **report** (`str` or `dict`): Classification report with precision, recall, specificity, f1, geometric mean, and index balanced accuracy

**Metrics Included:**
- **pre**: Precision
- **rec**: Recall (Sensitivity)
- **spe**: Specificity
- **f1**: F1-score
- **geo**: Geometric mean
- **iba**: Index balanced accuracy
- **sup**: Support

**Example:**
```python
from imblearn.metrics import classification_report_imbalanced

y_true = [0, 1, 2, 2, 2]
y_pred = [0, 0, 2, 2, 1]
target_names = ['class 0', 'class 1', 'class 2']

print(classification_report_imbalanced(
    y_true, y_pred, target_names=target_names
))

#               pre   rec   spe    f1   geo   iba   sup
#
#     class 0  0.50  1.00  0.75  0.67  0.87  0.77     1
#     class 1  0.00  0.00  0.75  0.00  0.00  0.00     1
#     class 2  1.00  0.67  1.00  0.80  0.82  0.64     3
#
# avg / total  0.70  0.60  0.90  0.61  0.66  0.54     5
```

## Regression Metrics

#### macro_averaged_mean_absolute_error

```python
{ .api }
def macro_averaged_mean_absolute_error(
    y_true,
    y_pred,
    *,
    sample_weight=None
) -> float
```

Compute Macro-Averaged MAE for imbalanced ordinal classification.

**Parameters:**
- **y_true** (`array-like` of shape `(n_samples,)` or `(n_samples, n_outputs)`): Ground truth target values
- **y_pred** (`array-like` of shape `(n_samples,)` or `(n_samples, n_outputs)`): Estimated targets
- **sample_weight** (`array-like`, optional): Sample weights

**Returns:**
- **loss** (`float` or `ndarray`): Macro-averaged MAE (lower is better)

**Description:**
Computes MAE for each class separately and averages them, giving equal weight to each class regardless of class frequency. This provides a more balanced evaluation for imbalanced ordinal classification problems compared to standard MAE.

**Example:**
```python
from sklearn.metrics import mean_absolute_error
from imblearn.metrics import macro_averaged_mean_absolute_error

y_true_balanced = [1, 1, 2, 2]
y_true_imbalanced = [1, 2, 2, 2]
y_pred = [1, 2, 1, 2]

# Standard MAE
mean_absolute_error(y_true_balanced, y_pred)    # 0.5
mean_absolute_error(y_true_imbalanced, y_pred)  # 0.25

# Macro-averaged MAE
macro_averaged_mean_absolute_error(y_true_balanced, y_pred)    # 0.5
macro_averaged_mean_absolute_error(y_true_imbalanced, y_pred)  # 0.16...
```
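
To make the per-class averaging explicit, the following sketch recomputes the imbalanced case by hand (illustrative only; `macro_averaged_mean_absolute_error` already does this internally):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([1, 2, 2, 2])
y_pred = np.array([1, 2, 1, 2])

# MAE computed separately for each true class, then averaged with equal weight
per_class_mae = [
    mean_absolute_error(y_true[y_true == cls], y_pred[y_true == cls])
    for cls in np.unique(y_true)
]
print(per_class_mae)           # [0.0, 0.333...]
print(np.mean(per_class_mae))  # 0.1666..., matching the macro-averaged MAE above
```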

## Meta-Functions

#### make_index_balanced_accuracy

```python
{ .api }
def make_index_balanced_accuracy(
    *,
    alpha=0.1,
    squared=True
) -> callable
```

Factory function to create Index Balanced Accuracy (IBA) corrected metrics.

**Parameters:**
- **alpha** (`float`, default=`0.1`): Weighting factor for dominance correction
- **squared** (`bool`, default=`True`): Whether to square the metric before weighting

**Returns:**
- **iba_scoring_func** (`callable`): Decorator function that applies IBA correction to any scoring metric

**Description:**
The Index Balanced Accuracy corrects standard metrics by accounting for the dominance relationship between sensitivity and specificity. The corrected score is calculated as:

IBA_α(metric) = (1 + α × dominance) × metric

where dominance = sensitivity - specificity, and the metric is squared before weighting when `squared=True`.

**Mathematical Definition:**
- **Dominance**: D = Sensitivity - Specificity
- **IBA correction**: IBA_α(M) = (1 + α × D) × M² (if `squared=True`)

**Example:**
```python
from imblearn.metrics import geometric_mean_score, make_index_balanced_accuracy

# Create IBA-corrected geometric mean
iba_gmean = make_index_balanced_accuracy(alpha=0.1, squared=True)(geometric_mean_score)

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

# Apply IBA correction
iba_scores = iba_gmean(y_true, y_pred, average=None)
print(iba_scores)
# [0.44..., 0.44...]
```
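
The reported value can be reproduced from the formula above. In this example the positive class has sensitivity 2/3 and specificity 2/3, so the dominance term vanishes; a sketch of the arithmetic (illustrating the formula, not the library's internal implementation):

```python
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

y_true = [1, 0, 0, 1, 0, 1]
y_pred = [0, 0, 1, 1, 0, 1]

sens = sensitivity_score(y_true, y_pred)                        # 2/3
spec = specificity_score(y_true, y_pred)                        # 2/3
gmean = geometric_mean_score(y_true, y_pred, average='binary')  # sqrt(2/3 * 2/3) = 2/3

alpha = 0.1
dominance = sens - spec                     # 0.0 for this symmetric example
print((1 + alpha * dominance) * gmean**2)   # 0.444..., matching the IBA-corrected score above
```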

## Relationship to scikit-learn Metrics

### Complementary Metrics

- **Sensitivity** is equivalent to sklearn's `recall_score`
- **Specificity** has no direct sklearn equivalent (it is the complement of the false positive rate)
- **Geometric mean** offers a class-balanced alternative to sklearn's `balanced_accuracy_score` (geometric rather than arithmetic mean of per-class recall; compared in the sketch below)
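
A quick comparison on the toy labels used earlier on this page (illustrative; both metrics aggregate the same per-class recalls `[1, 0, 0]`, just differently):

```python
from sklearn.metrics import balanced_accuracy_score
from imblearn.metrics import geometric_mean_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 2, 1, 0, 0, 1]

# Arithmetic mean of per-class recalls
print(balanced_accuracy_score(y_true, y_pred))  # 0.33...

# Geometric mean of the same recalls -- any zero-sensitivity class drives it to 0
print(geometric_mean_score(y_true, y_pred))     # 0.0
```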

### Enhanced Reports

- `classification_report_imbalanced` extends sklearn's `classification_report` with:
  - Specificity scores
  - Geometric mean scores
  - Index balanced accuracy scores
  - Better handling of imbalanced data

413

414

### Usage Patterns

**Binary Classification:**
```python
from imblearn.metrics import sensitivity_score, specificity_score, geometric_mean_score

# Individual metrics
sensitivity = sensitivity_score(y_true, y_pred, pos_label=1)
specificity = specificity_score(y_true, y_pred, pos_label=1)
gmean = geometric_mean_score(y_true, y_pred, average='binary')
```

**Multi-class Classification:**
```python
# Per-class metrics
sen_per_class = sensitivity_score(y_true, y_pred, average=None)
spe_per_class = specificity_score(y_true, y_pred, average=None)

# Averaged metrics
sen_macro = sensitivity_score(y_true, y_pred, average='macro')
gmean_multiclass = geometric_mean_score(y_true, y_pred, average='multiclass')
```

**Comprehensive Evaluation:**
```python
# Complete imbalanced classification report
report = classification_report_imbalanced(y_true, y_pred, target_names=class_names)
print(report)

# Or as dictionary for programmatic access
report_dict = classification_report_imbalanced(
    y_true, y_pred, target_names=class_names, output_dict=True
)
```

## Best Practices

1. **Choose appropriate averaging**: Use `'macro'` for equal class importance, `'weighted'` for frequency-weighted importance (see the scorer sketch after this list)
2. **Handle zero classes**: Use the `correction` parameter in `geometric_mean_score` for highly imbalanced datasets
3. **Combine metrics**: Use `classification_report_imbalanced` for comprehensive evaluation
4. **Apply IBA correction**: Use `make_index_balanced_accuracy` to correct for class dominance effects
5. **Consider ordinal data**: Use `macro_averaged_mean_absolute_error` for imbalanced ordinal classification
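
For model selection, these metrics can also be wrapped as scorers. A minimal sketch, assuming scikit-learn's `make_scorer` and `cross_val_score` with a placeholder estimator and synthetic dataset (not prescribed by imbalanced-learn):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score
from imblearn.metrics import geometric_mean_score

# Toy imbalanced problem (illustrative only)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Wrap the G-mean as a scorer so cross-validation reports a class-balanced metric
gmean_scorer = make_scorer(geometric_mean_score)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring=gmean_scorer, cv=5)
print(scores.mean())
```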