# Fairness Assessment

Comprehensive tools for measuring fairness through disaggregated metrics across sensitive groups. The assessment module provides the MetricFrame class for computing metrics across subgroups and specialized fairness functions for measuring specific fairness criteria.

## Capabilities

### MetricFrame

The central class for fairness assessment that computes metrics across subgroups defined by sensitive features. Provides disaggregated views of any metric function and supports comparison methods for fairness evaluation.

```python { .api }
class MetricFrame:
    def __init__(self, *, metrics, y_true, y_pred,
                 sensitive_features, control_features=None,
                 sample_params=None, n_boot=None, ci_quantiles=None,
                 random_state=None):
        """
        Collection of disaggregated metric values.

        Parameters:
        - metrics: callable or dict, metric functions to compute
        - y_true: array-like, true target values
        - y_pred: array-like, predicted values
        - sensitive_features: array-like, sensitive feature values for grouping
        - control_features: array-like, optional control feature values
        - sample_params: dict, optional parameters for metric functions
        - n_boot: int, number of bootstrap samples for confidence intervals
        - ci_quantiles: list[float], quantiles for confidence intervals
        - random_state: int or RandomState, controls bootstrap sample generation
        """

    @property
    def overall(self):
        """Overall metrics computed on entire dataset."""

    @property
    def by_group(self):
        """Metrics computed for each sensitive feature group."""

    def group_max(self):
        """Maximum metric value across groups."""

    def group_min(self):
        """Minimum metric value across groups."""

    def difference(self, method="between_groups"):
        """Difference between group metrics."""

    def ratio(self, method="between_groups"):
        """Ratio between group metrics."""

    @property
    def overall_ci(self):
        """Confidence intervals for overall metrics."""

    @property
    def by_group_ci(self):
        """Confidence intervals for group metrics."""

    def group_max_ci(self):
        """Confidence intervals for maximum metric values."""

    def group_min_ci(self):
        """Confidence intervals for minimum metric values."""

    def difference_ci(self, method="between_groups"):
        """Confidence intervals for differences between groups."""

    def ratio_ci(self, method="between_groups"):
        """Confidence intervals for ratios between groups."""
```

#### Usage Example

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, precision_score

# Define multiple metrics
metrics = {
    'accuracy': accuracy_score,
    'precision': precision_score
}

# Create MetricFrame
mf = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)

# Access results
print(mf.overall)       # Overall metrics
print(mf.by_group)      # Metrics by group
print(mf.difference())  # Differences between groups
print(mf.ratio())       # Ratios between groups
```
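
When `n_boot` and `ci_quantiles` are supplied, the `*_ci` accessors return bootstrapped confidence intervals. A minimal sketch, reusing the placeholder arrays (`y_test`, `y_pred`, `sensitive_features`) from the example above:

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score

# Request bootstrap confidence intervals alongside the point estimates
mf_ci = MetricFrame(
    metrics=accuracy_score,
    y_true=y_test,                  # placeholders from the previous example
    y_pred=y_pred,
    sensitive_features=sensitive_features,
    n_boot=100,                     # number of bootstrap resamples
    ci_quantiles=[0.025, 0.975],    # 95% interval
    random_state=42
)

print(mf_ci.overall_ci)   # confidence intervals for the overall metric
print(mf_ci.by_group_ci)  # confidence intervals per group
```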

### Demographic Parity Metrics

Functions for measuring demographic parity, which requires equal positive prediction rates across groups.

```python { .api }
def demographic_parity_difference(y_true, y_pred, *, sensitive_features,
                                  method="between_groups", sample_weight=None):
    """
    Calculate difference in selection rates between groups.

    Parameters:
    - y_true: array-like, true target values (ignored for selection rate)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Maximum difference in selection rates between any two groups
    """

def demographic_parity_ratio(y_true, y_pred, *, sensitive_features,
                             method="between_groups", sample_weight=None):
    """
    Calculate ratio of selection rates between groups.

    Parameters:
    - y_true: array-like, true target values (ignored for selection rate)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Minimum ratio of selection rates between any two groups
    """
```
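
#### Usage Example

A brief sketch, reusing the placeholder arrays (`y_test`, `y_pred`, `sensitive_features`) from the MetricFrame example:

```python
from fairlearn.metrics import demographic_parity_difference, demographic_parity_ratio

# 0.0 means identical selection rates across groups; larger values mean more disparity
dpd = demographic_parity_difference(y_test, y_pred,
                                    sensitive_features=sensitive_features)

# 1.0 means identical selection rates; values closer to 0 mean more disparity
dpr = demographic_parity_ratio(y_test, y_pred,
                               sensitive_features=sensitive_features)

print(f"Demographic parity difference: {dpd:.3f}")
print(f"Demographic parity ratio: {dpr:.3f}")
```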

### Equalized Odds Metrics

Functions for measuring equalized odds, which requires equal true positive and false positive rates across groups.

```python { .api }
def equalized_odds_difference(y_true, y_pred, *, sensitive_features,
                              method="between_groups", sample_weight=None,
                              agg="worst_case"):
    """
    Calculate maximum difference in true positive and false positive rates.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights
    - agg: str, aggregation method ("worst_case" or "mean")

    Returns:
    float: Maximum difference in TPR and FPR between any two groups
    """

def equalized_odds_ratio(y_true, y_pred, *, sensitive_features,
                         method="between_groups", sample_weight=None,
                         agg="worst_case"):
    """
    Calculate minimum ratio in true positive and false positive rates.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights
    - agg: str, aggregation method ("worst_case" or "mean")

    Returns:
    float: Minimum ratio in TPR and FPR between any two groups
    """
```
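
#### Usage Example

A brief sketch with the same placeholder arrays as above; `agg` controls how the TPR and FPR disparities are combined:

```python
from fairlearn.metrics import equalized_odds_difference, equalized_odds_ratio

# Default: worst case of the TPR and FPR differences
eod = equalized_odds_difference(y_test, y_pred,
                                sensitive_features=sensitive_features)

# Mean of the TPR and FPR differences instead of the worst case
eod_mean = equalized_odds_difference(y_test, y_pred,
                                     sensitive_features=sensitive_features,
                                     agg="mean")

eor = equalized_odds_ratio(y_test, y_pred,
                           sensitive_features=sensitive_features)
print(eod, eod_mean, eor)
```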

### Equal Opportunity Metrics

Functions for measuring equal opportunity, which requires equal true positive rates across groups.

```python { .api }
def equal_opportunity_difference(y_true, y_pred, *, sensitive_features,
                                 method="between_groups", sample_weight=None):
    """
    Calculate difference in true positive rates between groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Maximum difference in TPR between any two groups
    """

def equal_opportunity_ratio(y_true, y_pred, *, sensitive_features,
                            method="between_groups", sample_weight=None):
    """
    Calculate ratio of true positive rates between groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Minimum ratio in TPR between any two groups
    """
```
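
#### Usage Example

A brief sketch, assuming the two functions documented above are importable from `fairlearn.metrics` and reusing the same placeholder arrays:

```python
from fairlearn.metrics import equal_opportunity_difference, equal_opportunity_ratio

# Disparity in true positive rates (recall) only
eo_diff = equal_opportunity_difference(y_test, y_pred,
                                       sensitive_features=sensitive_features)
eo_ratio = equal_opportunity_ratio(y_test, y_pred,
                                   sensitive_features=sensitive_features)
print(eo_diff, eo_ratio)
```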

### Base Metrics

Fundamental metric functions that can be used with MetricFrame or independently.

```python { .api }
def true_positive_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """
    Calculate true positive rate (sensitivity/recall).

    Parameters:
    - y_true: array-like, true target values
    - y_pred: array-like, predicted values
    - sample_weight: array-like, optional sample weights
    - pos_label: label considered as positive

    Returns:
    float: True positive rate
    """

def false_positive_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate false positive rate."""

def true_negative_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate true negative rate (specificity)."""

def false_negative_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate false negative rate."""

def selection_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate selection rate (positive prediction rate)."""

def mean_prediction(y_true, y_pred, *, sample_weight=None):
    """Calculate mean of predictions."""

def count(y_true, y_pred, *, sample_weight=None):
    """Count number of samples."""
```
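
#### Usage Example

These are plain callables with a scikit-learn-style signature, so they can be passed directly to MetricFrame. A brief sketch with the same placeholder arrays as earlier:

```python
from fairlearn.metrics import (
    MetricFrame, selection_rate, true_positive_rate, false_positive_rate
)

mf_rates = MetricFrame(
    metrics={
        'selection_rate': selection_rate,
        'tpr': true_positive_rate,
        'fpr': false_positive_rate,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)

print(mf_rates.by_group)  # one row per group, one column per metric
```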

### Derived Metrics

Create new fairness metrics from existing metric functions.

```python { .api }
def make_derived_metric(*, metric, transform, sample_param_names=None):
    """
    Create a derived metric with specified aggregation method.

    Parameters:
    - metric: callable, base metric function
    - transform: str, aggregation method ('difference', 'ratio', 'group_min', 'group_max')
    - sample_param_names: list, names of per-sample parameters (e.g. sample weights)

    Returns:
    callable: New derived metric function
    """
```
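
#### Usage Example

A brief sketch of deriving a between-group difference from scikit-learn's `recall_score`, reusing the placeholder arrays from the earlier examples:

```python
from fairlearn.metrics import make_derived_metric
from sklearn.metrics import recall_score

# Wrap recall_score so it reports the largest between-group difference
recall_difference = make_derived_metric(metric=recall_score,
                                        transform="difference")

# The derived callable takes the usual arguments plus sensitive_features
gap = recall_difference(y_test, y_pred,
                        sensitive_features=sensitive_features)
print(f"Recall difference between groups: {gap:.3f}")
```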

### Visualization

Plotting functions for visualizing model comparison across fairness metrics.

```python { .api }
def plot_model_comparison(dashboard_predicted, *,
                          sensitive_features,
                          conf_intervals=False):
    """
    Plot radar chart comparing multiple models across fairness and performance metrics.

    Parameters:
    - dashboard_predicted: dict, mapping of model names to prediction dictionaries
    - sensitive_features: array-like, sensitive feature values
    - conf_intervals: bool, whether to show confidence intervals

    Returns:
    matplotlib figure object
    """
```

## Generated Metrics

The metrics module dynamically generates additional fairness metrics for many base metrics using the pattern `<metric>_{difference,ratio,group_min,group_max}`. For example:

- `accuracy_score_difference`
- `precision_score_ratio`
- `recall_score_group_min`
- `f1_score_group_max`

These generated metrics provide convenient access to common fairness assessments without manually using `make_derived_metric`.
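
A brief sketch, assuming `accuracy_score_difference` is importable from `fairlearn.metrics` as described above and reusing the earlier placeholder arrays:

```python
from fairlearn.metrics import accuracy_score_difference

# Equivalent to wrapping sklearn's accuracy_score with make_derived_metric(transform="difference")
acc_gap = accuracy_score_difference(y_test, y_pred,
                                    sensitive_features=sensitive_features)
print(f"Accuracy difference between groups: {acc_gap:.3f}")
```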