# Fairness Assessment

Comprehensive tools for measuring fairness through metrics disaggregated across sensitive groups. The assessment module provides the MetricFrame class for computing any metric per subgroup, along with specialized functions for specific fairness criteria such as demographic parity, equalized odds, and equal opportunity.

## Capabilities

### MetricFrame

The central class for fairness assessment. It computes metrics across subgroups defined by sensitive features, provides a disaggregated view of any metric function, and offers comparison methods for fairness evaluation.

```python { .api }
class MetricFrame:
    def __init__(self, *, metrics, y_true, y_pred,
                 sensitive_features, control_features=None,
                 sample_params=None, n_boot=None, ci_quantiles=None,
                 random_state=None):
        """
        Collection of disaggregated metric values.

        Parameters:
        - metrics: callable or dict, metric functions to compute
        - y_true: array-like, true target values
        - y_pred: array-like, predicted values
        - sensitive_features: array-like, sensitive feature values for grouping
        - control_features: array-like, optional control feature values
        - sample_params: dict, optional parameters for metric functions
        - n_boot: int, number of bootstrap samples for confidence intervals
        - ci_quantiles: list[float], quantiles for confidence intervals
        - random_state: int or RandomState, controls bootstrap sample generation
        """

    @property
    def overall(self):
        """Overall metrics computed on entire dataset."""

    @property
    def by_group(self):
        """Metrics computed for each sensitive feature group."""

    def group_max(self):
        """Maximum metric value across groups."""

    def group_min(self):
        """Minimum metric value across groups."""

    def difference(self, method="between_groups"):
        """Difference between group metrics."""

    def ratio(self, method="between_groups"):
        """Ratio between group metrics."""

    @property
    def overall_ci(self):
        """Confidence intervals for overall metrics."""

    @property
    def by_group_ci(self):
        """Confidence intervals for group metrics."""

    def group_max_ci(self):
        """Confidence intervals for maximum metric values."""

    def group_min_ci(self):
        """Confidence intervals for minimum metric values."""

    def difference_ci(self, method="between_groups"):
        """Confidence intervals for differences between groups."""

    def ratio_ci(self, method="between_groups"):
        """Confidence intervals for ratios between groups."""
```

#### Usage Example

```python
from fairlearn.metrics import MetricFrame
from sklearn.metrics import accuracy_score, precision_score

# Define multiple metrics
metrics = {
    'accuracy': accuracy_score,
    'precision': precision_score
}

# Create MetricFrame
mf = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)

# Access results
print(mf.overall)       # Overall metrics
print(mf.by_group)      # Metrics by group
print(mf.difference())  # Differences between groups
print(mf.ratio())       # Ratios between groups
```
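
If `n_boot` and `ci_quantiles` are set, bootstrap confidence intervals become available through the `_ci` accessors. A minimal sketch, reusing the variables from the example above (100 resamples and the 2.5%/97.5% quantiles are illustrative choices):

```python
# Same data as above, but with bootstrap confidence intervals enabled
mf_ci = MetricFrame(
    metrics=metrics,
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features,
    n_boot=100,                   # number of bootstrap resamples
    ci_quantiles=[0.025, 0.975],  # 95% confidence interval
    random_state=42
)

print(mf_ci.by_group_ci)      # confidence intervals for each group metric
print(mf_ci.difference_ci())  # confidence intervals for group differences
```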

### Demographic Parity Metrics

Functions for measuring demographic parity, which requires equal positive prediction rates across groups.

```python { .api }
def demographic_parity_difference(y_true, y_pred, *, sensitive_features,
                                  method="between_groups", sample_weight=None):
    """
    Calculate difference in selection rates between groups.

    Parameters:
    - y_true: array-like, true target values (ignored for selection rate)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Maximum difference in selection rates between any two groups
    """

def demographic_parity_ratio(y_true, y_pred, *, sensitive_features,
                             method="between_groups", sample_weight=None):
    """
    Calculate ratio of selection rates between groups.

    Parameters:
    - y_true: array-like, true target values (ignored for selection rate)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Minimum ratio of selection rates between any two groups
    """
```
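
A short usage sketch, reusing `y_test`, `y_pred`, and `sensitive_features` from the MetricFrame example above:

```python
from fairlearn.metrics import demographic_parity_difference, demographic_parity_ratio

# A difference of 0.0 (ratio of 1.0) means all groups are selected at the same rate
dp_diff = demographic_parity_difference(
    y_test, y_pred, sensitive_features=sensitive_features
)
dp_ratio = demographic_parity_ratio(
    y_test, y_pred, sensitive_features=sensitive_features
)
print(dp_diff, dp_ratio)
```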

### Equalized Odds Metrics

Functions for measuring equalized odds, which requires equal true positive and false positive rates across groups.

```python { .api }
def equalized_odds_difference(y_true, y_pred, *, sensitive_features,
                              method="between_groups", sample_weight=None,
                              agg="worst_case"):
    """
    Calculate the difference in true positive and false positive rates across groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights
    - agg: str, aggregation method ("worst_case" or "mean")

    Returns:
    float: The larger of the TPR and FPR differences between groups
           (their mean when agg="mean")
    """

def equalized_odds_ratio(y_true, y_pred, *, sensitive_features,
                         method="between_groups", sample_weight=None,
                         agg="worst_case"):
    """
    Calculate the ratio of true positive and false positive rates across groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights
    - agg: str, aggregation method ("worst_case" or "mean")

    Returns:
    float: The smaller of the TPR and FPR ratios between groups
           (their mean when agg="mean")
    """
```
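
A brief sketch, again reusing the variables from the earlier example; the `agg` keyword switches between the worst-case and mean aggregations documented above:

```python
from fairlearn.metrics import equalized_odds_difference

# Worst-case gap across TPR and FPR (0.0 means equalized odds is satisfied)
eo_worst = equalized_odds_difference(
    y_test, y_pred, sensitive_features=sensitive_features
)

# Mean of the TPR and FPR gaps instead of the worst case
eo_mean = equalized_odds_difference(
    y_test, y_pred, sensitive_features=sensitive_features, agg="mean"
)
print(eo_worst, eo_mean)
```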

### Equal Opportunity Metrics

Functions for measuring equal opportunity, which requires equal true positive rates across groups.

```python { .api }
def equal_opportunity_difference(y_true, y_pred, *, sensitive_features,
                                 method="between_groups", sample_weight=None):
    """
    Calculate difference in true positive rates between groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Maximum difference in TPR between any two groups
    """

def equal_opportunity_ratio(y_true, y_pred, *, sensitive_features,
                            method="between_groups", sample_weight=None):
    """
    Calculate ratio of true positive rates between groups.

    Parameters:
    - y_true: array-like, true target values (binary)
    - y_pred: array-like, predicted values (binary)
    - sensitive_features: array-like, sensitive feature values
    - method: str, comparison method ("between_groups" or "to_overall")
    - sample_weight: array-like, optional sample weights

    Returns:
    float: Minimum ratio of TPR between any two groups
    """
```
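
A minimal sketch following the signatures documented above (assuming these helpers are available in the installed version; variables are reused from the earlier example):

```python
from fairlearn.metrics import equal_opportunity_difference

# Gap in true positive rate (recall) between groups; 0.0 satisfies equal opportunity
eopp_diff = equal_opportunity_difference(
    y_test, y_pred, sensitive_features=sensitive_features
)
print(eopp_diff)
```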

### Base Metrics

Fundamental metric functions that can be used with MetricFrame or independently.

```python { .api }
def true_positive_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """
    Calculate true positive rate (sensitivity/recall).

    Parameters:
    - y_true: array-like, true target values
    - y_pred: array-like, predicted values
    - sample_weight: array-like, optional sample weights
    - pos_label: label considered as positive

    Returns:
    float: True positive rate
    """

def false_positive_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate false positive rate."""

def true_negative_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate true negative rate (specificity)."""

def false_negative_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate false negative rate."""

def selection_rate(y_true, y_pred, *, sample_weight=None, pos_label=1):
    """Calculate selection rate (positive prediction rate)."""

def mean_prediction(y_true, y_pred, *, sample_weight=None):
    """Calculate mean of predictions."""

def count(y_true, y_pred, *, sample_weight=None):
    """Count number of samples."""
```
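
These functions plug directly into MetricFrame for per-group reporting. A brief sketch, reusing the variables from the earlier example:

```python
from fairlearn.metrics import (
    MetricFrame, selection_rate, true_positive_rate, false_positive_rate, count
)

rates = MetricFrame(
    metrics={
        'selection_rate': selection_rate,
        'tpr': true_positive_rate,
        'fpr': false_positive_rate,
        'count': count,
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)
print(rates.by_group)  # per-group selection rate, TPR, FPR, and sample count
```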

### Derived Metrics

Create new fairness metrics from existing metric functions.

```python { .api }
def make_derived_metric(*, metric, transform, sample_weight_names=None):
    """
    Create a derived metric with specified aggregation method.

    Parameters:
    - metric: callable, base metric function
    - transform: str, aggregation method ('difference', 'ratio', 'group_min', 'group_max')
    - sample_weight_names: list, parameter names for sample weights

    Returns:
    callable: New derived metric function
    """
```
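
For example, a recall-difference metric can be derived from scikit-learn's `recall_score`; the derived function accepts the base metric's arguments plus a `sensitive_features` keyword (a brief sketch, reusing variables from the earlier example):

```python
from fairlearn.metrics import make_derived_metric
from sklearn.metrics import recall_score

# Derived metric: largest recall gap between groups
recall_score_difference = make_derived_metric(
    metric=recall_score, transform='difference'
)
print(recall_score_difference(
    y_test, y_pred, sensitive_features=sensitive_features
))
```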

### Visualization

Plotting functions for visualizing model comparison across fairness metrics.

```python { .api }
def plot_model_comparison(dashboard_predicted, *,
                          sensitive_features,
                          conf_intervals=False):
    """
    Plot radar chart comparing multiple models across fairness and performance metrics.

    Parameters:
    - dashboard_predicted: dict, mapping of model names to prediction dictionaries
    - sensitive_features: array-like, sensitive feature values
    - conf_intervals: bool, whether to show confidence intervals

    Returns:
    matplotlib figure object
    """
```

## Generated Metrics

The metrics module dynamically generates additional fairness metrics for many base metrics using the pattern `<metric>_{difference,ratio,group_min,group_max}`. For example:

- `accuracy_score_difference`
- `precision_score_ratio`
- `recall_score_group_min`
- `f1_score_group_max`

These generated metrics provide convenient access to common fairness assessments without manually using `make_derived_metric`.
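
For instance, the difference variant of accuracy can be imported and called directly (a brief sketch; the available names follow the pattern above, and the variables are reused from the earlier example):

```python
from fairlearn.metrics import accuracy_score_difference

# Largest accuracy gap between any two sensitive-feature groups
acc_gap = accuracy_score_difference(
    y_test, y_pred, sensitive_features=sensitive_features
)
print(acc_gap)
```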