# Adversarial Training

Neural-network approaches that use adversarial training to learn fair representations while maintaining predictive utility. These methods train an adversary network to recover sensitive information from the learned representations, and train the predictor so that the adversary fails.

## Capabilities

### AdversarialFairnessClassifier

Implements adversarial fairness for classification tasks using neural networks. Trains a predictor network alongside an adversary network that tries to predict sensitive attributes from the predictor's internal representations.
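
One common way to write down the setup described above is as a two-player objective. This is a sketch of the general idea, not necessarily the exact loss the library optimizes. The symbols $L_P$ (predictor loss), $L_A$ (adversary loss), $\theta_P$, and $\theta_A$ (the two networks' weights) are notation introduced here for illustration; $\alpha$ plays the role of the `alpha` parameter.

$$
\min_{\theta_A} \; L_A(\theta_A;\, \theta_P),
\qquad
\min_{\theta_P} \; L_P(\theta_P) \;-\; \alpha \, L_A(\theta_A;\, \theta_P)
$$

Under this reading, $\alpha = 0$ trains the predictor on its task alone, and larger $\alpha$ rewards the predictor more strongly for making the adversary's job harder.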

```python { .api }
class AdversarialFairnessClassifier:
    def __init__(self, backend="torch", *, predictor_model=None, adversary_model=None,
                 alpha=1.0, epochs=1, batch_size=32, shuffle=True, progress_updates=None,
                 skip_validation=False, callbacks=None, random_state=None):
        """
        Adversarial fairness classifier using neural networks.

        Parameters:
        - backend: str, neural network backend ("torch" or "tensorflow")
        - predictor_model: neural network model for prediction task
        - adversary_model: neural network model for adversary task
        - alpha: float, strength of adversarial training (higher = more fairness emphasis)
        - epochs: int, number of training epochs
        - batch_size: int, batch size for training
        - shuffle: bool, whether to shuffle training data
        - progress_updates: callable, callback for training progress updates
        - skip_validation: bool, whether to skip input validation
        - callbacks: list, training callbacks
        - random_state: int, random seed for reproducibility
        """

    def fit(self, X, y, *, sensitive_features, sample_weight=None):
        """
        Fit the adversarial fairness classifier.

        Parameters:
        - X: array-like, feature matrix
        - y: array-like, target values
        - sensitive_features: array-like, sensitive feature values
        - sample_weight: array-like, optional sample weights

        Returns:
        self
        """

    def predict(self, X):
        """
        Make predictions using the trained fair classifier.

        Parameters:
        - X: array-like, feature matrix

        Returns:
        array-like: Predicted class labels
        """

    def predict_proba(self, X):
        """
        Predict class probabilities.

        Parameters:
        - X: array-like, feature matrix

        Returns:
        array-like: Predicted class probabilities, shape (n_samples, n_classes)
        """
```

#### Usage Example

```python
from fairlearn.adversarial import AdversarialFairnessClassifier

# X_train, y_train, A_train, and X_test are assumed to exist already:
# features, labels, and sensitive-feature values.

# Create adversarial fairness classifier
afc = AdversarialFairnessClassifier(
    backend="torch",   # or "tensorflow"
    alpha=1.0,         # Fairness strength
    epochs=50,         # Training epochs
    batch_size=64,
    random_state=42
)

# Fit the model
afc.fit(X_train, y_train, sensitive_features=A_train)

# Make predictions
predictions = afc.predict(X_test)
probabilities = afc.predict_proba(X_test)
```

### AdversarialFairnessRegressor

Implements adversarial fairness for regression tasks, training a predictor to minimize prediction error while preventing an adversary from predicting sensitive attributes.

```python { .api }
class AdversarialFairnessRegressor:
    def __init__(self, backend="torch", *, predictor_model=None, adversary_model=None,
                 alpha=1.0, epochs=1, batch_size=32, shuffle=True, progress_updates=None,
                 skip_validation=False, callbacks=None, random_state=None):
        """
        Adversarial fairness regressor using neural networks.

        Parameters:
        - backend: str, neural network backend ("torch" or "tensorflow")
        - predictor_model: neural network model for regression task
        - adversary_model: neural network model for adversary task
        - alpha: float, strength of adversarial training
        - epochs: int, number of training epochs
        - batch_size: int, batch size for training
        - shuffle: bool, whether to shuffle training data
        - progress_updates: callable, callback for training progress updates
        - skip_validation: bool, whether to skip input validation
        - callbacks: list, training callbacks
        - random_state: int, random seed for reproducibility
        """

    def fit(self, X, y, *, sensitive_features, sample_weight=None):
        """
        Fit the adversarial fairness regressor.

        Parameters:
        - X: array-like, feature matrix
        - y: array-like, continuous target values
        - sensitive_features: array-like, sensitive feature values
        - sample_weight: array-like, optional sample weights

        Returns:
        self
        """

    def predict(self, X):
        """
        Make regression predictions.

        Parameters:
        - X: array-like, feature matrix

        Returns:
        array-like: Predicted continuous values
        """
```

## Backend Support

### PyTorch Backend

The default backend implements the networks with PyTorch:

```python
# Using PyTorch backend (default)
classifier = AdversarialFairnessClassifier(
    backend="torch",
    epochs=100,
    batch_size=128
)
```

### TensorFlow Backend

TensorFlow is available as an alternative backend:

```python
# Using TensorFlow backend
classifier = AdversarialFairnessClassifier(
    backend="tensorflow",
    epochs=100,
    batch_size=128
)
```

## Custom Neural Network Models

### Custom Predictor Models

You can supply your own network architecture for the predictor:

```python
import torch
import torch.nn as nn

from fairlearn.adversarial import AdversarialFairnessClassifier

# Define custom predictor model
class CustomPredictor(nn.Module):
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid()  # single probability output for a binary task
        )

    def forward(self, x):
        return self.layers(x)

# Use custom model (X_train is assumed to be a 2-D feature array)
predictor = CustomPredictor(input_dim=X_train.shape[1])

classifier = AdversarialFairnessClassifier(
    backend="torch",
    predictor_model=predictor,
    alpha=2.0,
    epochs=200
)
```

### Custom Adversary Models

The adversary network architecture can be customized in the same way:

```python
import torch.nn as nn

from fairlearn.adversarial import AdversarialFairnessClassifier

class CustomAdversary(nn.Module):
    def __init__(self, input_dim, n_sensitive_classes):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 16),
            nn.ReLU(),
            nn.Linear(16, n_sensitive_classes),
            nn.Softmax(dim=1)  # one probability per sensitive class
        )

    def forward(self, x):
        return self.layers(x)

# Create adversary for binary sensitive attribute
adversary = CustomAdversary(
    input_dim=64,  # Should match predictor's representation size
    n_sensitive_classes=2
)

# `predictor` is the CustomPredictor instance from the previous example
classifier = AdversarialFairnessClassifier(
    backend="torch",
    predictor_model=predictor,
    adversary_model=adversary,
    alpha=1.5
)
```

## Training Configuration

### Hyperparameter Tuning

`alpha` is the main hyperparameter to tune for adversarial training. A simple sweep compares several values:

```python
# Alpha controls fairness-accuracy trade-off
alphas = [0.1, 0.5, 1.0, 2.0, 5.0]

results = {}
for alpha in alphas:
    classifier = AdversarialFairnessClassifier(
        alpha=alpha,
        epochs=100,
        batch_size=64,
        random_state=42
    )

    classifier.fit(X_train, y_train, sensitive_features=A_train)
    predictions = classifier.predict(X_test)

    # Evaluate fairness and accuracy.
    # evaluate_model is a placeholder for your own evaluation function.
    results[alpha] = evaluate_model(predictions, y_test, A_test)
```
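
The sweep above calls `evaluate_model` without defining it. A minimal sketch of such a helper might look like the following. The function name comes from the example, but its body, its return layout, and the choice of metrics (accuracy plus the gap in selection rate between groups) are assumptions made here for illustration. It expects flat sequences of binary 0/1 labels.

```python
from collections import defaultdict


def evaluate_model(predictions, y_true, sensitive_features):
    """Hypothetical helper: accuracy plus the selection-rate gap between groups.

    Assumes flat sequences of binary labels encoded as 0/1.
    """
    predictions = list(predictions)
    y_true = list(y_true)
    sensitive_features = list(sensitive_features)

    # Overall accuracy: fraction of predictions that match the true label.
    correct = sum(int(p == t) for p, t in zip(predictions, y_true))
    accuracy = correct / len(y_true)

    # Selection rate per group: fraction of samples predicted positive.
    by_group = defaultdict(list)
    for p, g in zip(predictions, sensitive_features):
        by_group[g].append(p)
    selection_rates = {g: sum(v) / len(v) for g, v in by_group.items()}

    # Gap between the most- and least-selected groups (0 means parity).
    gap = max(selection_rates.values()) - min(selection_rates.values())
    return {
        "accuracy": accuracy,
        "selection_rates": selection_rates,
        "demographic_parity_difference": gap,
    }


# Toy inputs, made up purely to exercise the function.
example = evaluate_model(
    predictions=[1, 0, 1, 1, 0, 0],
    y_true=[1, 0, 0, 1, 0, 1],
    sensitive_features=["a", "a", "a", "b", "b", "b"],
)
print(example)
```

Returning a dictionary keeps `results[alpha]` easy to tabulate after the sweep; swap in whichever fairness metric matches your fairness goal.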

### Training Callbacks

Monitor training progress with a custom callback:

```python
def progress_callback(epoch, predictor_loss, adversary_loss, adversary_accuracy):
    """Callback to monitor training progress."""
    if epoch % 10 == 0:
        print(f"Epoch {epoch}: Predictor Loss={predictor_loss:.4f}, "
              f"Adversary Loss={adversary_loss:.4f}, "
              f"Adversary Acc={adversary_accuracy:.4f}")

classifier = AdversarialFairnessClassifier(
    progress_updates=progress_callback,
    epochs=200
)
```

## Advanced Usage

### Multi-class Sensitive Features

Sensitive attributes with more than two categories are passed to `fit` in the same way:

```python
# Sensitive feature with 3 categories, one value per training row.
# Cycling through the groups keeps the length equal to len(X_train)
# even when it is not a multiple of 3.
groups = ['group_A', 'group_B', 'group_C']
sensitive_features = [groups[i % len(groups)] for i in range(len(X_train))]

classifier = AdversarialFairnessClassifier(
    alpha=1.0,
    epochs=150
)

classifier.fit(X_train, y_train, sensitive_features=sensitive_features)
```

### Batch Size Selection

Choose the batch size based on dataset size:

```python
# For small datasets
small_classifier = AdversarialFairnessClassifier(batch_size=16)

# For large datasets
large_classifier = AdversarialFairnessClassifier(batch_size=256)

# Adaptive batch size based on data size.
# max(1, ...) guards against a batch size of 0 when there are fewer than 10 rows.
batch_size = max(1, min(128, len(X_train) // 10))
adaptive_classifier = AdversarialFairnessClassifier(batch_size=batch_size)
```
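
Batch size and epoch count together determine how many batches the networks see. The sketch below makes that arithmetic explicit. It assumes a final partial batch is still processed and that each batch corresponds to one round of updates; neither assumption is confirmed by the text above, and `updates_per_run` is a name invented for this example.

```python
import math


def updates_per_run(n_samples, batch_size, epochs):
    """Return (batches per epoch, total batches), counting a final partial batch."""
    if n_samples <= 0 or batch_size <= 0 or epochs <= 0:
        raise ValueError("n_samples, batch_size, and epochs must be positive")
    per_epoch = math.ceil(n_samples / batch_size)
    return per_epoch, per_epoch * epochs


per_epoch, total = updates_per_run(n_samples=10_000, batch_size=64, epochs=100)
print(per_epoch, total)  # 157 15700
```

Doubling the batch size roughly halves the number of batches per epoch, so a larger batch size may need more epochs to reach the same number of updates.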

### Early Stopping

Implement custom early stopping with a callback that returns `True` once the predictor loss has stopped improving:

```python
class EarlyStoppingCallback:
    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float('inf')
        self.wait = 0

    def __call__(self, epoch, predictor_loss, adversary_loss, adversary_accuracy):
        if predictor_loss < self.best_loss - self.min_delta:
            self.best_loss = predictor_loss
            self.wait = 0
        else:
            self.wait += 1

        if self.wait >= self.patience:
            print(f"Early stopping at epoch {epoch}")
            return True  # Signal to stop training
        return False

early_stopping = EarlyStoppingCallback(patience=15)

classifier = AdversarialFairnessClassifier(
    callbacks=[early_stopping],
    epochs=1000  # Large number, early stopping will control actual epochs
)
```

## Integration with Evaluation

Combine with fairness assessment tools:

```python
from sklearn.metrics import accuracy_score

from fairlearn.metrics import MetricFrame, equalized_odds_difference, selection_rate

# Train adversarial model
afc = AdversarialFairnessClassifier(alpha=1.0, epochs=100)
afc.fit(X_train, y_train, sensitive_features=A_train)

# Get predictions and evaluate
predictions = afc.predict(X_test)

# Assess fairness: each metric is computed separately for every group in A_test
fairness_metrics = MetricFrame(
    metrics={
        'accuracy': accuracy_score,
        'selection_rate': selection_rate
    },
    y_true=y_test,
    y_pred=predictions,
    sensitive_features=A_test
)

print("Adversarial fairness results:")
print(fairness_metrics.by_group)

eod = equalized_odds_difference(y_test, predictions, sensitive_features=A_test)
print(f"Equalized odds difference: {eod}")
```
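
To make the last number easier to interpret, here is a hand-rolled sketch of an equalized odds gap for binary 0/1 labels: the larger of the between-group difference in true positive rate and the between-group difference in false positive rate. It is meant to illustrate the idea, not to replace the library function, and the helper names are invented for this example. It assumes every group contains both positive and negative true labels.

```python
def _positive_rate(y_pred, y_true, label):
    """Fraction predicted positive among samples whose true label equals `label`."""
    selected = [p for p, t in zip(y_pred, y_true) if t == label]
    if not selected:
        raise ValueError(f"a group has no samples with true label {label}")
    return sum(selected) / len(selected)


def equalized_odds_gap(y_true, y_pred, groups):
    """Larger of the between-group TPR gap and FPR gap (binary 0/1 labels)."""
    tprs, fprs = [], []
    for g in set(groups):
        idx = [i for i, x in enumerate(groups) if x == g]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        tprs.append(_positive_rate(p, t, 1))  # true positive rate in group g
        fprs.append(_positive_rate(p, t, 0))  # false positive rate in group g
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy data, made up purely to exercise the function.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = equalized_odds_gap(y_true, y_pred, groups)
print(gap)  # 0.5
```

A value of 0 means both error rates match across groups; values closer to 1 indicate that at least one of the two rates differs sharply between groups.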

## Best Practices

### Model Architecture

1. **Predictor Complexity**: Give the predictor enough capacity for your task
2. **Adversary Simplicity**: Keep the adversary simpler than the predictor to avoid overfitting
3. **Representation Size**: Choose an intermediate representation size that matches the adversary's input dimension

### Training Strategy

1. **Learning Rates**: Use different learning rates for the predictor and the adversary
2. **Training Balance**: Monitor both networks so that neither the predictor nor the adversary dominates
3. **Convergence**: Expect a stable oscillation in the losses rather than monotonic convergence
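
The balance and convergence points above can be turned into a rough numeric check. The sketch below looks at a history of adversary accuracies (collected however you monitor training) and asks whether the recent values hover near chance level with little spread. The function name, window size, and thresholds are all illustrative assumptions, and `chance_level` should be the accuracy of always guessing the largest sensitive group (0.5 for a balanced binary attribute).

```python
from statistics import mean, pstdev


def adversary_is_balanced(accuracies, chance_level, window=20, tol=0.05, max_spread=0.03):
    """Heuristic check on the most recent `window` adversary accuracies.

    Returns True when the recent mean is within `tol` of chance level and the
    spread (population standard deviation) is at most `max_spread`.
    """
    if len(accuracies) < window:
        return False  # not enough history yet
    recent = list(accuracies)[-window:]
    near_chance = abs(mean(recent) - chance_level) <= tol
    stable = pstdev(recent) <= max_spread
    return near_chance and stable


# Synthetic histories, made up to show both outcomes.
oscillating = [0.5 + 0.02 * (-1) ** i for i in range(40)]  # hovers around chance
dominating = [0.5 + 0.01 * i for i in range(40)]           # adversary keeps winning

print(adversary_is_balanced(oscillating, chance_level=0.5))  # True
print(adversary_is_balanced(dominating, chance_level=0.5))   # False
```

An adversary that stays well above chance suggests the predictor is still leaking sensitive information, in which case a larger `alpha` or longer training may help.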

### Hyperparameter Guidelines

These ranges are rough starting points; the right value depends on the data:

- **Alpha = 0.1-0.5**: Mild fairness emphasis, preserves most accuracy
- **Alpha = 1.0-2.0**: Balanced fairness-accuracy trade-off
- **Alpha = 5.0+**: Strong fairness emphasis, may sacrifice accuracy
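
One way to choose among these ranges after a sweep is to fix an accuracy budget and take the fairest model within it. The sketch below assumes sweep results stored as a dictionary mapping each alpha to an accuracy and a fairness gap (lower is fairer). That layout, the key names, and the function name `pick_alpha` are assumptions made for illustration, and the numbers are invented.

```python
def pick_alpha(results, max_accuracy_drop=0.02):
    """Pick the alpha with the smallest fairness gap among acceptably accurate runs.

    `results` maps alpha -> {"accuracy": float, "fairness_gap": float}.
    A run is acceptable if its accuracy is within `max_accuracy_drop`
    of the best accuracy in the sweep.
    """
    if not results:
        raise ValueError("results is empty")
    best_accuracy = max(r["accuracy"] for r in results.values())
    eligible = {
        alpha: r for alpha, r in results.items()
        if r["accuracy"] >= best_accuracy - max_accuracy_drop
    }
    # Smallest gap wins; ties on the gap go to the more accurate run.
    return min(eligible, key=lambda a: (eligible[a]["fairness_gap"], -eligible[a]["accuracy"]))


# Invented sweep results, for illustration only.
sweep = {
    0.1: {"accuracy": 0.86, "fairness_gap": 0.15},
    1.0: {"accuracy": 0.85, "fairness_gap": 0.06},
    5.0: {"accuracy": 0.79, "fairness_gap": 0.02},
}
chosen = pick_alpha(sweep)
print(chosen)  # 1.0
```

Loosening `max_accuracy_drop` shifts the choice toward larger alphas, which makes the accuracy cost of extra fairness an explicit decision.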

```python
# Recommended starting configuration
classifier = AdversarialFairnessClassifier(
    backend="torch",
    alpha=1.0,        # Balanced trade-off
    epochs=100,       # Starting point; increase if training has not stabilized
    batch_size=64,    # Reasonable default for mid-sized datasets
    shuffle=True,     # Important for training stability
    random_state=42   # For reproducibility
)
```