# Hyperparameter Tuning

Advanced hyperparameter optimization with configurable search spaces, sampling algorithms, and early termination policies for efficient model tuning.

## Capabilities

### Sweep Jobs

Hyperparameter sweep jobs for optimizing model performance across parameter spaces.

```python { .api }
class SweepJob:
    def __init__(
        self,
        *,
        trial: CommandJob,
        search_space: dict,
        objective: Objective,
        sampling_algorithm: SamplingAlgorithm = None,
        early_termination: EarlyTerminationPolicy = None,
        limits: SweepJobLimits = None,
        compute: str = None,
        **kwargs
    ):
        """
        Hyperparameter sweep job for model optimization.

        Parameters:
        - trial: Template command job defining the training script
        - search_space: Dictionary defining parameter search spaces
        - objective: Optimization objective and metric
        - sampling_algorithm: Parameter sampling strategy
        - early_termination: Early stopping policy
        - limits: Sweep execution limits
        - compute: Compute target for sweep trials
        """

class SweepJobLimits:
    def __init__(
        self,
        *,
        max_total_trials: int = 1,
        max_concurrent_trials: int = 1,
        timeout_minutes: int = None,
        trial_timeout_minutes: int = None
    ):
        """
        Limits for sweep job execution.

        Parameters:
        - max_total_trials: Maximum number of trials to run
        - max_concurrent_trials: Maximum number of trials to run concurrently
        - timeout_minutes: Total sweep timeout in minutes
        - trial_timeout_minutes: Individual trial timeout in minutes
        """
```
#### Usage Example

```python
from azure.ai.ml import command
from azure.ai.ml.entities import SweepJob, SweepJobLimits
from azure.ai.ml.sweep import Choice, Uniform, Objective, RandomSamplingAlgorithm, BanditPolicy

# Define the training command template
command_job = command(
    code="./src",
    command="python train.py --learning_rate ${{search_space.learning_rate}} --batch_size ${{search_space.batch_size}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1",
    compute="cpu-cluster"
)

# Define the search space
search_space = {
    "learning_rate": Uniform(min_value=0.001, max_value=0.1),
    "batch_size": Choice(values=[16, 32, 64, 128])
}

# Create the sweep job
sweep_job = SweepJob(
    trial=command_job,
    search_space=search_space,
    objective=Objective(goal="maximize", primary_metric="accuracy"),
    sampling_algorithm=RandomSamplingAlgorithm(),
    early_termination=BanditPolicy(slack_factor=0.1, evaluation_interval=2),
    limits=SweepJobLimits(
        max_total_trials=20,
        max_concurrent_trials=4,
        timeout_minutes=120
    )
)

# Submit the sweep job (assumes an authenticated MLClient instance `ml_client`)
submitted_sweep = ml_client.jobs.create_or_update(sweep_job)
```
### Search Space Functions

Functions for defining parameter search spaces with different distributions.

```python { .api }
class Choice:
    def __init__(self, values: list):
        """
        Discrete choice from a list of values.

        Parameters:
        - values: List of possible values to choose from
        """

class Uniform:
    def __init__(self, min_value: float, max_value: float):
        """
        Uniform distribution between min and max values.

        Parameters:
        - min_value: Minimum value
        - max_value: Maximum value
        """

class LogUniform:
    def __init__(self, min_value: float, max_value: float):
        """
        Log-uniform distribution for parameters that vary exponentially.

        Parameters:
        - min_value: Minimum value (must be > 0)
        - max_value: Maximum value
        """

class Normal:
    def __init__(self, mu: float, sigma: float):
        """
        Normal (Gaussian) distribution.

        Parameters:
        - mu: Mean of the distribution
        - sigma: Standard deviation
        """

class LogNormal:
    def __init__(self, mu: float, sigma: float):
        """
        Log-normal distribution for positive parameters.

        Parameters:
        - mu: Mean of the underlying normal distribution
        - sigma: Standard deviation of the underlying normal distribution
        """

class QUniform:
    def __init__(self, min_value: float, max_value: float, q: float):
        """
        Quantized uniform distribution.

        Parameters:
        - min_value: Minimum value
        - max_value: Maximum value
        - q: Quantization step size
        """

class QLogUniform:
    def __init__(self, min_value: float, max_value: float, q: float):
        """
        Quantized log-uniform distribution.

        Parameters:
        - min_value: Minimum value (must be > 0)
        - max_value: Maximum value
        - q: Quantization step size
        """

class QNormal:
    def __init__(self, mu: float, sigma: float, q: float):
        """
        Quantized normal distribution.

        Parameters:
        - mu: Mean of the distribution
        - sigma: Standard deviation
        - q: Quantization step size
        """

class QLogNormal:
    def __init__(self, mu: float, sigma: float, q: float):
        """
        Quantized log-normal distribution.

        Parameters:
        - mu: Mean of the underlying normal distribution
        - sigma: Standard deviation of the underlying normal distribution
        - q: Quantization step size
        """

class Randint:
    def __init__(self, upper: int):
        """
        Random integer from 0 to upper-1.

        Parameters:
        - upper: Upper bound (exclusive)
        """
```
#### Usage Example

```python
from azure.ai.ml.sweep import Choice, Uniform, LogUniform, Normal, Randint

# Examples of different search space definitions
search_space = {
    # Discrete choices
    "optimizer": Choice(values=["adam", "sgd", "rmsprop"]),
    "activation": Choice(values=["relu", "tanh", "sigmoid"]),

    # Continuous ranges
    "learning_rate": LogUniform(min_value=1e-5, max_value=1e-1),
    "dropout_rate": Uniform(min_value=0.1, max_value=0.5),
    "weight_decay": LogUniform(min_value=1e-6, max_value=1e-2),

    # Normal distributions
    "hidden_size": Normal(mu=128, sigma=32),

    # Integer ranges
    "batch_size": Choice(values=[16, 32, 64, 128, 256]),
    "num_layers": Randint(upper=5)  # 0, 1, 2, 3, or 4
}
```
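For intuition on the quantized variants (`QUniform`, `QLogUniform`, and friends): they snap each sampled value to a multiple of the step size `q`, which is useful for parameters that must land on a grid, such as layer widths. The sketch below illustrates the conventional hyperopt-style quantization rule, `round(sample / q) * q` — an assumption about the semantics, not the service's exact implementation:

```python
import math
import random

def quniform(min_value: float, max_value: float, q: float) -> float:
    """Sample uniformly, then snap to the nearest multiple of q."""
    sample = random.uniform(min_value, max_value)
    return round(sample / q) * q

def qloguniform(min_value: float, max_value: float, q: float) -> float:
    """Sample log-uniformly between two positive bounds, then quantize."""
    sample = math.exp(random.uniform(math.log(min_value), math.log(max_value)))
    return round(sample / q) * q

# Every draw lands on a multiple of q, e.g. candidate hidden sizes 0, 16, 32, ...
draws = [quniform(8, 512, 16) for _ in range(1000)]
assert all(d % 16 == 0 for d in draws)
```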
### Sampling Algorithms

Different strategies for sampling parameters from the search space.

```python { .api }
class SamplingAlgorithm:
    """Base class for sampling algorithms."""

class RandomSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self, seed: int = None):
        """
        Random sampling from the search space.

        Parameters:
        - seed: Random seed for reproducibility
        """

class GridSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self):
        """
        Grid search over all parameter combinations.
        Note: Only works with Choice parameters.
        """

class BayesianSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self):
        """
        Bayesian optimization for intelligent parameter selection.
        Uses previous trial results to guide future parameter choices.
        """
```
#### Usage Example

```python
from azure.ai.ml.sweep import RandomSamplingAlgorithm, BayesianSamplingAlgorithm, GridSamplingAlgorithm

# Random sampling (most common)
random_sampling = RandomSamplingAlgorithm(seed=42)

# Bayesian optimization (for expensive evaluations)
bayesian_sampling = BayesianSamplingAlgorithm()

# Grid search (for small, discrete spaces)
grid_sampling = GridSamplingAlgorithm()
```
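Because grid sampling enumerates the full Cartesian product of the `Choice` values, the trial count grows multiplicatively with each parameter. A quick way to sanity-check the grid size before submitting (plain Python, independent of the SDK; the parameter values here are illustrative):

```python
from itertools import product

# Grid sampling requires every parameter to be a discrete Choice list
grid_space = {
    "optimizer": ["adam", "sgd", "rmsprop"],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 3],
}

# Total trials a grid sweep would run: 3 * 3 * 2 = 18
combinations = list(product(*grid_space.values()))
print(len(combinations))  # 18
```

If the product exceeds your trial budget, switch to random or Bayesian sampling instead of pruning the grid by hand.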
275
276
### Early Termination Policies
277
278
Policies for early stopping of underperforming trials to save computational resources.
279
280
```python { .api }
281
class BanditPolicy:
282
def __init__(
283
self,
284
*,
285
slack_factor: float = None,
286
slack_amount: float = None,
287
evaluation_interval: int = 1,
288
delay_evaluation: int = 0
289
):
290
"""
291
Bandit early termination policy based on slack criteria.
292
293
Parameters:
294
- slack_factor: Slack factor as a ratio (e.g., 0.1 = 10% slack)
295
- slack_amount: Slack amount as absolute value
296
- evaluation_interval: Frequency of policy evaluation
297
- delay_evaluation: Number of intervals to delay evaluation
298
"""
299
300
class MedianStoppingPolicy:
301
def __init__(
302
self,
303
*,
304
evaluation_interval: int = 1,
305
delay_evaluation: int = 0
306
):
307
"""
308
Median stopping policy terminates trials performing worse than median.
309
310
Parameters:
311
- evaluation_interval: Frequency of policy evaluation
312
- delay_evaluation: Number of intervals to delay evaluation
313
"""
314
315
class TruncationSelectionPolicy:
316
def __init__(
317
self,
318
*,
319
truncation_percentage: int = 10,
320
evaluation_interval: int = 1,
321
delay_evaluation: int = 0,
322
exclude_finished_jobs: bool = False
323
):
324
"""
325
Truncation policy terminates a percentage of worst performing trials.
326
327
Parameters:
328
- truncation_percentage: Percentage of trials to terminate
329
- evaluation_interval: Frequency of policy evaluation
330
- delay_evaluation: Number of intervals to delay evaluation
331
- exclude_finished_jobs: Whether to exclude finished jobs from evaluation
332
"""
333
```
#### Usage Example

```python
from azure.ai.ml.sweep import BanditPolicy, MedianStoppingPolicy, TruncationSelectionPolicy

# Conservative bandit policy (10% slack)
bandit_policy = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=2,
    delay_evaluation=5
)

# Median stopping policy
median_policy = MedianStoppingPolicy(
    evaluation_interval=1,
    delay_evaluation=10
)

# Aggressive truncation policy (terminate the bottom 20%)
truncation_policy = TruncationSelectionPolicy(
    truncation_percentage=20,
    evaluation_interval=1,
    delay_evaluation=5
)
```
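To make the bandit cutoff concrete: with a maximize goal and `slack_factor`, a trial is terminated when its primary metric falls below `best_metric / (1 + slack_factor)` at an evaluation interval. The arithmetic below is an illustrative sketch of that documented rule, not the service's code:

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Threshold below which a trial is terminated (maximize goal)."""
    return best_metric / (1 + slack_factor)

# With slack_factor=0.1 and a best accuracy of 0.88 so far,
# trials reporting below 0.8 get stopped early.
cutoff = bandit_cutoff(0.88, 0.1)
print(round(cutoff, 3))  # 0.8
```

A larger `slack_factor` lowers the cutoff, so more trials survive; `delay_evaluation` postpones the first check so noisy early metrics don't kill promising trials.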
### Optimization Objectives

Definition of optimization goals and metrics for hyperparameter tuning.

```python { .api }
class Objective:
    def __init__(
        self,
        *,
        goal: str,
        primary_metric: str
    ):
        """
        Optimization objective for hyperparameter tuning.

        Parameters:
        - goal: Optimization goal ("maximize" or "minimize")
        - primary_metric: Name of the metric to optimize
        """
```
#### Usage Example

```python
from azure.ai.ml.sweep import Objective

# Maximize accuracy
accuracy_objective = Objective(
    goal="maximize",
    primary_metric="accuracy"
)

# Minimize loss
loss_objective = Objective(
    goal="minimize",
    primary_metric="loss"
)

# Maximize F1 score
f1_objective = Objective(
    goal="maximize",
    primary_metric="f1_score"
)
```
### Complete Sweep Example

```python
from azure.ai.ml import command
from azure.ai.ml.entities import SweepJob, SweepJobLimits, Environment
from azure.ai.ml.sweep import (
    Choice, Uniform, LogUniform,
    RandomSamplingAlgorithm, BayesianSamplingAlgorithm,
    BanditPolicy, Objective
)

# Define the training command template
training_job = command(
    code="./src",
    command="python train.py --lr ${{search_space.learning_rate}} --batch_size ${{search_space.batch_size}} --optimizer ${{search_space.optimizer}}",
    environment=Environment(
        image="mcr.microsoft.com/azureml/sklearn-1.0-ubuntu20.04-py38-cpu-inference:latest"
    ),
    compute="cpu-cluster",
    outputs={
        "model": {"type": "uri_folder", "path": "azureml://datastores/workspaceblobstore/paths/models/"}
    }
)

# Define a comprehensive search space
search_space = {
    "learning_rate": LogUniform(min_value=1e-4, max_value=1e-1),
    "batch_size": Choice(values=[32, 64, 128, 256]),
    "optimizer": Choice(values=["adam", "sgd", "adamw"]),
    "weight_decay": LogUniform(min_value=1e-6, max_value=1e-2),
    "num_epochs": Choice(values=[10, 20, 30, 50])
}

# Create the sweep job. Random sampling is used here because Bayesian
# sampling does not support early termination policies; drop the
# early_termination argument if you switch to BayesianSamplingAlgorithm.
sweep_job = SweepJob(
    trial=training_job,
    search_space=search_space,
    objective=Objective(goal="maximize", primary_metric="val_accuracy"),
    sampling_algorithm=RandomSamplingAlgorithm(seed=42),
    early_termination=BanditPolicy(
        slack_factor=0.15,
        evaluation_interval=2,
        delay_evaluation=10
    ),
    limits=SweepJobLimits(
        max_total_trials=50,
        max_concurrent_trials=5,
        timeout_minutes=300,
        trial_timeout_minutes=30
    ),
    experiment_name="hyperparameter-sweep"
)

# Submit and monitor the sweep (assumes an authenticated MLClient instance `ml_client`)
submitted_sweep = ml_client.jobs.create_or_update(sweep_job)
print(f"Sweep job submitted: {submitted_sweep.name}")
print(f"Sweep job URL: {submitted_sweep.studio_url}")
```
## Best Practices

### Search Space Design
- Use log scales for learning rates and regularization parameters
- Start with broad ranges and narrow down based on results
- Use Choice for categorical parameters and discrete values
- Consider parameter interactions when designing spaces
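To see why log scales matter for learning rates: sampling `Uniform(1e-5, 1e-1)` puts roughly 90% of draws above 1e-2, so the small learning rates are barely explored, while a log-uniform draw spreads samples evenly across orders of magnitude. A small illustration in plain Python, assuming the conventional log-uniform definition (exponentiate a uniform draw on the log scale):

```python
import math
import random

random.seed(7)

def loguniform(low: float, high: float) -> float:
    """Sample evenly across orders of magnitude between two positive bounds."""
    return math.exp(random.uniform(math.log(low), math.log(high)))

linear = [random.uniform(1e-5, 1e-1) for _ in range(10_000)]
logscale = [loguniform(1e-5, 1e-1) for _ in range(10_000)]

# Fraction of samples above 1e-2: ~0.90 for linear, ~0.25 for log-uniform
frac_linear = sum(x > 1e-2 for x in linear) / len(linear)
frac_log = sum(x > 1e-2 for x in logscale) / len(logscale)
print(frac_linear, frac_log)
```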
### Sampling Strategy Selection
- **Random sampling**: Good default choice, works well with early termination
- **Bayesian optimization**: Better for expensive evaluations, fewer trials needed; does not support early termination policies
- **Grid search**: Only for small discrete spaces with few parameters

### Early Termination Guidelines
- **BanditPolicy**: Most flexible, good for most scenarios
- **MedianStoppingPolicy**: Conservative, good for stable metrics
- **TruncationSelectionPolicy**: Aggressive, good when resources are limited

### Resource Management
- Set appropriate `max_concurrent_trials` based on compute availability
- Use `trial_timeout_minutes` to prevent stuck trials
- Consider total cost when setting `max_total_trials`
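A rough budgeting sketch to tie these limits together: if trials run in parallel batches, the worst-case wall-clock time is about `ceil(max_total_trials / max_concurrent_trials) * trial_timeout_minutes`. This is illustrative back-of-the-envelope arithmetic only; early termination and short trials usually finish a sweep well under this bound.

```python
import math

def sweep_wallclock_upper_bound(max_total_trials: int,
                                max_concurrent_trials: int,
                                trial_timeout_minutes: int) -> int:
    """Worst-case minutes if every trial runs to its per-trial timeout."""
    batches = math.ceil(max_total_trials / max_concurrent_trials)
    return batches * trial_timeout_minutes

# The limits from the complete example above:
# 50 trials, 5 concurrent, 30 min each -> 10 batches * 30 min = 300 min
upper_bound = sweep_wallclock_upper_bound(50, 5, 30)
print(upper_bound)  # 300
```

If this bound exceeds `timeout_minutes`, the sweep may hit the overall timeout before exhausting `max_total_trials`, so size the two limits together.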