
# Hyperparameter Tuning

Advanced hyperparameter optimization with configurable search spaces, sampling algorithms, and early termination policies for efficient model tuning.

## Capabilities

### Sweep Jobs

Hyperparameter sweep jobs for optimizing model performance across parameter spaces.

```python { .api }
class SweepJob:
    def __init__(
        self,
        *,
        trial: CommandJob,
        search_space: dict,
        objective: Objective,
        sampling_algorithm: SamplingAlgorithm = None,
        early_termination: EarlyTerminationPolicy = None,
        limits: SweepJobLimits = None,
        compute: str = None,
        **kwargs
    ):
        """
        Hyperparameter sweep job for model optimization.

        Parameters:
        - trial: Template command job defining the training script
        - search_space: Dictionary defining parameter search spaces
        - objective: Optimization objective and metric
        - sampling_algorithm: Parameter sampling strategy
        - early_termination: Early stopping policy
        - limits: Sweep execution limits
        - compute: Compute target for sweep trials
        """

class SweepJobLimits:
    def __init__(
        self,
        *,
        max_total_trials: int = 1,
        max_concurrent_trials: int = 1,
        timeout_minutes: int = None,
        trial_timeout_minutes: int = None
    ):
        """
        Limits for sweep job execution.

        Parameters:
        - max_total_trials: Maximum number of trials to run
        - max_concurrent_trials: Maximum concurrent trials
        - timeout_minutes: Total sweep timeout in minutes
        - trial_timeout_minutes: Individual trial timeout in minutes
        """
```

#### Usage Example

```python
from azure.ai.ml import command
from azure.ai.ml.entities import SweepJob, SweepJobLimits
from azure.ai.ml.sweep import Choice, Uniform, Objective, RandomSamplingAlgorithm, BanditPolicy

# Define the training command template
command_job = command(
    code="./src",
    command="python train.py --learning_rate ${{search_space.learning_rate}} --batch_size ${{search_space.batch_size}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1",
    compute="cpu-cluster"
)

# Define search space
search_space = {
    "learning_rate": Uniform(min_value=0.001, max_value=0.1),
    "batch_size": Choice(values=[16, 32, 64, 128])
}

# Create sweep job
sweep_job = SweepJob(
    trial=command_job,
    search_space=search_space,
    objective=Objective(goal="maximize", primary_metric="accuracy"),
    sampling_algorithm=RandomSamplingAlgorithm(),
    early_termination=BanditPolicy(slack_factor=0.1, evaluation_interval=2),
    limits=SweepJobLimits(
        max_total_trials=20,
        max_concurrent_trials=4,
        timeout_minutes=120
    )
)

# Submit sweep job (ml_client is an authenticated azure.ai.ml.MLClient instance)
submitted_sweep = ml_client.jobs.create_or_update(sweep_job)
```

### Search Space Functions

Distribution classes for defining parameter search spaces.

```python { .api }
class Choice:
    def __init__(self, values: list):
        """
        Discrete choice from a list of values.

        Parameters:
        - values: List of possible values to choose from
        """

class Uniform:
    def __init__(self, min_value: float, max_value: float):
        """
        Uniform distribution between min and max values.

        Parameters:
        - min_value: Minimum value
        - max_value: Maximum value
        """

class LogUniform:
    def __init__(self, min_value: float, max_value: float):
        """
        Log-uniform distribution for parameters that vary exponentially.

        Parameters:
        - min_value: Minimum value (must be > 0)
        - max_value: Maximum value
        """

class Normal:
    def __init__(self, mu: float, sigma: float):
        """
        Normal (Gaussian) distribution.

        Parameters:
        - mu: Mean of the distribution
        - sigma: Standard deviation
        """

class LogNormal:
    def __init__(self, mu: float, sigma: float):
        """
        Log-normal distribution for positive parameters.

        Parameters:
        - mu: Mean of the underlying normal distribution
        - sigma: Standard deviation of the underlying normal distribution
        """

class QUniform:
    def __init__(self, min_value: float, max_value: float, q: float):
        """
        Quantized uniform distribution.

        Parameters:
        - min_value: Minimum value
        - max_value: Maximum value
        - q: Quantization step size
        """

class QLogUniform:
    def __init__(self, min_value: float, max_value: float, q: float):
        """
        Quantized log-uniform distribution.

        Parameters:
        - min_value: Minimum value (must be > 0)
        - max_value: Maximum value
        - q: Quantization step size
        """

class QNormal:
    def __init__(self, mu: float, sigma: float, q: float):
        """
        Quantized normal distribution.

        Parameters:
        - mu: Mean of the distribution
        - sigma: Standard deviation
        - q: Quantization step size
        """

class QLogNormal:
    def __init__(self, mu: float, sigma: float, q: float):
        """
        Quantized log-normal distribution.

        Parameters:
        - mu: Mean of the underlying normal distribution
        - sigma: Standard deviation of the underlying normal distribution
        - q: Quantization step size
        """

class Randint:
    def __init__(self, upper: int):
        """
        Random integer from 0 to upper-1.

        Parameters:
        - upper: Upper bound (exclusive)
        """
```

#### Usage Example

```python
from azure.ai.ml.sweep import Choice, Uniform, LogUniform, Normal, Randint

# Different search space examples
search_space = {
    # Discrete choices
    "optimizer": Choice(values=["adam", "sgd", "rmsprop"]),
    "activation": Choice(values=["relu", "tanh", "sigmoid"]),

    # Continuous ranges
    "learning_rate": LogUniform(min_value=1e-5, max_value=1e-1),
    "dropout_rate": Uniform(min_value=0.1, max_value=0.5),
    "weight_decay": LogUniform(min_value=1e-6, max_value=1e-2),

    # Normal distributions
    "hidden_size": Normal(mu=128, sigma=32),

    # Integer ranges
    "batch_size": Choice(values=[16, 32, 64, 128, 256]),
    "num_layers": Randint(upper=5)  # 0, 1, 2, 3, or 4
}
```

### Sampling Algorithms

Different strategies for sampling parameters from the search space.

```python { .api }
class SamplingAlgorithm:
    """Base class for sampling algorithms."""

class RandomSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self, seed: int = None):
        """
        Random sampling from the search space.

        Parameters:
        - seed: Random seed for reproducibility
        """

class GridSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self):
        """
        Grid search over all parameter combinations.
        Note: Only works with Choice parameters.
        """

class BayesianSamplingAlgorithm(SamplingAlgorithm):
    def __init__(self):
        """
        Bayesian optimization for intelligent parameter selection.
        Uses previous trial results to guide future parameter choices.
        """
```

#### Usage Example

```python
from azure.ai.ml.sweep import RandomSamplingAlgorithm, BayesianSamplingAlgorithm, GridSamplingAlgorithm

# Random sampling (most common)
random_sampling = RandomSamplingAlgorithm(seed=42)

# Bayesian optimization (for expensive evaluations)
bayesian_sampling = BayesianSamplingAlgorithm()

# Grid search (for small, discrete spaces)
grid_sampling = GridSamplingAlgorithm()
```

### Early Termination Policies

Policies for early stopping of underperforming trials to save computational resources.

```python { .api }
class BanditPolicy:
    def __init__(
        self,
        *,
        slack_factor: float = None,
        slack_amount: float = None,
        evaluation_interval: int = 1,
        delay_evaluation: int = 0
    ):
        """
        Bandit early termination policy based on slack criteria.

        Parameters:
        - slack_factor: Slack factor as a ratio (e.g., 0.1 = 10% slack)
        - slack_amount: Slack amount as an absolute value
        - evaluation_interval: Frequency of policy evaluation
        - delay_evaluation: Number of intervals to delay evaluation
        """

class MedianStoppingPolicy:
    def __init__(
        self,
        *,
        evaluation_interval: int = 1,
        delay_evaluation: int = 0
    ):
        """
        Median stopping policy: terminates trials performing worse than the median.

        Parameters:
        - evaluation_interval: Frequency of policy evaluation
        - delay_evaluation: Number of intervals to delay evaluation
        """

class TruncationSelectionPolicy:
    def __init__(
        self,
        *,
        truncation_percentage: int = 10,
        evaluation_interval: int = 1,
        delay_evaluation: int = 0,
        exclude_finished_jobs: bool = False
    ):
        """
        Truncation policy: terminates a percentage of the worst-performing trials.

        Parameters:
        - truncation_percentage: Percentage of trials to terminate
        - evaluation_interval: Frequency of policy evaluation
        - delay_evaluation: Number of intervals to delay evaluation
        - exclude_finished_jobs: Whether to exclude finished jobs from evaluation
        """
```

#### Usage Example

```python
from azure.ai.ml.sweep import BanditPolicy, MedianStoppingPolicy, TruncationSelectionPolicy

# Conservative bandit policy (10% slack)
bandit_policy = BanditPolicy(
    slack_factor=0.1,
    evaluation_interval=2,
    delay_evaluation=5
)

# Median stopping policy
median_policy = MedianStoppingPolicy(
    evaluation_interval=1,
    delay_evaluation=10
)

# Aggressive truncation policy (terminate bottom 20%)
truncation_policy = TruncationSelectionPolicy(
    truncation_percentage=20,
    evaluation_interval=1,
    delay_evaluation=5
)
```
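For intuition about `slack_factor`, the sketch below computes the cutoff a bandit policy would apply at an evaluation point for a maximization objective. The cutoff rule used (best metric divided by `1 + slack_factor`) follows Azure ML's documented bandit behavior, but treat the arithmetic here as illustrative rather than the library's implementation; `bandit_cutoff` is a hypothetical helper, not part of `azure.ai.ml`.

```python
# Illustrative sketch (not part of azure-ai-ml): how a slack_factor-based
# bandit cutoff behaves for a "maximize" objective.
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Hypothetical helper: metric value below which a trial would be stopped."""
    return best_metric / (1 + slack_factor)

best_so_far = 0.90  # best primary metric among trials at this evaluation interval
for slack in (0.1, 0.2, 0.3):
    cutoff = bandit_cutoff(best_so_far, slack)
    print(f"slack_factor={slack}: trials below {cutoff:.3f} are terminated")
# slack_factor=0.1: trials below 0.818 are terminated
# slack_factor=0.2: trials below 0.750 are terminated
# slack_factor=0.3: trials below 0.692 are terminated
```

A larger `slack_factor` is therefore more permissive: trials get more room to lag behind the current best before being stopped.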

### Optimization Objectives

Definition of optimization goals and metrics for hyperparameter tuning.

```python { .api }
class Objective:
    def __init__(
        self,
        *,
        goal: str,
        primary_metric: str
    ):
        """
        Optimization objective for hyperparameter tuning.

        Parameters:
        - goal: Optimization goal ("maximize" or "minimize")
        - primary_metric: Name of the metric to optimize
        """
```

#### Usage Example

```python
from azure.ai.ml.sweep import Objective

# Maximize accuracy
accuracy_objective = Objective(
    goal="maximize",
    primary_metric="accuracy"
)

# Minimize loss
loss_objective = Objective(
    goal="minimize",
    primary_metric="loss"
)

# Maximize F1 score
f1_objective = Objective(
    goal="maximize",
    primary_metric="f1_score"
)
```

### Complete Sweep Example

```python
from azure.ai.ml import command
from azure.ai.ml.entities import SweepJob, SweepJobLimits, Environment
from azure.ai.ml.sweep import (
    Choice, Uniform, LogUniform,
    RandomSamplingAlgorithm, BayesianSamplingAlgorithm,
    BanditPolicy, Objective
)

# Define training command template
training_job = command(
    code="./src",
    command="python train.py --lr ${{search_space.learning_rate}} --batch_size ${{search_space.batch_size}} --optimizer ${{search_space.optimizer}}",
    environment=Environment(
        image="mcr.microsoft.com/azureml/sklearn-1.0-ubuntu20.04-py38-cpu-inference:latest"
    ),
    compute="cpu-cluster",
    outputs={
        "model": {"type": "uri_folder", "path": "azureml://datastores/workspaceblobstore/paths/models/"}
    }
)

# Define comprehensive search space
search_space = {
    "learning_rate": LogUniform(min_value=1e-4, max_value=1e-1),
    "batch_size": Choice(values=[32, 64, 128, 256]),
    "optimizer": Choice(values=["adam", "sgd", "adamw"]),
    "weight_decay": LogUniform(min_value=1e-6, max_value=1e-2),
    "num_epochs": Choice(values=[10, 20, 30, 50])
}

# Create sweep job. Random sampling pairs well with BanditPolicy early
# termination; Azure ML's Bayesian sampling generally cannot be combined
# with early termination policies or log-uniform distributions.
sweep_job = SweepJob(
    trial=training_job,
    search_space=search_space,
    objective=Objective(goal="maximize", primary_metric="val_accuracy"),
    sampling_algorithm=RandomSamplingAlgorithm(),
    early_termination=BanditPolicy(
        slack_factor=0.15,
        evaluation_interval=2,
        delay_evaluation=10
    ),
    limits=SweepJobLimits(
        max_total_trials=50,
        max_concurrent_trials=5,
        timeout_minutes=300,
        trial_timeout_minutes=30
    ),
    experiment_name="hyperparameter-sweep"
)

# Submit and monitor sweep (ml_client is an authenticated azure.ai.ml.MLClient instance)
submitted_sweep = ml_client.jobs.create_or_update(sweep_job)
print(f"Sweep job submitted: {submitted_sweep.name}")

# Monitor sweep progress
print(f"Sweep job URL: {submitted_sweep.studio_url}")
```
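After submission you can follow the sweep from Python as well as from the studio URL. Below is a minimal sketch, assuming the same `ml_client` as above; `jobs.stream` and `jobs.get` are standard `MLClient` operations, and the blocking behavior noted in the comment is how log streaming typically behaves.

```python
# Follow the sweep from Python (assumes the ml_client used above).
ml_client.jobs.stream(submitted_sweep.name)   # tails logs and blocks until the sweep finishes

# Refresh the job to inspect its final status.
finished_sweep = ml_client.jobs.get(submitted_sweep.name)
print(f"Sweep status: {finished_sweep.status}")
```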

## Best Practices

### Search Space Design

- Use log scales for learning rates and regularization parameters
- Start with broad ranges and narrow down based on results
- Use Choice for categorical parameters and discrete values
- Consider parameter interactions when designing spaces (see the sketch below)
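A short sketch applying these guidelines with the distribution classes above; the parameter names and ranges are illustrative assumptions, not recommendations for any particular model.

```python
from azure.ai.ml.sweep import Choice, LogUniform, Uniform

# Log scales for rate-like and regularization parameters, Choice for
# categoricals/discrete values, broad first-pass ranges to narrow later.
first_pass_space = {
    "learning_rate": LogUniform(min_value=1e-5, max_value=1e-1),   # log scale
    "weight_decay": LogUniform(min_value=1e-6, max_value=1e-2),    # log scale
    "dropout_rate": Uniform(min_value=0.0, max_value=0.6),         # broad linear range
    "optimizer": Choice(values=["adam", "sgd"]),                   # categorical
    "batch_size": Choice(values=[32, 64, 128]),                    # discrete values
}

# After reviewing results, narrow the ranges around the best region,
# keeping interacting parameters (e.g. learning_rate and batch_size) together.
refined_space = {
    "learning_rate": LogUniform(min_value=1e-4, max_value=1e-2),
    "weight_decay": LogUniform(min_value=1e-5, max_value=1e-3),
    "dropout_rate": Uniform(min_value=0.1, max_value=0.3),
    "optimizer": Choice(values=["adam"]),
    "batch_size": Choice(values=[64, 128]),
}
```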

### Sampling Strategy Selection

- **Random sampling**: Good default choice, works well with early termination
- **Bayesian optimization**: Better for expensive evaluations, fewer trials needed (see the sketch below)
- **Grid search**: Only for small discrete spaces with few parameters
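As a sketch of the Bayesian case, the configuration below uses a small trial budget, no early termination policy, and only Choice/Uniform distributions; the constraint that Azure ML's Bayesian sampling should be run without early termination and with Choice/Uniform/QUniform spaces is stated here as an assumption based on the service's documented behavior.

```python
from azure.ai.ml import command
from azure.ai.ml.entities import SweepJob, SweepJobLimits
from azure.ai.ml.sweep import BayesianSamplingAlgorithm, Choice, Uniform, Objective

# A stand-in for an expensive trial (e.g. long-running training), as in the earlier examples.
expensive_trial = command(
    code="./src",
    command="python train.py --lr ${{search_space.learning_rate}} --batch_size ${{search_space.batch_size}}",
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1",
    compute="cpu-cluster",
)

bayesian_sweep = SweepJob(
    trial=expensive_trial,
    search_space={
        "learning_rate": Uniform(min_value=0.0001, max_value=0.1),
        "batch_size": Choice(values=[32, 64, 128]),
    },
    objective=Objective(goal="maximize", primary_metric="val_accuracy"),
    sampling_algorithm=BayesianSamplingAlgorithm(),
    early_termination=None,  # assumption: Bayesian sampling is used without an early termination policy
    limits=SweepJobLimits(max_total_trials=20, max_concurrent_trials=2),
)
```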

### Early Termination Guidelines

- **BanditPolicy**: Most flexible, good for most scenarios
- **MedianStoppingPolicy**: Conservative, good for stable metrics
- **TruncationSelectionPolicy**: Aggressive, good when resources are limited

### Resource Management

- Set appropriate `max_concurrent_trials` based on compute availability (see the sketch below)
- Use `trial_timeout_minutes` to prevent stuck trials
- Consider total cost when setting `max_total_trials`
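A minimal sketch of sizing limits from the target cluster, assuming the compute is an AmlCompute target exposing `max_instances` and that `ml_client` is the authenticated `MLClient` used earlier; the specific numbers and the cap of 8 are illustrative assumptions.

```python
from azure.ai.ml.entities import SweepJobLimits

# Size concurrency from the cluster rather than guessing (assumes an
# AmlCompute target that exposes max_instances).
cluster = ml_client.compute.get("cpu-cluster")
concurrency = min(getattr(cluster, "max_instances", 1), 8)

limits = SweepJobLimits(
    max_total_trials=40,               # cost ceiling: at most 40 trials
    max_concurrent_trials=concurrency, # no more trials than nodes available
    timeout_minutes=240,               # stop the whole sweep after 4 hours
    trial_timeout_minutes=30,          # kill individual stuck trials
)
```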