# AutoML

Automated machine learning capabilities for tabular data (classification, regression, forecasting), computer vision tasks (image classification, object detection, instance segmentation), and natural language processing (text classification, named entity recognition).

## Capabilities

### Tabular AutoML

Automated ML for structured data with classification, regression, and forecasting tasks.

```python { .api }
def classification(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    test_data: Data = None,
    primary_metric: str = "accuracy",
    featurization: TabularFeaturizationSettings = None,
    limits: TabularLimitSettings = None,
    training_settings: TrainingSettings = None,
    **kwargs
) -> ClassificationJob:
    """
    Create an automated classification job for tabular data.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset
    - validation_data: Validation dataset (optional)
    - test_data: Test dataset (optional)
    - primary_metric: Primary metric for optimization
    - featurization: Feature engineering settings
    - limits: Training limits and constraints
    - training_settings: Training configuration

    Returns:
    ClassificationJob configured for automated ML
    """

def regression(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    test_data: Data = None,
    primary_metric: str = "normalized_root_mean_squared_error",
    featurization: TabularFeaturizationSettings = None,
    limits: TabularLimitSettings = None,
    training_settings: TrainingSettings = None,
    **kwargs
) -> RegressionJob:
    """
    Create an automated regression job for tabular data.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset
    - validation_data: Validation dataset (optional)
    - test_data: Test dataset (optional)
    - primary_metric: Primary metric for optimization
    - featurization: Feature engineering settings
    - limits: Training limits and constraints
    - training_settings: Training configuration

    Returns:
    RegressionJob configured for automated ML
    """

def forecasting(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    test_data: Data = None,
    primary_metric: str = "normalized_root_mean_squared_error",
    forecasting_settings: ForecastingSettings,
    featurization: TabularFeaturizationSettings = None,
    limits: TabularLimitSettings = None,
    training_settings: TrainingSettings = None,
    **kwargs
) -> ForecastingJob:
    """
    Create an automated forecasting job for time series data.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset
    - validation_data: Validation dataset (optional)
    - test_data: Test dataset (optional)
    - primary_metric: Primary metric for optimization
    - forecasting_settings: Time series specific settings
    - featurization: Feature engineering settings
    - limits: Training limits and constraints
    - training_settings: Training configuration

    Returns:
    ForecastingJob configured for automated ML
    """
```

#### Usage Example

```python
from azure.ai.ml import automl
from azure.ai.ml.entities import Data

# Create training data asset.
# An mltable asset path points to a folder containing an MLTable file.
training_data = Data(
    name="classification-data",
    path="./data/train-mltable",
    type="mltable"
)

# Create classification job
classification_job = automl.classification(
    target_column_name="target",
    training_data=training_data,
    primary_metric="accuracy",
    compute="cpu-cluster",
    experiment_name="automl-classification"
)

# Submit the job (assumes `ml_client` is an authenticated MLClient)
submitted_job = ml_client.jobs.create_or_update(classification_job)
```

### Computer Vision AutoML

Automated ML for image-based tasks including classification, object detection, and instance segmentation.

```python { .api }
def image_classification(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "accuracy",
    limits: ImageLimitSettings = None,
    sweep_settings: ImageSweepSettings = None,
    model_settings: ImageModelSettingsClassification = None,
    **kwargs
) -> ImageClassificationJob:
    """
    Create an automated image classification job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset with images and labels
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization
    - limits: Training limits and constraints
    - sweep_settings: Hyperparameter sweep settings
    - model_settings: Model-specific settings

    Returns:
    ImageClassificationJob configured for automated ML
    """

def image_classification_multilabel(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "iou",
    **kwargs
) -> ImageClassificationMultilabelJob:
    """
    Create an automated multi-label image classification job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization

    Returns:
    ImageClassificationMultilabelJob configured for automated ML
    """

def image_object_detection(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "mean_average_precision",
    limits: ImageLimitSettings = None,
    sweep_settings: ImageSweepSettings = None,
    model_settings: ImageModelSettingsObjectDetection = None,
    **kwargs
) -> ImageObjectDetectionJob:
    """
    Create an automated object detection job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset with images and bounding boxes
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization
    - limits: Training limits and constraints
    - sweep_settings: Hyperparameter sweep settings
    - model_settings: Model-specific settings

    Returns:
    ImageObjectDetectionJob configured for automated ML
    """

def image_instance_segmentation(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "mean_average_precision",
    **kwargs
) -> ImageInstanceSegmentationJob:
    """
    Create an automated instance segmentation job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset with images and segmentation masks
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization

    Returns:
    ImageInstanceSegmentationJob configured for automated ML
    """
```

### Natural Language Processing AutoML

Automated ML for text-based tasks including classification and named entity recognition.

```python { .api }
def text_classification(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "accuracy",
    featurization: NlpFeaturizationSettings = None,
    limits: NlpLimitSettings = None,
    sweep_settings: NlpSweepSettings = None,
    **kwargs
) -> TextClassificationJob:
    """
    Create an automated text classification job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset with text and labels
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization
    - featurization: NLP feature engineering settings
    - limits: Training limits and constraints
    - sweep_settings: Hyperparameter sweep settings

    Returns:
    TextClassificationJob configured for automated ML
    """

def text_classification_multilabel(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "accuracy",
    **kwargs
) -> TextClassificationMultilabelJob:
    """
    Create an automated multi-label text classification job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization

    Returns:
    TextClassificationMultilabelJob configured for automated ML
    """

def text_ner(
    *,
    target_column_name: str,
    training_data: Data,
    validation_data: Data = None,
    primary_metric: str = "accuracy",
    **kwargs
) -> TextNerJob:
    """
    Create an automated named entity recognition job.

    Parameters:
    - target_column_name: Name of the target column
    - training_data: Training dataset with text and entity labels
    - validation_data: Validation dataset (optional)
    - primary_metric: Primary metric for optimization

    Returns:
    TextNerJob configured for automated ML
    """
```

### AutoML Job Classes

```python { .api }
class ClassificationJob:
    def __init__(
        self,
        *,
        target_column_name: str,
        training_data: Data,
        primary_metric: str = "accuracy",
        **kwargs
    ):
        """Tabular classification AutoML job."""

class RegressionJob:
    def __init__(
        self,
        *,
        target_column_name: str,
        training_data: Data,
        primary_metric: str = "normalized_root_mean_squared_error",
        **kwargs
    ):
        """Tabular regression AutoML job."""

class ForecastingJob:
    def __init__(
        self,
        *,
        target_column_name: str,
        training_data: Data,
        forecasting_settings: ForecastingSettings,
        primary_metric: str = "normalized_root_mean_squared_error",
        **kwargs
    ):
        """Time series forecasting AutoML job."""

class ImageClassificationJob:
    def __init__(
        self,
        *,
        target_column_name: str,
        training_data: Data,
        primary_metric: str = "accuracy",
        **kwargs
    ):
        """Image classification AutoML job."""

class TextClassificationJob:
    def __init__(
        self,
        *,
        target_column_name: str,
        training_data: Data,
        primary_metric: str = "accuracy",
        **kwargs
    ):
        """Text classification AutoML job."""
```

### Configuration Classes

```python { .api }
class TrainingSettings:
    def __init__(
        self,
        *,
        enable_onnx_compatible_models: bool = False,
        enable_dnn_training: bool = False,
        enable_model_explainability: bool = True,
        enable_stack_ensemble: bool = True,
        enable_vote_ensemble: bool = True,
        stack_ensemble_settings: StackEnsembleSettings = None,
        blocked_training_algorithms: list = None,
        allowed_training_algorithms: list = None
    ):
        """
        Training configuration for AutoML jobs.

        Parameters:
        - enable_onnx_compatible_models: Enable ONNX model generation
        - enable_dnn_training: Enable deep neural network training
        - enable_model_explainability: Enable model explanations
        - enable_stack_ensemble: Enable stack ensemble models
        - enable_vote_ensemble: Enable vote ensemble models
        - stack_ensemble_settings: Stack ensemble configuration
        - blocked_training_algorithms: Algorithms to exclude
        - allowed_training_algorithms: Algorithms to include
        """

class TabularFeaturizationSettings:
    def __init__(
        self,
        *,
        mode: str = "auto",
        transformer_params: dict = None,
        column_name_and_types: dict = None,
        dataset_language: str = "eng",
        blocked_transformers: list = None
    ):
        """
        Feature engineering settings for tabular data.

        Parameters:
        - mode: Featurization mode ("auto", "custom", "off")
        - transformer_params: Custom transformer parameters
        - column_name_and_types: Column data types
        - dataset_language: Dataset language for text features
        - blocked_transformers: Transformers to exclude
        """

class TabularLimitSettings:
    def __init__(
        self,
        *,
        max_trials: int = 1000,
        max_concurrent_trials: int = None,
        max_cores_per_trial: int = None,
        trial_timeout_minutes: int = None,
        experiment_timeout_minutes: int = None,
        enable_early_termination: bool = True
    ):
        """
        Training limits for tabular AutoML.

        Parameters:
        - max_trials: Maximum number of trials
        - max_concurrent_trials: Maximum concurrent trials
        - max_cores_per_trial: Maximum cores per trial
        - trial_timeout_minutes: Timeout per trial in minutes
        - experiment_timeout_minutes: Total experiment timeout
        - enable_early_termination: Enable early stopping
        """

class ForecastingSettings:
    def __init__(
        self,
        *,
        time_column_name: str,
        forecast_horizon: int,
        time_series_id_column_names: list = None,
        frequency: str = None,
        target_lags: list = None,
        target_rolling_window_size: int = None,
        country_or_region_for_holidays: str = None,
        use_stl: str = None
    ):
        """
        Time series forecasting specific settings.

        Parameters:
        - time_column_name: Name of the time column
        - forecast_horizon: Number of periods to forecast
        - time_series_id_column_names: Columns identifying time series
        - frequency: Data frequency (D, H, M, etc.)
        - target_lags: Lag values for target variable
        - target_rolling_window_size: Rolling window size
        - country_or_region_for_holidays: Holiday calendar region
        - use_stl: STL decomposition usage
        """

class ImageLimitSettings:
    def __init__(
        self,
        *,
        max_trials: int = 1,
        max_concurrent_trials: int = 1,
        timeout_minutes: int = None
    ):
        """
        Training limits for image AutoML.

        Parameters:
        - max_trials: Maximum number of trials
        - max_concurrent_trials: Maximum concurrent trials
        - timeout_minutes: Total timeout in minutes
        """

class NlpLimitSettings:
    def __init__(
        self,
        *,
        max_trials: int = 1,
        max_concurrent_trials: int = 1,
        timeout_minutes: int = None
    ):
        """
        Training limits for NLP AutoML.

        Parameters:
        - max_trials: Maximum number of trials
        - max_concurrent_trials: Maximum concurrent trials
        - timeout_minutes: Total timeout in minutes
        """
```
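
The `target_lags` and `target_rolling_window_size` parameters of `ForecastingSettings` describe features derived from past target values. The sketch below illustrates the general idea in plain Python; it is not AutoML's internal featurization. The helper names `lag_feature` and `rolling_mean_feature` are hypothetical, and the choice to average only values strictly before each position is an assumption made to keep the current value out of its own feature.

```python
from statistics import mean
from typing import List, Optional


def lag_feature(series: List[float], lag: int) -> List[Optional[float]]:
    """Shift the series back by `lag` steps; positions without history are None."""
    n = len(series)
    return [None] * min(lag, n) + series[: max(n - lag, 0)]


def rolling_mean_feature(series: List[float], window: int) -> List[Optional[float]]:
    """Mean of the `window` values strictly before each position (assumption)."""
    return [
        mean(series[i - window : i]) if i >= window else None
        for i in range(len(series))
    ]


sales = [10.0, 12.0, 14.0, 16.0, 18.0]
print(lag_feature(sales, 1))           # [None, 10.0, 12.0, 14.0, 16.0]
print(rolling_mean_feature(sales, 2))  # [None, None, 11.0, 13.0, 15.0]
```

Larger lags and windows leave more leading rows without a feature value, which is worth keeping in mind for short series.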

#### Usage Example

```python
from azure.ai.ml import automl
from azure.ai.ml.entities import Data
from azure.ai.ml.automl import TabularLimitSettings, TrainingSettings

# Assumes `training_data` is defined as in the tabular usage example
# and `ml_client` is an authenticated MLClient.

# Configure AutoML settings
limits = TabularLimitSettings(
    max_trials=10,
    max_concurrent_trials=2,
    trial_timeout_minutes=30,
    experiment_timeout_minutes=180
)

training_settings = TrainingSettings(
    enable_onnx_compatible_models=True,
    enable_model_explainability=True,
    enable_stack_ensemble=True
)

# Create regression job with settings
regression_job = automl.regression(
    target_column_name="price",
    training_data=training_data,
    primary_metric="r2_score",
    limits=limits,
    training_settings=training_settings,
    compute="cpu-cluster"
)

# Submit job
submitted_job = ml_client.jobs.create_or_update(regression_job)
print(f"AutoML job submitted: {submitted_job.name}")
```
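
When choosing limit values, it can help to check that the per-trial settings fit inside the overall experiment timeout. The following is a rough back-of-the-envelope sketch, not part of the SDK: `worst_case_minutes` is a hypothetical helper, and it assumes every trial runs for its full timeout and ignores setup, featurization, and ensembling overhead.

```python
import math


def worst_case_minutes(
    max_trials: int, max_concurrent_trials: int, trial_timeout_minutes: int
) -> int:
    """Upper bound on trial time if every trial uses its full timeout."""
    # Trials run in "waves" of at most max_concurrent_trials at a time.
    waves = math.ceil(max_trials / max_concurrent_trials)
    return waves * trial_timeout_minutes


# Values taken from the limits in the usage example above.
budget = worst_case_minutes(
    max_trials=10, max_concurrent_trials=2, trial_timeout_minutes=30
)
print(budget)         # 150
print(budget <= 180)  # True: fits within experiment_timeout_minutes=180
```

If the bound exceeds the experiment timeout, some trials may never start, so either the timeout or the trial count would need adjusting.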

## Primary Metrics

Available primary metrics for different AutoML task types:

### Classification Metrics
- `accuracy` - Overall accuracy
- `AUC_weighted` - Area under ROC curve (weighted)
- `average_precision_score_weighted` - Average precision score (weighted)
- `precision_score_weighted` - Precision score (weighted)
- `recall_score_weighted` - Recall score (weighted)

### Regression Metrics
- `normalized_root_mean_squared_error` - Normalized RMSE
- `r2_score` - R-squared score
- `mean_absolute_error` - Mean absolute error
- `normalized_mean_absolute_error` - Normalized MAE
- `spearman_correlation` - Spearman correlation
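
To make the "normalized" metrics concrete, here is a minimal sketch of normalized RMSE. It assumes normalization by the range of the observed target values, which is a common convention; the exact definition used by the service should be confirmed in its documentation. The function name `normalized_rmse` is hypothetical.

```python
import math
from typing import Sequence


def normalized_rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the range of y_true (assumed normalization)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("y_true has zero range; normalization is undefined")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / value_range


y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 310.0]
# Each error is 10, so RMSE = 10; the target range is 200.
print(normalized_rmse(y_true, y_pred))  # 0.05
```

Because the result is scale-free, it can be compared across targets measured in different units, unlike `mean_absolute_error`.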

### Forecasting Metrics
- `normalized_root_mean_squared_error` - Normalized RMSE
- `r2_score` - R-squared score
- `mean_absolute_error` - Mean absolute error
- `normalized_mean_absolute_error` - Normalized MAE

### Computer Vision Metrics
- `accuracy` - Classification accuracy
- `mean_average_precision` - Object detection/segmentation mAP
- `iou` - Intersection over Union
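
Since `primary_metric` is passed as a plain string, a typo is easy to miss until a job is submitted. The lists above can be captured in a small lookup for a local check, sketched below. The `PRIMARY_METRICS` table, its task keys, and `check_primary_metric` are hypothetical helpers built only from the metrics listed in this section; they are not part of the SDK and may not cover every metric the service accepts.

```python
# Metric names copied from the lists in this section.
PRIMARY_METRICS = {
    "classification": [
        "accuracy",
        "AUC_weighted",
        "average_precision_score_weighted",
        "precision_score_weighted",
        "recall_score_weighted",
    ],
    "regression": [
        "normalized_root_mean_squared_error",
        "r2_score",
        "mean_absolute_error",
        "normalized_mean_absolute_error",
        "spearman_correlation",
    ],
    "forecasting": [
        "normalized_root_mean_squared_error",
        "r2_score",
        "mean_absolute_error",
        "normalized_mean_absolute_error",
    ],
    "computer_vision": ["accuracy", "mean_average_precision", "iou"],
}


def check_primary_metric(task: str, metric: str) -> str:
    """Return `metric` unchanged if it is listed for `task`, else raise ValueError."""
    if task not in PRIMARY_METRICS:
        raise ValueError(f"Unknown task type: {task!r}")
    if metric not in PRIMARY_METRICS[task]:
        allowed = ", ".join(PRIMARY_METRICS[task])
        raise ValueError(f"{metric!r} is not listed for {task}; expected one of: {allowed}")
    return metric


print(check_primary_metric("regression", "r2_score"))  # r2_score
```

The return value can be passed straight through, for example as `primary_metric=check_primary_metric("regression", "r2_score")`.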