# Configuration and Presets

AutoGluon Tabular provides extensive configuration options through presets, hyperparameter configurations, and feature processing settings. These configurations enable users to optimize for different objectives like accuracy, speed, interpretability, or deployment constraints.

## Capabilities

### Preset Configurations

Pre-configured settings optimized for different use cases, balancing accuracy, training time, and computational resources.
```python { .api }
from typing import Literal

# Available preset configurations
PRESET_CONFIGURATIONS = Literal[
    "best_quality",             # Maximum accuracy, longer training time
    "high_quality",             # High accuracy with fast inference
    "good_quality",             # Good accuracy with very fast inference
    "medium_quality",           # Medium accuracy, very fast training (default)
    "optimize_for_deployment",  # Optimizes for deployment by cleaning up models
    "interpretable",            # Interpretable models only
]

def get_preset_config(preset: str) -> dict:
    """
    Get configuration dictionary for a specific preset.

    Parameters:
    - preset: Name of the preset configuration

    Returns:
    Dictionary with preset configuration parameters
    """
```
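In practice a preset is selected by passing its name to `TabularPredictor.fit()`. The `presets` argument also accepts a list, which is useful for combining a quality preset with `optimize_for_deployment`. A minimal sketch, assuming a label column named `target` and a local `train.csv`:

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')  # illustrative path

# Single preset
predictor = TabularPredictor(label='target').fit(train_data, presets='medium_quality')

# Multiple presets: maximum accuracy plus post-training cleanup for deployment
predictor = TabularPredictor(label='target', path='./models_combined/').fit(
    train_data,
    presets=['best_quality', 'optimize_for_deployment'],
)
```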
### Hyperparameter Configurations

Systematic hyperparameter configuration system for customizing model training and optimization strategies.
```python { .api }
from typing import Any

def get_hyperparameter_config(config_name: str = "default") -> dict:
    """
    Get the hyperparameter configuration for a named preset.

    Parameters:
    - config_name: Name of the hyperparameter configuration preset
      (see get_hyperparameter_config_options() for available names)

    Returns:
    Dictionary mapping model names to hyperparameter configurations
    """

# Hyperparameter configuration structure
HYPERPARAMETER_CONFIG = dict[str, dict[str, Any]]
# Example: {'GBM': {'num_leaves': [31, 127], 'learning_rate': [0.01, 0.1]}}

def get_hyperparameter_config_options() -> list[str]:
    """
    Get list of available hyperparameter configuration presets.

    Returns:
    List of available configuration names
    """
```
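Named configurations can be passed straight to `fit()` through the `hyperparameters` argument (for example `'default'`, `'light'`, or `'very_light'` for progressively smaller model portfolios), or retrieved as a dictionary and edited first. A sketch, assuming a `target` label column; the module path is taken from AutoGluon's configuration layout:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config

train_data = pd.read_csv('train.csv')  # illustrative path

# Use a named configuration directly
predictor = TabularPredictor(label='target').fit(
    train_data, hyperparameters='light', time_limit=600
)

# Retrieve the underlying dictionary and customize it before training
config = get_hyperparameter_config('default')
config.pop('KNN', None)  # e.g. drop a model family
custom_predictor = TabularPredictor(label='target', path='./models_custom/').fit(
    train_data, hyperparameters=config
)
```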
### Feature Generation Configuration

Automated feature engineering and preprocessing configuration system for handling diverse data types and feature transformations.
```python { .api }
import pandas as pd

def get_default_feature_generator(
    feature_generator: str = "auto",
    feature_metadata: 'FeatureMetadata' = None,
    init_kwargs: dict = None
) -> 'AutoMLPipelineFeatureGenerator':
    """
    Get default feature generator with specified configuration.

    Parameters:
    - feature_generator: Feature generation preset ('auto', 'interpretable')
    - feature_metadata: Metadata for feature processing
    - init_kwargs: Additional initialization arguments

    Returns:
    Configured feature generator instance
    """

class FeatureGenerator:
    """Base class for feature generation and preprocessing."""

    def fit_transform(
        self,
        X: pd.DataFrame,
        feature_metadata: 'FeatureMetadata' = None,
        **kwargs
    ) -> pd.DataFrame:
        """
        Fit feature generator and transform input data.

        Parameters:
        - X: Input dataframe
        - feature_metadata: Feature type metadata

        Returns:
        Transformed feature dataframe
        """

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Transform input data using fitted generator."""
```
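A feature generator can also be constructed directly, fit on raw data to inspect the engineered features, and then handed to `TabularPredictor.fit()` via its `feature_generator` argument. A minimal sketch using `AutoMLPipelineFeatureGenerator`; the constructor flags shown are assumptions about which defaults are being toggled:

```python
import pandas as pd
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

train_data = pd.read_csv('train.csv')    # illustrative path
X = train_data.drop(columns=['target'])  # 'target' is a placeholder label column

# Standalone use: fit the pipeline and inspect the engineered features
feature_generator = AutoMLPipelineFeatureGenerator(
    enable_text_ngram_features=False,    # skip n-gram features for plain tabular data
    enable_text_special_features=False,
)
X_transformed = feature_generator.fit_transform(X)
print(X_transformed.dtypes.value_counts())

# The same instance can be passed to TabularPredictor.fit(feature_generator=...)
# so training reuses this preprocessing configuration.
```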
### Advanced Training Arguments

Configuration options for advanced training strategies including bagging, stacking, and resource management.
```python { .api }
class AGArgsFit:
    """Arguments for controlling model fitting behavior."""

    num_cpus: int | str = "auto"      # CPU cores for training ("auto" or an integer)
    num_gpus: int = 0                 # GPU devices to use
    memory_limit: int | None = None   # Memory limit in MB
    disk_limit: int | None = None     # Disk space limit in MB
    time_limit: float | None = None   # Time limit per model in seconds
    name_suffix: str = ""             # Suffix for model names
    priority: int = 0                 # Training priority

class AGArgsEnsemble:
    """Arguments for controlling ensemble behavior."""

    fold_fitting_strategy: str = "sequential_local"  # Fold fitting strategy
    auto_stack: bool = True           # Enable automatic stacking
    bagging_mode: str = "oob"         # Bagging validation mode
    stack_mode: str = "infer"         # Stacking mode
    ensemble_size_max: int = 25       # Maximum ensemble size

# Training configuration structure
TRAINING_CONFIG = {
    'num_bag_folds': int,             # Number of bagging folds (default: auto)
    'num_bag_sets': int,              # Number of bagging sets (default: auto)
    'num_stack_levels': int,          # Number of stacking levels (default: auto)
    'ag_args_fit': dict,              # Advanced fitting arguments
    'ag_args_ensemble': dict,         # Advanced ensemble arguments
}
```
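These arguments can be supplied globally through `TabularPredictor.fit()` or per model family by embedding an `ag_args_fit` entry inside the hyperparameters dictionary. A hedged sketch with illustrative values, assuming a `target` label column:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # Global fitting/ensembling arguments applied to every model
    ag_args_fit={'num_cpus': 4, 'num_gpus': 0},
    ag_args_ensemble={'fold_fitting_strategy': 'sequential_local'},
    # Bagging and stacking structure
    num_bag_folds=5,
    num_bag_sets=1,
    num_stack_levels=1,
    # Per-model overrides can be embedded in the hyperparameters dict
    hyperparameters={
        'GBM': {'ag_args_fit': {'num_gpus': 0}},  # keep LightGBM on CPU
        'XGB': {},
    },
)
```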
### Evaluation and Metric Configuration

Configuration for evaluation metrics, validation strategies, and performance measurement.
```python { .api }
# Classification metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss",
    "f1", "f1_macro", "f1_micro", "f1_weighted",
    "roc_auc", "roc_auc_ovo", "roc_auc_ovo_macro", "roc_auc_ovo_weighted",
    "roc_auc_ovr", "roc_auc_ovr_macro", "roc_auc_ovr_micro", "roc_auc_ovr_weighted",
    "average_precision", "precision", "precision_macro", "precision_micro", "precision_weighted",
    "recall", "recall_macro", "recall_micro", "recall_weighted",
    "mcc", "pac_score"
]

# Regression metrics
REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error",
    "r2", "symmetric_mean_absolute_percentage_error"
]

# Quantile regression metrics
QUANTILE_METRICS = ["pinball_loss"]

def get_metric_config(
    problem_type: str,
    eval_metric: str = None,
    greater_is_better: bool = None
) -> dict:
    """
    Get metric configuration for evaluation.

    Parameters:
    - problem_type: Type of ML problem
    - eval_metric: Primary evaluation metric
    - greater_is_better: Whether higher metric values are better

    Returns:
    Metric configuration dictionary
    """
```
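The evaluation metric is normally fixed when the predictor is constructed, either by name or as a custom scorer built with `autogluon.core.metrics.make_scorer`. A short sketch; the F2 scoring function and column name are illustrative:

```python
import pandas as pd
from sklearn.metrics import fbeta_score
from autogluon.core.metrics import make_scorer
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

# Built-in metric selected by name
predictor = TabularPredictor(label='target', eval_metric='balanced_accuracy')

# Custom metric: wrap any callable that compares y_true and y_pred
f2_scorer = make_scorer(
    name='f2',
    score_func=lambda y_true, y_pred: fbeta_score(y_true, y_pred, beta=2),
    optimum=1,
    greater_is_better=True,
)
custom_predictor = TabularPredictor(label='target', eval_metric=f2_scorer)
custom_predictor.fit(train_data, time_limit=300)
```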
### Resource and Performance Configuration

Settings for optimizing computational resource usage, memory management, and training performance.
```python { .api }
class ResourceConfig:
    """Configuration for computational resources and performance optimization."""

    # CPU and Memory
    num_cpus: int | str = "auto"           # Number of CPU cores ("auto" or an integer)
    memory_limit_mb: int | None = None     # Memory limit in megabytes

    # GPU Configuration
    num_gpus: int = 0                      # Number of GPU devices
    gpu_memory_limit: int | None = None    # GPU memory limit

    # Disk and Storage
    disk_limit_mb: int | None = None       # Disk space limit in megabytes
    cache_data: bool = True                # Cache preprocessed data

    # Performance Optimization
    enable_multiprocessing: bool = True    # Enable multiprocessing
    max_concurrent_models: int = 1         # Maximum concurrent model training
    early_stopping_rounds: int | None = None  # Early stopping configuration

    # Inference Optimization
    optimize_for_deployment: bool = False  # Optimize for deployment
    model_compression: bool = False        # Enable model compression
```
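At the `fit()` level, the most commonly used resource controls are the CPU/GPU counts and the inference-latency budget. A hedged sketch with illustrative values; `infer_limit` is interpreted as the target per-row inference time in seconds:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    time_limit=1800,               # total training budget in seconds
    num_cpus=8,                    # CPU cores made available to training
    num_gpus=0,                    # GPU devices made available to training
    infer_limit=0.001,             # target online inference time per row (seconds)
    infer_limit_batch_size=10000,  # batch size assumed when measuring inference speed
)
```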
## Usage Examples

### Basic Preset Usage
```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Different preset configurations
presets = ['good_quality', 'best_quality', 'optimize_for_deployment', 'interpretable']

results = {}
for preset in presets:
    print(f"\nTraining with preset: {preset}")

    predictor = TabularPredictor(
        label='target',
        path=f'./models_{preset}/'
    )

    predictor.fit(
        train_data,
        presets=preset,
        time_limit=600  # 10 minutes per preset
    )

    # Evaluate performance (evaluate() returns a dict of metric name -> score)
    performance = predictor.evaluate(test_data)
    score = performance[predictor.eval_metric.name]
    leaderboard = predictor.leaderboard(test_data)

    results[preset] = {
        'score': score,
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard)
    }

    print(f"Best score: {score:.4f}")
    print(f"Best model: {results[preset]['best_model']}")
    print(f"Total models trained: {results[preset]['num_models']}")

# Compare results
print("\nPreset Comparison:")
for preset, result in results.items():
    print(f"{preset}: {result['score']:.4f} ({result['num_models']} models)")
```
### Custom Hyperparameter Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Advanced hyperparameter configuration
hyperparameters = {
    # Gradient Boosting Models (LightGBM uses the 'GBM' key)
    'GBM': [
        # Fast configuration
        {
            'num_leaves': 31,
            'learning_rate': 0.1,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'min_data_in_leaf': 20,
            'objective': 'binary',  # assumes a binary classification task
            'max_depth': -1,
            'save_binary': True,
            'ag_args': {'name_suffix': '_Fast', 'priority': 1}
        },
        # Accurate configuration
        {
            'num_leaves': 127,
            'learning_rate': 0.05,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.9,
            'bagging_freq': 5,
            'min_data_in_leaf': 10,
            'reg_alpha': 0.1,
            'reg_lambda': 0.1,
            'ag_args': {'name_suffix': '_Accurate', 'priority': 2}
        }
    ],

    # Search spaces (tuned via hyperparameter_tune_kwargs in fit)
    'XGB': {
        'n_estimators': space.Categorical(100, 300, 500),
        'max_depth': space.Categorical(3, 6, 10),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'subsample': space.Categorical(0.8, 0.9, 1.0),
        'colsample_bytree': space.Categorical(0.8, 0.9, 1.0),
        'reg_alpha': space.Categorical(0, 0.1, 1),
        'reg_lambda': space.Categorical(0, 0.1, 1)
    },

    # Neural Networks
    'NN_TORCH': [
        # Small network
        {
            'num_epochs': 50,
            'learning_rate': 0.001,
            'weight_decay': 1e-4,
            'dropout_prob': 0.1,
            'embedding_size_factor': 1.0,
            'ag_args': {'name_suffix': '_Small'}
        },
        # Large network
        {
            'num_epochs': 100,
            'learning_rate': 0.0005,
            'weight_decay': 1e-5,
            'dropout_prob': 0.2,
            'embedding_size_factor': 2.0,
            'ag_args': {'name_suffix': '_Large'}
        }
    ]
}

# Train with custom hyperparameters
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs='auto',  # required so the XGB search spaces are tuned
    time_limit=1800,  # 30 minutes
    num_bag_folds=5,
    num_stack_levels=2
)
```
### Advanced Training Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

# Advanced training arguments
ag_args_fit = {
    'num_cpus': 8,           # Use 8 CPU cores
    'num_gpus': 1,           # Use 1 GPU
    'memory_limit': 16000,   # 16GB memory limit
    'time_limit': 300,       # 5 minutes per model
}

ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
    'auto_stack': True,
    'bagging_mode': 'oob',   # Out-of-bag validation
    'stack_mode': 'infer',
    'ensemble_size_max': 50  # Maximum ensemble size
}

# Feature generation configuration: build a generator instance and pass it to fit()
feature_generator = AutoMLPipelineFeatureGenerator(
    enable_raw_text_features=True,
    enable_text_ngram_features=True,
    enable_text_special_features=True
)

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc',
    sample_weight='sample_weights'
)

predictor.fit(
    train_data,
    tuning_data=validation_data,
    time_limit=3600,  # 1 hour total
    presets='best_quality',

    # Advanced configurations
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,
    feature_generator=feature_generator,

    # Bagging and stacking
    num_bag_folds=10,
    num_bag_sets=3,
    num_stack_levels=3,

    # Model selection
    excluded_model_types=['KNN'],  # Exclude slow models

    # Hyperparameter tuning
    hyperparameter_tune_kwargs={
        'scheduler': 'local',
        'searcher': 'bayesopt',
        'num_trials': 100
    }
)
```
### Deployment Optimization Configuration
```python
from autogluon.tabular import TabularPredictor

# Configuration optimized for deployment
deployment_hyperparameters = {
    'GBM': {
        'num_leaves': 31,        # Smaller trees
        'max_depth': 6,
        'min_data_in_leaf': 50,  # Regularization
        'bagging_freq': 0,       # Disable bagging for speed
        'feature_fraction': 1.0, # Use all features
    },
    'CAT': {
        'iterations': 100,       # Fewer iterations
        'depth': 6,
        'l2_leaf_reg': 3,
        'bootstrap_type': 'No'   # Disable bootstrap
    }
}

predictor = TabularPredictor(
    label='target',
    path='./deployment_model/'
)

predictor.fit(
    train_data,
    presets='optimize_for_deployment',
    hyperparameters=deployment_hyperparameters,
    time_limit=300,      # Fast training
    num_bag_folds=0,     # Disable bagging
    num_stack_levels=0,  # Disable stacking

    # Focus on fast, simple models
    included_model_types=['GBM', 'CAT', 'LR']
)

# Create deployment-optimized clone (keeps only the models needed for prediction)
deployment_predictor = predictor.clone_for_deployment(
    path='./deployment_ready/',
    return_clone=True  # return the cloned predictor rather than its path
)

# Test inference speed
import time
start_time = time.time()
predictions = deployment_predictor.predict(test_data)
inference_time = time.time() - start_time

print(f"Inference time: {inference_time:.3f} seconds")
print(f"Predictions per second: {len(test_data) / inference_time:.0f}")
```
### Interpretable Model Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Search spaces for interpretable models (tuned via hyperparameter_tune_kwargs)
interpretable_hyperparameters = {
    'LR': {  # Logistic Regression
        'C': space.Categorical(0.01, 0.1, 1.0, 10),  # Regularization
        'penalty': space.Categorical('l1', 'l2'),
        'solver': space.Categorical('liblinear', 'saga')
    },
    'RF': {  # Random Forest
        'n_estimators': space.Categorical(50, 100, 200),
        'max_depth': space.Categorical(3, 5, 10),  # Limit depth for interpretability
        'min_samples_split': space.Categorical(10, 20, 50),
        'max_features': space.Categorical('sqrt', 'log2')
    },
    'XGB': {  # XGBoost (regularized)
        'n_estimators': space.Categorical(50, 100),
        'max_depth': space.Categorical(3, 4, 5),  # Shallow trees
        'learning_rate': space.Categorical(0.1, 0.2),
        'reg_alpha': space.Categorical(0.1, 1.0),  # L1 regularization
        'reg_lambda': space.Categorical(0.1, 1.0)  # L2 regularization
    }
}

predictor = TabularPredictor(
    label='target',
    eval_metric='accuracy'
)

predictor.fit(
    train_data,
    presets='interpretable',
    hyperparameters=interpretable_hyperparameters,
    hyperparameter_tune_kwargs='auto',  # tune the search spaces above

    # Enable only interpretable models
    included_model_types=['LR', 'RF', 'XGB'],

    # Simpler ensemble strategies
    num_bag_folds=3,
    num_stack_levels=1,

    # Feature processing for interpretability
    feature_generator='auto'  # Minimal feature engineering
)

# Analyze model interpretability
leaderboard = predictor.leaderboard(extra_info=True)
print("Interpretable models ranking:")
print(leaderboard[['model', 'score_val', 'fit_time']].head())
```
## Configuration Reference

### Preset Details

| Preset | Training Time | Model Diversity | Bagging / Stacking | Best For |
|--------|---------------|-----------------|--------------------|----------|
| `medium_quality` | Low | Medium | None | Quick prototyping, default preset |
| `good_quality` | Medium | High | Moderate | General use, balanced performance |
| `high_quality` | High | High | Extensive | High accuracy with fast inference |
| `best_quality` | Very High | Very High | Extensive | Maximum accuracy, competitions |
| `optimize_for_deployment` | - | - | - | Post-training optimization |
| `interpretable` | Low | Limited | Simple | Regulated industries, explainability |
### Model Type Abbreviations

| Code | Full Name | Category |
|------|-----------|----------|
| `GBM` | LightGBM | Gradient Boosting |
| `XGB` | XGBoost | Gradient Boosting |
| `CAT` | CatBoost | Gradient Boosting |
| `RF` | Random Forest | Tree Ensemble |
| `XT` | Extra Trees | Tree Ensemble |
| `LR` | Linear/Logistic Regression | Linear |
| `KNN` | K-Nearest Neighbors | Instance-based |
| `NN_TORCH` | PyTorch Neural Network | Deep Learning |
| `FASTAI` | FastAI Neural Network | Deep Learning |
| `TABPFN` | TabPFN | Foundation Model |
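These codes are the keys used in the `hyperparameters` dictionary and in `included_model_types` / `excluded_model_types`. A brief sketch, assuming a `target` label column:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # Train only gradient boosting models, each with its default settings
    hyperparameters={'GBM': {}, 'XGB': {}, 'CAT': {}},
    # Alternatively, keep the default portfolio but skip slow model families:
    # excluded_model_types=['KNN', 'NN_TORCH'],
    time_limit=600,
)
```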
### Resource Configuration Guidelines

| Use Case | CPU Cores | Memory (GB) | Time Limit | Bag Folds |
|----------|-----------|-------------|------------|-----------|
| Quick Prototype | 2-4 | 4-8 | 5-15 min | 2-3 |
| Production Model | 8-16 | 16-32 | 30-60 min | 5-10 |
| Competition | 16-32 | 32-64 | 2-8 hours | 10-20 |
| Large Dataset | 16+ | 64+ | 4+ hours | 5-10 |
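As a rough translation of the "Production Model" row into `fit()` arguments (the values are the table's suggestions, not library defaults):

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

# "Production Model" row: 8-16 cores, 16-32 GB, 30-60 min budget, 5-10 bag folds
predictor = TabularPredictor(label='target', path='./models_production/')
predictor.fit(
    train_data,
    presets='best_quality',
    time_limit=3600,  # 60-minute budget
    num_bag_folds=8,
    num_cpus=8,
)
```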