# Models and Registry

AutoGluon Tabular provides a comprehensive collection of machine learning models behind a unified interface, spanning traditional algorithms, gradient boosting, and modern deep learning approaches. The model registry system makes the available model portfolio extensible and customizable.

## Capabilities

### Core Machine Learning Models

Traditional and gradient-boosting models form the backbone of AutoGluon's automated machine learning capabilities, providing robust performance across diverse tabular datasets.

```python { .api }
# Gradient Boosting Models
class LGBModel:
    """LightGBM gradient boosting model optimized for speed and memory efficiency."""

class XGBoostModel:
    """XGBoost gradient boosting model with advanced regularization and native handling of missing values."""

class CatBoostModel:
    """CatBoost gradient boosting model with native categorical feature support."""

# Tree-based Models
class RFModel:
    """Random Forest model: an ensemble of bootstrap-aggregated decision trees with random feature subsets."""

class XTModel:
    """Extra Trees (Extremely Randomized Trees) model with increased split randomization."""

# Linear Models
class LinearModel:
    """Linear/logistic regression with automatic regularization and feature scaling."""

# Instance-based Models
class KNNModel:
    """K-Nearest Neighbors model for both classification and regression tasks."""
```
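
These models are normally requested through `TabularPredictor.fit` rather than instantiated directly. A minimal sketch, assuming the standard shorthand keys that AutoGluon maps to these classes ('GBM' for LightGBM, plus 'XGB', 'CAT', 'RF', 'XT', 'LR', and 'KNN') and a `train_data` DataFrame containing a 'target' column:

```python
from autogluon.tabular import TabularPredictor

# Empty dicts request each model with its default hyperparameters.
predictor = TabularPredictor(label='target').fit(
    train_data,
    hyperparameters={
        'GBM': {},  # LGBModel (LightGBM)
        'XGB': {},  # XGBoostModel
        'CAT': {},  # CatBoostModel
        'RF': {},   # RFModel
        'XT': {},   # XTModel
        'LR': {},   # LinearModel
        'KNN': {},  # KNNModel
    },
)
```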

### Neural Network Models

Deep learning models optimized for tabular data, with automatic architecture selection, hyperparameter optimization, and specialized architectures for structured data.

```python { .api }
# Traditional Neural Networks
class NNFastAiTabularModel:
    """FastAI-based neural network with automated preprocessing and training."""

class TabularNeuralNetTorchModel:
    """PyTorch-based neural network with a custom architecture for tabular data."""

# Transformer-based Models
class FTTransformerModel:
    """Feature Tokenizer Transformer - a transformer architecture specialized for tabular data."""

# Pre-trained Foundation Models
class TabPFNV2Model:
    """TabPFN v2 - pre-trained transformer (prior-data fitted network) that predicts via in-context learning."""

class TabPFNMixModel:
    """TabPFN Mix - TabPFN-style tabular foundation model pre-trained for in-context prediction."""

class MitraModel:
    """Mitra - tabular foundation model supporting in-context learning on tabular tasks."""

# Specialized Neural Networks
class TabMModel:
    """TabM - MLP-based model that emulates a deep ensemble through parameter-efficient weight sharing."""

class RealMLPModel:
    """RealMLP - improved multilayer perceptron with carefully tuned defaults for tabular prediction."""

class TabICLModel:
    """TabICL - in-context learning model for tabular prediction."""
```
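
As with the core models, the neural networks are selected via `hyperparameters` keys. A short sketch, assuming the 'NN_TORCH', 'FASTAI', and 'FT_TRANSFORMER' keys map to the classes above (key names for the newer foundation models vary by AutoGluon release):

```python
from autogluon.tabular import TabularPredictor

# Train only neural network models; a GPU is used automatically when available.
predictor = TabularPredictor(label='target').fit(
    train_data,
    hyperparameters={
        'NN_TORCH': {'num_epochs': 20},  # TabularNeuralNetTorchModel
        'FASTAI': {},                    # NNFastAiTabularModel
        'FT_TRANSFORMER': {},            # FTTransformerModel
    },
    time_limit=900,  # neural models benefit from a larger time budget
)
```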

### Multi-Modal Models

Models capable of handling mixed data types, including text, images, and structured features, within the same prediction task.

```python { .api }
class MultiModalPredictorModel:
    """
    AutoMM-based multi-modal model handling tabular, text, and image features.
    Automatically detects and processes different data modalities.
    """

class TextPredictorModel:
    """Specialized model for tabular data containing text features."""

class FastTextModel:
    """FastText model for efficient text classification and representation learning."""

class ImagePredictorModel:
    """Model for tabular data with image features or image-based prediction."""
```
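
When a dataset mixes text or image columns with structured features, these models are typically enabled through a hyperparameter preset rather than individual keys. A sketch, assuming the 'multimodal' hyperparameter preset available in recent AutoGluon releases:

```python
from autogluon.tabular import TabularPredictor

# The 'multimodal' preset adds the AutoMM-based model to the usual tabular
# portfolio, so free-text columns are modeled with a neural network instead
# of being reduced to simple engineered features.
predictor = TabularPredictor(label='target').fit(
    train_data,  # assumed to contain text columns alongside numeric/categorical ones
    hyperparameters='multimodal',
    time_limit=1800,
)
```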

### Interpretable Models

Models designed for interpretability and explainability, providing transparent decision-making suitable for regulated industries and high-stakes applications.

```python { .api }
# Base Interpretable Model
class _IModelsModel:
    """Base class for interpretable models backed by the imodels library."""

# Rule-based Models
class BoostedRulesModel:
    """Gradient-boosted rule ensemble providing interpretable decision rules."""

class RuleFitModel:
    """RuleFit model combining linear regression with decision rules."""

class FigsModel:
    """FIGS (Fast Interpretable Greedy-tree Sums) model for rule-based predictions."""

# Tree-based Interpretable Models
class GreedyTreeModel:
    """Greedy decision tree optimized for interpretability over accuracy."""

class HSTreeModel:
    """Hierarchical Shrinkage Tree with built-in regularization."""
```
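
The interpretable models are most easily trained together through a preset. A minimal sketch, assuming the 'interpretable' preset available in recent AutoGluon releases, which restricts the portfolio to rule- and tree-based models:

```python
from autogluon.tabular import TabularPredictor

# Train only interpretable models and compare them by validation score
# on the leaderboard.
predictor = TabularPredictor(label='target').fit(
    train_data,
    presets='interpretable',
    time_limit=600,
)
print(predictor.leaderboard())
```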

### Model Registry System

Extensible registry system for managing, registering, and accessing machine learning models within AutoGluon's framework. See "Working with Model Registry" below for usage.

```python { .api }
class ModelRegistry:
    """
    Registry for managing available machine learning models.
    Enables custom model registration and retrieval.
    """

    def __init__(self):
        """Initialize empty model registry."""

    def register_model(
        self,
        name: str,
        model_class: type,
        tags: list[str] | None = None,
    ) -> None:
        """
        Register a new model class in the registry.

        Parameters:
        - name: Unique identifier for the model
        - model_class: Model class to register
        - tags: Optional tags for categorization
        """

    def get_model(self, name: str) -> type:
        """
        Retrieve a registered model class by name.

        Parameters:
        - name: Name of the registered model

        Returns:
        Model class
        """

    def list_models(self, tags: list[str] | None = None) -> list[str]:
        """
        List all registered model names.

        Parameters:
        - tags: Filter by tags (optional)

        Returns:
        List of model names
        """

    def unregister_model(self, name: str) -> None:
        """
        Remove a model from the registry.

        Parameters:
        - name: Name of the model to remove
        """

# Global model registry instance
ag_model_registry: ModelRegistry
```

### Base Model Interface

Abstract base class defining the common interface that all AutoGluon models must implement for consistent behavior and integration. See "Custom Model Registration" below for a subclassing example.

```python { .api }
import numpy as np
import pandas as pd

class AbstractModel:
    """
    Abstract base class for all AutoGluon tabular models.
    Defines the standard interface and common functionality.
    """

    def __init__(
        self,
        problem_type: str,
        objective: str | None = None,
        **kwargs,
    ):
        """
        Initialize model with problem configuration.

        Parameters:
        - problem_type: Type of ML problem ('binary', 'multiclass', 'regression')
        - objective: Optimization objective/metric
        - kwargs: Model-specific parameters
        """

    def fit(
        self,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame | None = None,
        y_val: pd.Series | None = None,
        **kwargs,
    ) -> None:
        """
        Train the model on provided data.

        Parameters:
        - X_train: Training features
        - y_train: Training labels
        - X_val: Validation features (optional)
        - y_val: Validation labels (optional)
        """

    def predict(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predictions as numpy array
        """

    def predict_proba(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate prediction probabilities (classification only).

        Parameters:
        - X: Input features

        Returns:
        Prediction probabilities as numpy array
        """

    def get_memory_size(self) -> int:
        """
        Get approximate memory usage of the model in bytes.

        Returns:
        Memory usage in bytes
        """

    def save(self, path: str) -> None:
        """
        Save model to disk.

        Parameters:
        - path: File path for saving
        """

    def load(self, path: str) -> None:
        """
        Load model from disk.

        Parameters:
        - path: File path for loading
        """
```

## Usage Examples

### Custom Model Registration

```python
from autogluon.tabular.models import AbstractModel
from autogluon.tabular.registry import ag_model_registry
from sklearn.ensemble import GradientBoostingClassifier

class CustomGBModel(AbstractModel):
    """Custom Gradient Boosting model wrapper."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GradientBoostingClassifier(
            n_estimators=kwargs.get('n_estimators', 100),
            learning_rate=kwargs.get('learning_rate', 0.1),
            random_state=42,
        )

    def fit(self, X_train, y_train, **kwargs):
        self.model.fit(X_train, y_train)

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

# Register custom model
ag_model_registry.register_model(
    name='CustomGB',
    model_class=CustomGBModel,
    tags=['tree', 'gradient_boosting', 'custom']
)

# Use in TabularPredictor
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # A fixed hyperparameter value; lists define search spaces and require
    # hyperparameter tuning to be enabled.
    hyperparameters={'CustomGB': {'n_estimators': 200}}
)
```

### Model-Specific Hyperparameter Tuning

```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Define model-specific search spaces. Categorical spaces require
# hyperparameter tuning to be enabled in fit().
hyperparameters = {
    # LightGBM configurations
    'GBM': {
        'num_leaves': space.Categorical(31, 127, 255),
        'learning_rate': space.Categorical(0.01, 0.05, 0.1),
        'feature_fraction': space.Categorical(0.8, 0.9, 1.0),
        'bagging_fraction': space.Categorical(0.8, 0.9, 1.0),
        'min_data_in_leaf': space.Categorical(10, 20, 50)
    },

    # XGBoost configurations
    'XGB': {
        'n_estimators': space.Categorical(100, 300, 500),
        'max_depth': space.Categorical(3, 6, 10),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'subsample': space.Categorical(0.8, 0.9, 1.0),
        'colsample_bytree': space.Categorical(0.8, 0.9, 1.0)
    },

    # CatBoost configurations
    'CAT': {
        'iterations': space.Categorical(100, 500, 1000),
        'depth': space.Categorical(4, 6, 8),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'l2_leaf_reg': space.Categorical(1, 3, 5, 7, 9)
    },

    # Neural Network configurations
    'NN_TORCH': {
        'num_epochs': space.Categorical(10, 50, 100),
        'learning_rate': space.Categorical(1e-4, 1e-3, 1e-2),
        'weight_decay': space.Categorical(1e-6, 1e-4, 1e-2),
        'dropout_prob': space.Categorical(0.0, 0.1, 0.2, 0.5)
    }
}

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs='auto',  # enable HPO over the spaces above
    time_limit=1800  # 30 minutes
)

# Check which models were trained
leaderboard = predictor.leaderboard()
print("Trained models:")
print(leaderboard[['model', 'score_val']].head(10))
```

### Model Selection and Filtering

```python
from autogluon.tabular import TabularPredictor

# Include only specific model types by listing just their keys
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters={'GBM': {}, 'XGB': {}, 'CAT': {}},  # Only gradient boosting
    time_limit=600
)

# Exclude simpler models when optimizing for raw performance
predictor_performance = TabularPredictor(label='target')
predictor_performance.fit(
    train_data,
    excluded_model_types=['LR', 'KNN'],  # Exclude simpler models
    presets='best_quality'
)

# Include only interpretable models
predictor_interpretable = TabularPredictor(label='target')
predictor_interpretable.fit(
    train_data,
    presets='interpretable'  # Restricts training to interpretable models
)
```

### Advanced Model Configuration

```python
from autogluon.tabular import TabularPredictor

# Resource constraints applied to each model during training
ag_args_fit = {
    'num_cpus': 8,  # CPU cores for training
    'num_gpus': 1,  # GPU devices
}

# Ensemble/bagging behavior
ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
}

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc'
)

predictor.fit(
    train_data,
    time_limit=3600,  # 1 hour
    presets='best_quality',
    num_bag_folds=10,
    num_stack_levels=3,
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,

    # Model-specific advanced arguments
    hyperparameters={
        'GBM': {'ag_args': {'name_suffix': '_Large', 'priority': 1}},
        'XGB': {'ag_args': {'name_suffix': '_XL', 'priority': 2}},
        'CAT': {'ag_args': {'name_suffix': '_Balanced', 'priority': 3}}
    }
)

# Analyze model performance and resource usage
leaderboard = predictor.leaderboard(extra_info=True)
print(leaderboard[['model', 'score_val', 'fit_time', 'pred_time_val']].head())
```

### Working with Model Registry

```python
from autogluon.tabular.registry import ag_model_registry

# List all available models
all_models = ag_model_registry.list_models()
print(f"Available models: {len(all_models)}")
print(all_models[:10])  # First 10 models

# Get specific model class
lgb_class = ag_model_registry.get_model('LGBModel')
print(f"LightGBM model class: {lgb_class}")

# Check if model is registered
if 'XGBModel' in all_models:
    xgb_class = ag_model_registry.get_model('XGBModel')
    print(f"XGBoost available: {xgb_class is not None}")

# Custom model usage
from autogluon.tabular.models import RFModel

# Instantiate model directly (advanced usage)
rf_model = RFModel(
    problem_type='binary',
    objective='binary_logloss'
)

# This would typically be done within TabularPredictor
# rf_model.fit(X_train, y_train)
# predictions = rf_model.predict(X_test)
```