# Models and Registry

AutoGluon Tabular provides a comprehensive collection of machine learning models behind a unified interface, spanning traditional algorithms, gradient boosting, and modern deep learning approaches. The model registry system makes the available model portfolio extensible and customizable.

## Capabilities

### Core Machine Learning Models

Traditional and gradient-boosting models form the backbone of AutoGluon's automated machine learning capabilities, providing robust performance across diverse tabular datasets.

```python { .api }
# Gradient Boosting Models
class LGBModel:
    """LightGBM gradient boosting model optimized for speed and memory efficiency."""

class XGBoostModel:
    """XGBoost gradient boosting model with advanced regularization and native handling of missing values."""

class CatBoostModel:
    """CatBoost gradient boosting model with native categorical feature support."""

# Tree-based Models
class RFModel:
    """Random Forest model: an ensemble of bootstrap-aggregated decision trees with random feature subsets."""

class XTModel:
    """Extra Trees (Extremely Randomized Trees) model with increased split randomization."""

# Linear Models
class LinearModel:
    """Linear/logistic regression with automatic regularization and feature scaling."""

# Instance-based Models
class KNNModel:
    """K-Nearest Neighbors model for both classification and regression tasks."""
```
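
These models are normally requested through `TabularPredictor.fit` rather than instantiated directly. A minimal sketch, assuming the standard shorthand keys that AutoGluon maps to these classes ('GBM' for LightGBM, plus 'XGB', 'CAT', 'RF', 'XT', 'LR', and 'KNN') and a `train_data` DataFrame containing a 'target' column:

```python
from autogluon.tabular import TabularPredictor

# Empty dicts request each model with its default hyperparameters.
predictor = TabularPredictor(label='target').fit(
    train_data,
    hyperparameters={
        'GBM': {},  # LGBModel (LightGBM)
        'XGB': {},  # XGBoostModel
        'CAT': {},  # CatBoostModel
        'RF': {},   # RFModel
        'XT': {},   # XTModel
        'LR': {},   # LinearModel
        'KNN': {},  # KNNModel
    },
)
```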

### Neural Network Models

Deep learning models optimized for tabular data, with automatic architecture selection, hyperparameter optimization, and specialized architectures for structured data.

```python { .api }
# Traditional Neural Networks
class NNFastAiTabularModel:
    """FastAI-based neural network with automated preprocessing and training."""

class TabularNeuralNetTorchModel:
    """PyTorch-based neural network with a custom architecture for tabular data."""

# Transformer-based Models
class FTTransformerModel:
    """Feature Tokenizer Transformer - a transformer architecture specialized for tabular data."""

# Pre-trained Foundation Models
class TabPFNV2Model:
    """TabPFN v2 - pre-trained transformer (prior-data fitted network) that predicts via in-context learning."""

class TabPFNMixModel:
    """TabPFN Mix - TabPFN-style tabular foundation model pre-trained for in-context prediction."""

class MitraModel:
    """Mitra - tabular foundation model supporting in-context learning on tabular tasks."""

# Specialized Neural Networks
class TabMModel:
    """TabM - MLP-based model that emulates a deep ensemble through parameter-efficient weight sharing."""

class RealMLPModel:
    """RealMLP - improved multilayer perceptron with carefully tuned defaults for tabular prediction."""

class TabICLModel:
    """TabICL - in-context learning model for tabular prediction."""
```
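
As with the core models, the neural networks are selected via `hyperparameters` keys. A short sketch, assuming the 'NN_TORCH', 'FASTAI', and 'FT_TRANSFORMER' keys map to the classes above (key names for the newer foundation models vary by AutoGluon release):

```python
from autogluon.tabular import TabularPredictor

# Train only neural network models; a GPU is used automatically when available.
predictor = TabularPredictor(label='target').fit(
    train_data,
    hyperparameters={
        'NN_TORCH': {'num_epochs': 20},  # TabularNeuralNetTorchModel
        'FASTAI': {},                    # NNFastAiTabularModel
        'FT_TRANSFORMER': {},            # FTTransformerModel
    },
    time_limit=900,  # neural models benefit from a larger time budget
)
```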

### Multi-Modal Models

Models capable of handling mixed data types, including text, images, and structured features, within the same prediction task.

```python { .api }
class MultiModalPredictorModel:
    """
    AutoMM-based multi-modal model handling tabular, text, and image features.
    Automatically detects and processes different data modalities.
    """

class TextPredictorModel:
    """Specialized model for tabular data containing text features."""

class FastTextModel:
    """FastText model for efficient text classification and representation learning."""

class ImagePredictorModel:
    """Model for tabular data with image features or image-based prediction."""
```
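
When a dataset mixes text or image columns with structured features, these models are typically enabled through a hyperparameter preset rather than individual keys. A sketch, assuming the 'multimodal' hyperparameter preset available in recent AutoGluon releases:

```python
from autogluon.tabular import TabularPredictor

# The 'multimodal' preset adds the AutoMM-based model to the usual tabular
# portfolio, so free-text columns are modeled with a neural network instead
# of being reduced to simple engineered features.
predictor = TabularPredictor(label='target').fit(
    train_data,  # assumed to contain text columns alongside numeric/categorical ones
    hyperparameters='multimodal',
    time_limit=1800,
)
```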

### Interpretable Models

Models designed for interpretability and explainability, providing transparent decision-making suitable for regulated industries and high-stakes applications.

```python { .api }
# Base Interpretable Model
class _IModelsModel:
    """Base class for interpretable models backed by the imodels library."""

# Rule-based Models
class BoostedRulesModel:
    """Gradient-boosted rule ensemble providing interpretable decision rules."""

class RuleFitModel:
    """RuleFit model combining linear regression with decision rules."""

class FigsModel:
    """FIGS (Fast Interpretable Greedy-tree Sums) model for rule-based predictions."""

# Tree-based Interpretable Models
class GreedyTreeModel:
    """Greedy decision tree optimized for interpretability over accuracy."""

class HSTreeModel:
    """Hierarchical Shrinkage Tree with built-in regularization."""
```
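
The interpretable models are most easily trained together through a preset. A minimal sketch, assuming the 'interpretable' preset available in recent AutoGluon releases, which restricts the portfolio to rule- and tree-based models:

```python
from autogluon.tabular import TabularPredictor

# Train only interpretable models and compare them by validation score
# on the leaderboard.
predictor = TabularPredictor(label='target').fit(
    train_data,
    presets='interpretable',
    time_limit=600,
)
print(predictor.leaderboard())
```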

### Model Registry System

Extensible registry system for managing, registering, and accessing machine learning models within AutoGluon's framework. See "Working with Model Registry" below for usage.

```python { .api }
class ModelRegistry:
    """
    Registry for managing available machine learning models.
    Enables custom model registration and retrieval.
    """

    def __init__(self):
        """Initialize empty model registry."""

    def register_model(
        self,
        name: str,
        model_class: type,
        tags: list[str] | None = None,
    ) -> None:
        """
        Register a new model class in the registry.

        Parameters:
        - name: Unique identifier for the model
        - model_class: Model class to register
        - tags: Optional tags for categorization
        """

    def get_model(self, name: str) -> type:
        """
        Retrieve a registered model class by name.

        Parameters:
        - name: Name of the registered model

        Returns:
        Model class
        """

    def list_models(self, tags: list[str] | None = None) -> list[str]:
        """
        List all registered model names.

        Parameters:
        - tags: Filter by tags (optional)

        Returns:
        List of model names
        """

    def unregister_model(self, name: str) -> None:
        """
        Remove a model from the registry.

        Parameters:
        - name: Name of the model to remove
        """

# Global model registry instance
ag_model_registry: ModelRegistry
```

### Base Model Interface

Abstract base class defining the common interface that all AutoGluon models must implement for consistent behavior and integration. See "Custom Model Registration" below for a subclassing example.

```python { .api }
import numpy as np
import pandas as pd

class AbstractModel:
    """
    Abstract base class for all AutoGluon tabular models.
    Defines the standard interface and common functionality.
    """

    def __init__(
        self,
        problem_type: str,
        objective: str | None = None,
        **kwargs,
    ):
        """
        Initialize model with problem configuration.

        Parameters:
        - problem_type: Type of ML problem ('binary', 'multiclass', 'regression')
        - objective: Optimization objective/metric
        - kwargs: Model-specific parameters
        """

    def fit(
        self,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame | None = None,
        y_val: pd.Series | None = None,
        **kwargs,
    ) -> None:
        """
        Train the model on provided data.

        Parameters:
        - X_train: Training features
        - y_train: Training labels
        - X_val: Validation features (optional)
        - y_val: Validation labels (optional)
        """

    def predict(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predictions as numpy array
        """

    def predict_proba(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate prediction probabilities (classification only).

        Parameters:
        - X: Input features

        Returns:
        Prediction probabilities as numpy array
        """

    def get_memory_size(self) -> int:
        """
        Get approximate memory usage of the model in bytes.

        Returns:
        Memory usage in bytes
        """

    def save(self, path: str) -> None:
        """
        Save model to disk.

        Parameters:
        - path: File path for saving
        """

    def load(self, path: str) -> None:
        """
        Load model from disk.

        Parameters:
        - path: File path for loading
        """
```

## Usage Examples

### Custom Model Registration

```python
from autogluon.tabular.models import AbstractModel
from autogluon.tabular.registry import ag_model_registry
from sklearn.ensemble import GradientBoostingClassifier

class CustomGBModel(AbstractModel):
    """Custom Gradient Boosting model wrapper."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GradientBoostingClassifier(
            n_estimators=kwargs.get('n_estimators', 100),
            learning_rate=kwargs.get('learning_rate', 0.1),
            random_state=42,
        )

    def fit(self, X_train, y_train, **kwargs):
        self.model.fit(X_train, y_train)

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

# Register custom model
ag_model_registry.register_model(
    name='CustomGB',
    model_class=CustomGBModel,
    tags=['tree', 'gradient_boosting', 'custom']
)

# Use in TabularPredictor
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # A fixed hyperparameter value; lists define search spaces and require
    # hyperparameter tuning to be enabled.
    hyperparameters={'CustomGB': {'n_estimators': 200}}
)
```

### Model-Specific Hyperparameter Tuning

```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Define model-specific search spaces. Categorical spaces require
# hyperparameter tuning to be enabled in fit().
hyperparameters = {
    # LightGBM configurations
    'GBM': {
        'num_leaves': space.Categorical(31, 127, 255),
        'learning_rate': space.Categorical(0.01, 0.05, 0.1),
        'feature_fraction': space.Categorical(0.8, 0.9, 1.0),
        'bagging_fraction': space.Categorical(0.8, 0.9, 1.0),
        'min_data_in_leaf': space.Categorical(10, 20, 50)
    },

    # XGBoost configurations
    'XGB': {
        'n_estimators': space.Categorical(100, 300, 500),
        'max_depth': space.Categorical(3, 6, 10),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'subsample': space.Categorical(0.8, 0.9, 1.0),
        'colsample_bytree': space.Categorical(0.8, 0.9, 1.0)
    },

    # CatBoost configurations
    'CAT': {
        'iterations': space.Categorical(100, 500, 1000),
        'depth': space.Categorical(4, 6, 8),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'l2_leaf_reg': space.Categorical(1, 3, 5, 7, 9)
    },

    # Neural Network configurations
    'NN_TORCH': {
        'num_epochs': space.Categorical(10, 50, 100),
        'learning_rate': space.Categorical(1e-4, 1e-3, 1e-2),
        'weight_decay': space.Categorical(1e-6, 1e-4, 1e-2),
        'dropout_prob': space.Categorical(0.0, 0.1, 0.2, 0.5)
    }
}

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs='auto',  # enable HPO over the spaces above
    time_limit=1800  # 30 minutes
)

# Check which models were trained
leaderboard = predictor.leaderboard()
print("Trained models:")
print(leaderboard[['model', 'score_val']].head(10))
```

### Model Selection and Filtering

```python
from autogluon.tabular import TabularPredictor

# Include only specific model types by listing just their keys
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters={'GBM': {}, 'XGB': {}, 'CAT': {}},  # Only gradient boosting
    time_limit=600
)

# Exclude simpler models when optimizing for raw performance
predictor_performance = TabularPredictor(label='target')
predictor_performance.fit(
    train_data,
    excluded_model_types=['LR', 'KNN'],  # Exclude simpler models
    presets='best_quality'
)

# Include only interpretable models
predictor_interpretable = TabularPredictor(label='target')
predictor_interpretable.fit(
    train_data,
    presets='interpretable'  # Restricts training to interpretable models
)
```

### Advanced Model Configuration

```python
from autogluon.tabular import TabularPredictor

# Resource constraints applied to each model during training
ag_args_fit = {
    'num_cpus': 8,  # CPU cores for training
    'num_gpus': 1,  # GPU devices
}

# Ensemble/bagging behavior
ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
}

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc'
)

predictor.fit(
    train_data,
    time_limit=3600,  # 1 hour
    presets='best_quality',
    num_bag_folds=10,
    num_stack_levels=3,
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,

    # Model-specific advanced arguments
    hyperparameters={
        'GBM': {'ag_args': {'name_suffix': '_Large', 'priority': 1}},
        'XGB': {'ag_args': {'name_suffix': '_XL', 'priority': 2}},
        'CAT': {'ag_args': {'name_suffix': '_Balanced', 'priority': 3}}
    }
)

# Analyze model performance and resource usage
leaderboard = predictor.leaderboard(extra_info=True)
print(leaderboard[['model', 'score_val', 'fit_time', 'pred_time_val']].head())
```

### Working with Model Registry

```python
from autogluon.tabular.registry import ag_model_registry

# List all available models
all_models = ag_model_registry.list_models()
print(f"Available models: {len(all_models)}")
print(all_models[:10])  # First 10 models

# Get specific model class
lgb_class = ag_model_registry.get_model('LGBModel')
print(f"LightGBM model class: {lgb_class}")

# Check if model is registered
if 'XGBModel' in all_models:
    xgb_class = ag_model_registry.get_model('XGBModel')
    print(f"XGBoost available: {xgb_class is not None}")

# Custom model usage
from autogluon.tabular.models import RFModel

# Instantiate model directly (advanced usage)
rf_model = RFModel(
    problem_type='binary',
    objective='binary_logloss'
)

# This would typically be done within TabularPredictor
# rf_model.fit(X_train, y_train)
# predictions = rf_model.predict(X_test)
```