# Configuration and Presets

AutoGluon Tabular provides extensive configuration options through presets, hyperparameter configurations, and feature processing settings. These configurations enable users to optimize for different objectives like accuracy, speed, interpretability, or deployment constraints.

## Capabilities

### Preset Configurations

Pre-configured settings optimized for different use cases, balancing accuracy, training time, and computational resources.
```python { .api }
from typing import Literal

# Available preset configurations
PRESET_CONFIGURATIONS = Literal[
    "best_quality",             # Maximum accuracy, longer training time
    "high_quality",             # High accuracy with fast inference
    "good_quality",             # Good accuracy with very fast inference
    "medium_quality",           # Medium accuracy, very fast training (default)
    "optimize_for_deployment",  # Optimizes for deployment by cleaning up models
    "interpretable",            # Interpretable models only
]

def get_preset_config(preset: str) -> dict:
    """
    Get configuration dictionary for a specific preset.

    Parameters:
    - preset: Name of the preset configuration

    Returns:
    Dictionary with preset configuration parameters
    """
```
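In practice a preset is selected by passing its name to `TabularPredictor.fit()`. The `presets` argument also accepts a list, which is useful for combining a quality preset with `optimize_for_deployment`. A minimal sketch, assuming a label column named `target` and a local `train.csv`:

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')  # illustrative path

# Single preset
predictor = TabularPredictor(label='target').fit(train_data, presets='medium_quality')

# Multiple presets: maximum accuracy plus post-training cleanup for deployment
predictor = TabularPredictor(label='target', path='./models_combined/').fit(
    train_data,
    presets=['best_quality', 'optimize_for_deployment'],
)
```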
### Hyperparameter Configurations

Systematic hyperparameter configuration system for customizing model training and optimization strategies.
```python { .api }
from typing import Any

def get_hyperparameter_config(config_name: str = "default") -> dict:
    """
    Get the hyperparameter configuration for a named preset.

    Parameters:
    - config_name: Name of the hyperparameter configuration preset
      (see get_hyperparameter_config_options() for available names)

    Returns:
    Dictionary mapping model names to hyperparameter configurations
    """

# Hyperparameter configuration structure
HYPERPARAMETER_CONFIG = dict[str, dict[str, Any]]
# Example: {'GBM': {'num_leaves': [31, 127], 'learning_rate': [0.01, 0.1]}}

def get_hyperparameter_config_options() -> list[str]:
    """
    Get list of available hyperparameter configuration presets.

    Returns:
    List of available configuration names
    """
```
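Named configurations can be passed straight to `fit()` through the `hyperparameters` argument (for example `'default'`, `'light'`, or `'very_light'` for progressively smaller model portfolios), or retrieved as a dictionary and edited first. A sketch, assuming a `target` label column; the module path is taken from AutoGluon's configuration layout:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor
from autogluon.tabular.configs.hyperparameter_configs import get_hyperparameter_config

train_data = pd.read_csv('train.csv')  # illustrative path

# Use a named configuration directly
predictor = TabularPredictor(label='target').fit(
    train_data, hyperparameters='light', time_limit=600
)

# Retrieve the underlying dictionary and customize it before training
config = get_hyperparameter_config('default')
config.pop('KNN', None)  # e.g. drop a model family
custom_predictor = TabularPredictor(label='target', path='./models_custom/').fit(
    train_data, hyperparameters=config
)
```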
### Feature Generation Configuration

Automated feature engineering and preprocessing configuration system for handling diverse data types and feature transformations.
```python { .api }
import pandas as pd

def get_default_feature_generator(
    feature_generator: str = "auto",
    feature_metadata: 'FeatureMetadata' = None,
    init_kwargs: dict = None
) -> 'AutoMLPipelineFeatureGenerator':
    """
    Get default feature generator with specified configuration.

    Parameters:
    - feature_generator: Feature generation preset ('auto', 'interpretable')
    - feature_metadata: Metadata for feature processing
    - init_kwargs: Additional initialization arguments

    Returns:
    Configured feature generator instance
    """

class FeatureGenerator:
    """Base class for feature generation and preprocessing."""

    def fit_transform(
        self,
        X: pd.DataFrame,
        feature_metadata: 'FeatureMetadata' = None,
        **kwargs
    ) -> pd.DataFrame:
        """
        Fit feature generator and transform input data.

        Parameters:
        - X: Input dataframe
        - feature_metadata: Feature type metadata

        Returns:
        Transformed feature dataframe
        """

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Transform input data using fitted generator."""
```
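A feature generator can also be constructed directly, fit on raw data to inspect the engineered features, and then handed to `TabularPredictor.fit()` via its `feature_generator` argument. A minimal sketch using `AutoMLPipelineFeatureGenerator`; the constructor flags shown are assumptions about which defaults are being toggled:

```python
import pandas as pd
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

train_data = pd.read_csv('train.csv')    # illustrative path
X = train_data.drop(columns=['target'])  # 'target' is a placeholder label column

# Standalone use: fit the pipeline and inspect the engineered features
feature_generator = AutoMLPipelineFeatureGenerator(
    enable_text_ngram_features=False,    # skip n-gram features for plain tabular data
    enable_text_special_features=False,
)
X_transformed = feature_generator.fit_transform(X)
print(X_transformed.dtypes.value_counts())

# The same instance can be passed to TabularPredictor.fit(feature_generator=...)
# so training reuses this preprocessing configuration.
```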
### Advanced Training Arguments

Configuration options for advanced training strategies including bagging, stacking, and resource management.
```python { .api }
class AGArgsFit:
    """Arguments for controlling model fitting behavior."""

    num_cpus: int | str = "auto"      # CPU cores for training ("auto" or an integer)
    num_gpus: int = 0                 # GPU devices to use
    memory_limit: int | None = None   # Memory limit in MB
    disk_limit: int | None = None     # Disk space limit in MB
    time_limit: float | None = None   # Time limit per model in seconds
    name_suffix: str = ""             # Suffix for model names
    priority: int = 0                 # Training priority

class AGArgsEnsemble:
    """Arguments for controlling ensemble behavior."""

    fold_fitting_strategy: str = "sequential_local"  # Fold fitting strategy
    auto_stack: bool = True           # Enable automatic stacking
    bagging_mode: str = "oob"         # Bagging validation mode
    stack_mode: str = "infer"         # Stacking mode
    ensemble_size_max: int = 25       # Maximum ensemble size

# Training configuration structure
TRAINING_CONFIG = {
    'num_bag_folds': int,             # Number of bagging folds (default: auto)
    'num_bag_sets': int,              # Number of bagging sets (default: auto)
    'num_stack_levels': int,          # Number of stacking levels (default: auto)
    'ag_args_fit': dict,              # Advanced fitting arguments
    'ag_args_ensemble': dict,         # Advanced ensemble arguments
}
```
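These arguments can be supplied globally through `TabularPredictor.fit()` or per model family by embedding an `ag_args_fit` entry inside the hyperparameters dictionary. A hedged sketch with illustrative values, assuming a `target` label column:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # Global fitting/ensembling arguments applied to every model
    ag_args_fit={'num_cpus': 4, 'num_gpus': 0},
    ag_args_ensemble={'fold_fitting_strategy': 'sequential_local'},
    # Bagging and stacking structure
    num_bag_folds=5,
    num_bag_sets=1,
    num_stack_levels=1,
    # Per-model overrides can be embedded in the hyperparameters dict
    hyperparameters={
        'GBM': {'ag_args_fit': {'num_gpus': 0}},  # keep LightGBM on CPU
        'XGB': {},
    },
)
```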
### Evaluation and Metric Configuration

Configuration for evaluation metrics, validation strategies, and performance measurement.
```python { .api }
# Classification metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss",
    "f1", "f1_macro", "f1_micro", "f1_weighted",
    "roc_auc", "roc_auc_ovo", "roc_auc_ovo_macro", "roc_auc_ovo_weighted",
    "roc_auc_ovr", "roc_auc_ovr_macro", "roc_auc_ovr_micro", "roc_auc_ovr_weighted",
    "average_precision", "precision", "precision_macro", "precision_micro", "precision_weighted",
    "recall", "recall_macro", "recall_micro", "recall_weighted",
    "mcc", "pac_score"
]

# Regression metrics
REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error",
    "r2", "symmetric_mean_absolute_percentage_error"
]

# Quantile regression metrics
QUANTILE_METRICS = ["pinball_loss"]

def get_metric_config(
    problem_type: str,
    eval_metric: str = None,
    greater_is_better: bool = None
) -> dict:
    """
    Get metric configuration for evaluation.

    Parameters:
    - problem_type: Type of ML problem
    - eval_metric: Primary evaluation metric
    - greater_is_better: Whether higher metric values are better

    Returns:
    Metric configuration dictionary
    """
```
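The evaluation metric is normally fixed when the predictor is constructed, either by name or as a custom scorer built with `autogluon.core.metrics.make_scorer`. A short sketch; the F2 scoring function and column name are illustrative:

```python
import pandas as pd
from sklearn.metrics import fbeta_score
from autogluon.core.metrics import make_scorer
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

# Built-in metric selected by name
predictor = TabularPredictor(label='target', eval_metric='balanced_accuracy')

# Custom metric: wrap any callable that compares y_true and y_pred
f2_scorer = make_scorer(
    name='f2',
    score_func=lambda y_true, y_pred: fbeta_score(y_true, y_pred, beta=2),
    optimum=1,
    greater_is_better=True,
)
custom_predictor = TabularPredictor(label='target', eval_metric=f2_scorer)
custom_predictor.fit(train_data, time_limit=300)
```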
### Resource and Performance Configuration

Settings for optimizing computational resource usage, memory management, and training performance.
```python { .api }
class ResourceConfig:
    """Configuration for computational resources and performance optimization."""

    # CPU and Memory
    num_cpus: int | str = "auto"           # Number of CPU cores ("auto" or an integer)
    memory_limit_mb: int | None = None     # Memory limit in megabytes

    # GPU Configuration
    num_gpus: int = 0                      # Number of GPU devices
    gpu_memory_limit: int | None = None    # GPU memory limit

    # Disk and Storage
    disk_limit_mb: int | None = None       # Disk space limit in megabytes
    cache_data: bool = True                # Cache preprocessed data

    # Performance Optimization
    enable_multiprocessing: bool = True    # Enable multiprocessing
    max_concurrent_models: int = 1         # Maximum concurrent model training
    early_stopping_rounds: int | None = None  # Early stopping configuration

    # Inference Optimization
    optimize_for_deployment: bool = False  # Optimize for deployment
    model_compression: bool = False        # Enable model compression
```
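At the `fit()` level, the most commonly used resource controls are the CPU/GPU counts and the inference-latency budget. A hedged sketch with illustrative values; `infer_limit` is interpreted as the target per-row inference time in seconds:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    time_limit=1800,               # total training budget in seconds
    num_cpus=8,                    # CPU cores made available to training
    num_gpus=0,                    # GPU devices made available to training
    infer_limit=0.001,             # target online inference time per row (seconds)
    infer_limit_batch_size=10000,  # batch size assumed when measuring inference speed
)
```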
## Usage Examples

### Basic Preset Usage
```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Different preset configurations
presets = ['good_quality', 'best_quality', 'optimize_for_deployment', 'interpretable']

results = {}
for preset in presets:
    print(f"\nTraining with preset: {preset}")

    predictor = TabularPredictor(
        label='target',
        path=f'./models_{preset}/'
    )

    predictor.fit(
        train_data,
        presets=preset,
        time_limit=600  # 10 minutes per preset
    )

    # Evaluate performance (evaluate() returns a dict of metric name -> score)
    performance = predictor.evaluate(test_data)
    score = performance[predictor.eval_metric.name]
    leaderboard = predictor.leaderboard(test_data)

    results[preset] = {
        'score': score,
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard)
    }

    print(f"Best score: {score:.4f}")
    print(f"Best model: {results[preset]['best_model']}")
    print(f"Total models trained: {results[preset]['num_models']}")

# Compare results
print("\nPreset Comparison:")
for preset, result in results.items():
    print(f"{preset}: {result['score']:.4f} ({result['num_models']} models)")
```
### Custom Hyperparameter Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Advanced hyperparameter configuration
hyperparameters = {
    # Gradient Boosting Models (LightGBM uses the 'GBM' key)
    'GBM': [
        # Fast configuration
        {
            'num_leaves': 31,
            'learning_rate': 0.1,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'min_data_in_leaf': 20,
            'objective': 'binary',  # assumes a binary classification task
            'max_depth': -1,
            'save_binary': True,
            'ag_args': {'name_suffix': '_Fast', 'priority': 1}
        },
        # Accurate configuration
        {
            'num_leaves': 127,
            'learning_rate': 0.05,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.9,
            'bagging_freq': 5,
            'min_data_in_leaf': 10,
            'reg_alpha': 0.1,
            'reg_lambda': 0.1,
            'ag_args': {'name_suffix': '_Accurate', 'priority': 2}
        }
    ],

    # Search spaces (tuned via hyperparameter_tune_kwargs in fit)
    'XGB': {
        'n_estimators': space.Categorical(100, 300, 500),
        'max_depth': space.Categorical(3, 6, 10),
        'learning_rate': space.Categorical(0.01, 0.1, 0.2),
        'subsample': space.Categorical(0.8, 0.9, 1.0),
        'colsample_bytree': space.Categorical(0.8, 0.9, 1.0),
        'reg_alpha': space.Categorical(0, 0.1, 1),
        'reg_lambda': space.Categorical(0, 0.1, 1)
    },

    # Neural Networks
    'NN_TORCH': [
        # Small network
        {
            'num_epochs': 50,
            'learning_rate': 0.001,
            'weight_decay': 1e-4,
            'dropout_prob': 0.1,
            'embedding_size_factor': 1.0,
            'ag_args': {'name_suffix': '_Small'}
        },
        # Large network
        {
            'num_epochs': 100,
            'learning_rate': 0.0005,
            'weight_decay': 1e-5,
            'dropout_prob': 0.2,
            'embedding_size_factor': 2.0,
            'ag_args': {'name_suffix': '_Large'}
        }
    ]
}

# Train with custom hyperparameters
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    hyperparameter_tune_kwargs='auto',  # required so the XGB search spaces are tuned
    time_limit=1800,  # 30 minutes
    num_bag_folds=5,
    num_stack_levels=2
)
```
### Advanced Training Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.features.generators import AutoMLPipelineFeatureGenerator

# Advanced training arguments
ag_args_fit = {
    'num_cpus': 8,           # Use 8 CPU cores
    'num_gpus': 1,           # Use 1 GPU
    'memory_limit': 16000,   # 16GB memory limit
    'time_limit': 300,       # 5 minutes per model
}

ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
    'auto_stack': True,
    'bagging_mode': 'oob',   # Out-of-bag validation
    'stack_mode': 'infer',
    'ensemble_size_max': 50  # Maximum ensemble size
}

# Feature generation configuration: build a generator instance and pass it to fit()
feature_generator = AutoMLPipelineFeatureGenerator(
    enable_raw_text_features=True,
    enable_text_ngram_features=True,
    enable_text_special_features=True
)

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc',
    sample_weight='sample_weights'
)

predictor.fit(
    train_data,
    tuning_data=validation_data,
    time_limit=3600,  # 1 hour total
    presets='best_quality',

    # Advanced configurations
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,
    feature_generator=feature_generator,

    # Bagging and stacking
    num_bag_folds=10,
    num_bag_sets=3,
    num_stack_levels=3,

    # Model selection
    excluded_model_types=['KNN'],  # Exclude slow models

    # Hyperparameter tuning
    hyperparameter_tune_kwargs={
        'scheduler': 'local',
        'searcher': 'bayesopt',
        'num_trials': 100
    }
)
```
### Deployment Optimization Configuration
```python
from autogluon.tabular import TabularPredictor

# Configuration optimized for deployment
deployment_hyperparameters = {
    'GBM': {
        'num_leaves': 31,        # Smaller trees
        'max_depth': 6,
        'min_data_in_leaf': 50,  # Regularization
        'bagging_freq': 0,       # Disable bagging for speed
        'feature_fraction': 1.0, # Use all features
    },
    'CAT': {
        'iterations': 100,       # Fewer iterations
        'depth': 6,
        'l2_leaf_reg': 3,
        'bootstrap_type': 'No'   # Disable bootstrap
    }
}

predictor = TabularPredictor(
    label='target',
    path='./deployment_model/'
)

predictor.fit(
    train_data,
    presets='optimize_for_deployment',
    hyperparameters=deployment_hyperparameters,
    time_limit=300,      # Fast training
    num_bag_folds=0,     # Disable bagging
    num_stack_levels=0,  # Disable stacking

    # Focus on fast, simple models
    included_model_types=['GBM', 'CAT', 'LR']
)

# Create deployment-optimized clone (keeps only the models needed for prediction)
deployment_predictor = predictor.clone_for_deployment(
    path='./deployment_ready/',
    return_clone=True  # return the cloned predictor rather than its path
)

# Test inference speed
import time
start_time = time.time()
predictions = deployment_predictor.predict(test_data)
inference_time = time.time() - start_time

print(f"Inference time: {inference_time:.3f} seconds")
print(f"Predictions per second: {len(test_data) / inference_time:.0f}")
```
### Interpretable Model Configuration
```python
from autogluon.tabular import TabularPredictor
from autogluon.common import space

# Search spaces for interpretable models (tuned via hyperparameter_tune_kwargs)
interpretable_hyperparameters = {
    'LR': {  # Logistic Regression
        'C': space.Categorical(0.01, 0.1, 1.0, 10),  # Regularization
        'penalty': space.Categorical('l1', 'l2'),
        'solver': space.Categorical('liblinear', 'saga')
    },
    'RF': {  # Random Forest
        'n_estimators': space.Categorical(50, 100, 200),
        'max_depth': space.Categorical(3, 5, 10),  # Limit depth for interpretability
        'min_samples_split': space.Categorical(10, 20, 50),
        'max_features': space.Categorical('sqrt', 'log2')
    },
    'XGB': {  # XGBoost (regularized)
        'n_estimators': space.Categorical(50, 100),
        'max_depth': space.Categorical(3, 4, 5),  # Shallow trees
        'learning_rate': space.Categorical(0.1, 0.2),
        'reg_alpha': space.Categorical(0.1, 1.0),  # L1 regularization
        'reg_lambda': space.Categorical(0.1, 1.0)  # L2 regularization
    }
}

predictor = TabularPredictor(
    label='target',
    eval_metric='accuracy'
)

predictor.fit(
    train_data,
    presets='interpretable',
    hyperparameters=interpretable_hyperparameters,
    hyperparameter_tune_kwargs='auto',  # tune the search spaces above

    # Enable only interpretable models
    included_model_types=['LR', 'RF', 'XGB'],

    # Simpler ensemble strategies
    num_bag_folds=3,
    num_stack_levels=1,

    # Feature processing for interpretability
    feature_generator='auto'  # Minimal feature engineering
)

# Analyze model interpretability
leaderboard = predictor.leaderboard(extra_info=True)
print("Interpretable models ranking:")
print(leaderboard[['model', 'score_val', 'fit_time']].head())
```
## Configuration Reference

### Preset Details

| Preset | Training Time | Model Diversity | Bagging / Stacking | Best For |
|--------|---------------|-----------------|--------------------|----------|
| `medium_quality` | Low | Medium | None | Quick prototyping, default preset |
| `good_quality` | Medium | High | Moderate | General use, balanced performance |
| `high_quality` | High | High | Extensive | High accuracy with fast inference |
| `best_quality` | Very High | Very High | Extensive | Maximum accuracy, competitions |
| `optimize_for_deployment` | - | - | - | Post-training optimization |
| `interpretable` | Low | Limited | Simple | Regulated industries, explainability |
### Model Type Abbreviations

| Code | Full Name | Category |
|------|-----------|----------|
| `GBM` | LightGBM | Gradient Boosting |
| `XGB` | XGBoost | Gradient Boosting |
| `CAT` | CatBoost | Gradient Boosting |
| `RF` | Random Forest | Tree Ensemble |
| `XT` | Extra Trees | Tree Ensemble |
| `LR` | Linear/Logistic Regression | Linear |
| `KNN` | K-Nearest Neighbors | Instance-based |
| `NN_TORCH` | PyTorch Neural Network | Deep Learning |
| `FASTAI` | FastAI Neural Network | Deep Learning |
| `TABPFN` | TabPFN | Foundation Model |
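These codes are the keys used in the `hyperparameters` dictionary and in `included_model_types` / `excluded_model_types`. A brief sketch, assuming a `target` label column:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    # Train only gradient boosting models, each with its default settings
    hyperparameters={'GBM': {}, 'XGB': {}, 'CAT': {}},
    # Alternatively, keep the default portfolio but skip slow model families:
    # excluded_model_types=['KNN', 'NN_TORCH'],
    time_limit=600,
)
```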
### Resource Configuration Guidelines

| Use Case | CPU Cores | Memory (GB) | Time Limit | Bag Folds |
|----------|-----------|-------------|------------|-----------|
| Quick Prototype | 2-4 | 4-8 | 5-15 min | 2-3 |
| Production Model | 8-16 | 16-32 | 30-60 min | 5-10 |
| Competition | 16-32 | 32-64 | 2-8 hours | 10-20 |
| Large Dataset | 16+ | 64+ | 4+ hours | 5-10 |
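As a rough translation of the "Production Model" row into `fit()` arguments (the values are the table's suggestions, not library defaults):

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # illustrative path

# "Production Model" row: 8-16 cores, 16-32 GB, 30-60 min budget, 5-10 bag folds
predictor = TabularPredictor(label='target', path='./models_production/')
predictor.fit(
    train_data,
    presets='best_quality',
    time_limit=3600,  # 60-minute budget
    num_bag_folds=8,
    num_cpus=8,
)
```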