Tessl Tile for pypi/autogluon@1.4.0

# Tabular Machine Learning

Automated machine learning for structured/tabular data supporting binary classification, multiclass classification, and regression tasks. TabularPredictor automatically handles feature engineering, model selection, hyperparameter tuning, and intelligent ensembling to achieve strong predictive performance with minimal configuration.

## Capabilities

### TabularPredictor Class

Main predictor class for tabular/structured data that automates the entire ML pipeline from data preprocessing to model deployment.

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str = None,
        path: str = None,
        verbosity: int = 2,
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        **kwargs
    ):
        """
        Initialize TabularPredictor for automated machine learning on tabular data.

        Parameters:
        - label: Name of the target column to predict
        - problem_type: Type of problem ('binary', 'multiclass', 'regression', 'quantile')
        - eval_metric: Evaluation metric ('accuracy', 'roc_auc', 'rmse', etc.)
        - path: Directory to save models and artifacts
        - verbosity: Logging verbosity level (0-4)
        - sample_weight: Column name for sample weights
        - weight_evaluation: Whether to weight evaluation metrics
        - groups: Column name for group information (for grouped CV)
        """
```

### Model Training

Train and automatically tune machine learning models on tabular data with intelligent preprocessing and model selection.

```python { .api }
def fit(
    self,
    train_data,
    tuning_data=None,
    time_limit: float = None,
    presets: str = None,
    hyperparameters=None,
    feature_metadata=None,
    infer_limit: float = None,
    infer_limit_batch_size: int = None,
    fit_weighted_ensemble: bool = True,
    dynamic_stacking: bool = False,
    calibrate_decision_threshold: str = "auto",
    num_cpus: str = "auto",
    num_gpus: str = "auto",
    fit_strategy: str = "sequential",
    memory_limit: str = "auto",
    excluded_model_types: list = None,
    included_model_types: list = None,
    holdout_frac: float = None,
    callbacks: list = None,
    **kwargs
):
    """
    Fit TabularPredictor on training data.

    Parameters:
    - train_data: Training data (DataFrame, file path, or TabularDataset)
    - tuning_data: Validation data for early stopping and hyperparameter tuning
    - time_limit: Maximum training time in seconds
    - presets: Quality/speed presets ('best_quality', 'high_quality', 'medium_quality', 'optimize_for_deployment')
    - hyperparameters: Custom hyperparameter configurations
    - feature_metadata: Manual feature type specifications or 'infer'
    - infer_limit: Target prediction latency in seconds per row that the fitted predictor should satisfy
    - infer_limit_batch_size: Batch size assumed when estimating per-row prediction latency for infer_limit
    - fit_weighted_ensemble: Whether to fit weighted ensemble models
    - dynamic_stacking: Check for stacked overfitting before deciding whether to use stacking
    - calibrate_decision_threshold: Auto-calibrate decision threshold ('auto', True, False)
    - num_cpus: Number of CPU cores ('auto' or int)
    - num_gpus: Number of GPUs ('auto' or int)
    - fit_strategy: Model fitting strategy ('sequential', 'parallel')
    - memory_limit: Memory limit for training in GB ('auto' or float)
    - excluded_model_types: List of model types to exclude
    - included_model_types: If set, only these model types are trained
    - holdout_frac: Fraction of data to hold out for validation
    - callbacks: List of callback functions for training

    Returns:
    TabularPredictor: Fitted predictor instance
    """
```
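As a rough illustration of how the time budget and latency parameters above fit together, the helper below assembles keyword arguments for `fit()`. The function name `build_fit_kwargs`, the 10-minute cutoff for choosing a preset, and the example latency value are assumptions made for this sketch, not AutoGluon defaults.

```python
def build_fit_kwargs(minutes, max_seconds_per_row=None, batch_size=10000):
    """Assemble keyword arguments for TabularPredictor.fit() (hypothetical helper)."""
    kwargs = {
        "time_limit": minutes * 60,  # fit() expects seconds
        # Assumed rule of thumb: short budgets get the faster preset.
        "presets": "medium_quality" if minutes < 10 else "high_quality",
    }
    if max_seconds_per_row is not None:
        # infer_limit is a per-row prediction latency target, estimated at
        # the batch size given by infer_limit_batch_size.
        kwargs["infer_limit"] = max_seconds_per_row
        kwargs["infer_limit_batch_size"] = batch_size
    return kwargs


fit_kwargs = build_fit_kwargs(minutes=30, max_seconds_per_row=0.001)
# predictor.fit(train_data, **fit_kwargs)  # predictor and train_data defined elsewhere
```

Keeping the budget logic in one place makes it easier to run the same configuration across several datasets.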

### Prediction

Generate predictions and prediction probabilities for new data using the trained model ensemble.

```python { .api }
def predict(
    self,
    data,
    model: str = None,
    as_pandas: bool = True,
    transform_features: bool = True
):
    """
    Generate predictions for new data.

    Parameters:
    - data: Input data (DataFrame, file path, or TabularDataset)
    - model: Specific model name to use for prediction
    - as_pandas: Return results as pandas Series
    - transform_features: Apply feature transformations

    Returns:
    Predictions as pandas Series or numpy array
    """

def predict_proba(
    self,
    data,
    model: str = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    transform_features: bool = True
):
    """
    Generate prediction probabilities for classification tasks.

    Parameters:
    - data: Input data (DataFrame, file path, or TabularDataset)
    - model: Specific model name to use for prediction
    - as_pandas: Return results as pandas DataFrame
    - as_multiclass: For binary problems, return one column per class (True) or only the positive-class probability (False)
    - transform_features: Apply feature transformations

    Returns:
    Prediction probabilities as pandas DataFrame or numpy array
    """
```
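A common use of `predict_proba` on a binary problem is to apply a stricter cutoff than the default decision rule. The sketch below assumes a fitted predictor is passed in; the helper name `flag_confident_positives` and the 0.8 threshold are illustrative choices, not part of AutoGluon.

```python
def flag_confident_positives(predictor, data, threshold=0.8):
    """Return one boolean per row: True when P(positive class) >= threshold."""
    # as_multiclass=False asks for only the positive-class probability
    # (binary problems), so each element is a single number.
    positive_proba = predictor.predict_proba(data, as_multiclass=False)
    return [float(p) >= threshold for p in positive_proba]
```

Called as `flag_confident_positives(predictor, 'test.csv')`, this returns a plain list that can be attached to the input rows for review.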

### Model Evaluation

Evaluate model performance and analyze results with comprehensive metrics and model comparison capabilities.

```python { .api }
def evaluate(
    self,
    data,
    model: str = None,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    silent: bool = False
):
    """
    Evaluate predictor performance on test data.

    Parameters:
    - data: Test data (DataFrame, file path, or TabularDataset)
    - model: Specific model to evaluate
    - auxiliary_metrics: Include additional evaluation metrics
    - detailed_report: Generate detailed evaluation report
    - silent: Suppress output

    Returns:
    dict: Dictionary of evaluation metrics, reported in higher-is-better form (error metrics such as RMSE are negated)
    """

def leaderboard(
    self,
    data=None,
    extra_info: bool = False,
    only_pareto_frontier: bool = False,
    skip_score: bool = False,
    silent: bool = False
):
    """
    Display model leaderboard with performance rankings.

    Parameters:
    - data: Test data for evaluation (optional)
    - extra_info: Include additional model information
    - only_pareto_frontier: Show only Pareto optimal models
    - skip_score: Skip performance scoring
    - silent: Suppress output

    Returns:
    DataFrame: Model leaderboard with performance metrics
    """
```
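To compare a few specific models on held-out data without building a full leaderboard, `evaluate` can be called once per model. This is a minimal sketch that assumes a fitted predictor and that the requested metric name appears as a key in the returned dict; the helper name `best_model_on` is hypothetical. AutoGluon reports scores in higher-is-better form, which is what `max` relies on here.

```python
def best_model_on(predictor, test_data, model_names, metric):
    """Score each named model on test_data and return (best_name, scores)."""
    scores = {}
    for name in model_names:
        # evaluate() returns a dict of metric name -> value for this model
        scores[name] = predictor.evaluate(test_data, model=name)[metric]
    # Higher is better for every reported metric value
    best = max(scores, key=scores.get)
    return best, scores
```

The model names passed in should match the `model` column of the leaderboard.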

### Feature Analysis

Analyze feature importance and understand model behavior through interpretability tools.

```python { .api }
def feature_importance(
    self,
    data=None,
    model: str = None,
    features: list = None,
    feature_stage: str = 'original',
    subsample_size: int = 5000,
    silent: bool = False
):
    """
    Calculate permutation feature importance scores.

    Parameters:
    - data: Data for importance calculation
    - model: Specific model to analyze
    - features: Specific features to analyze
    - feature_stage: Feature processing stage ('original', 'transformed', or 'transformed_model')
    - subsample_size: Number of rows sampled from data to keep computation fast
    - silent: Suppress output

    Returns:
    DataFrame: Feature importance scores
    """

def fit_summary(self, verbosity: int = 3, show_plot: bool = False):
    """
    Display summary of training process and results.

    Parameters:
    - verbosity: Detail level (0 prints nothing; higher values print more)
    - show_plot: Show training plots

    Returns:
    dict: Training summary information
    """
```
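One practical use of these scores is to list features that do not help the model. The sketch below assumes the returned DataFrame has an `importance` column indexed by feature name (not stated above) and that a fitted predictor is passed in; the helper name `non_contributing_features` is hypothetical.

```python
def non_contributing_features(predictor, data):
    """List features whose permutation importance is zero or negative."""
    fi = predictor.feature_importance(data, silent=True)
    # The 'importance' column is indexed by feature name; a non-positive
    # score means shuffling the feature did not hurt the model.
    return [name for name, score in fi["importance"].items() if score <= 0]
```

Features returned here are candidates for removal, ideally confirmed by retraining without them and comparing scores on held-out data.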

### Model Persistence

Save and load trained predictors for deployment and reuse.

```python { .api }
def save(self, silent: bool = False):
    """
    Save trained predictor to disk in the directory given by the predictor's `path`.
    fit() already saves the predictor automatically.

    Parameters:
    - silent: Suppress the log message
    """

@classmethod
def load(cls, path: str, verbosity: int = None):
    """
    Load saved predictor from disk.

    Parameters:
    - path: Directory containing saved predictor
    - verbosity: Logging verbosity level (None leaves the current level unchanged)

    Returns:
    TabularPredictor: Loaded predictor instance
    """
```
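A save/load round trip can be sketched as follows. It assumes the fitted predictor exposes its save directory as `predictor.path`; the helper name `reload_from_disk` is hypothetical.

```python
def reload_from_disk(predictor):
    """Save the predictor, then load a fresh copy from the same directory."""
    predictor.save()
    # load() is a classmethod, so call it on the predictor's class
    return type(predictor).load(predictor.path)
```

In a separate process, the equivalent is `TabularPredictor.load('./models')` with the directory that was used as `path` during training.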

### Advanced Features

Advanced model configuration and specialized functionality for power users.

```python { .api }
def refit_full(self, model: str = 'all', set_best_to_refit_full: bool = True):
    """
    Refit model on full dataset (train + validation).

    Parameters:
    - model: Model to refit ('best', 'all', or specific model name)
    - set_best_to_refit_full: Make the refit version of the best model the default for prediction

    Returns:
    dict: Mapping from original model names to their refit model names
    """

def distill(
    self,
    train_data=None,
    tuning_data=None,
    time_limit: int = None,
    hyperparameters=None,
    **kwargs
):
    """
    Train smaller, faster student models that mimic the ensemble's predictions.

    Parameters:
    - train_data: Training data for distillation
    - tuning_data: Validation data for distillation
    - time_limit: Maximum distillation time in seconds
    - hyperparameters: Hyperparameters for the student models

    Returns:
    list: Names of the distilled models
    """

def persist(self, models="best", with_ancestors: bool = True, max_memory: float = 0.4):
    """
    Load models into memory so that predictions do not reload them from disk
    on every call (replaces the deprecated persist_models).

    Parameters:
    - models: Model names to persist ('best', 'all', or a list of names)
    - with_ancestors: Also persist the models that the selected models depend on
    - max_memory: Skip persisting if the models would need more than this fraction of available memory
    """

def unpersist(self, models="all"):
    """
    Remove persisted models from memory to reduce memory usage
    (replaces the deprecated unpersist_models).

    Parameters:
    - models: Model names to unpersist ('all' or a list of names)
    """

def calibrate_decision_threshold(
    self,
    data=None,
    metric: str = None,
    model: str = "best",
    verbose: bool = True
):
    """
    Calibrate decision threshold for binary classification to optimize specified metric.

    Parameters:
    - data: Data to use for threshold calibration (internal validation data is used if None)
    - metric: Metric to optimize ('f1', 'balanced_accuracy', 'mcc', etc.)
    - model: Model whose predictions are used for calibration
    - verbose: Print optimization results

    Returns:
    float: Calibrated decision threshold between 0 and 1
    """

def clone(self, path: str, *, return_clone: bool = False, dirs_exist_ok: bool = False):
    """
    Create a copy of the predictor at a new location.

    Parameters:
    - path: Directory path for the cloned predictor
    - return_clone: Return the cloned predictor instance
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    str or TabularPredictor: Path to clone or cloned predictor instance
    """

def clone_for_deployment(
    self,
    path: str,
    *,
    model: str = "best",
    return_clone: bool = False,
    dirs_exist_ok: bool = False
):
    """
    Create optimized copy of predictor for deployment with minimal storage footprint.

    Parameters:
    - path: Directory path for deployment clone
    - model: Model to include in deployment clone
    - return_clone: Return the cloned predictor instance
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    str or TabularPredictor: Path to clone or cloned predictor instance
    """
```
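Two of the methods above are often combined when preparing a model for serving. The following is a minimal sketch, assuming a fitted predictor and assuming that after `refit_full` the refit model becomes the predictor's "best" model, so the default `model="best"` in `clone_for_deployment` picks it up. The helper name `export_for_deployment` is hypothetical.

```python
def export_for_deployment(predictor, deploy_path):
    """Refit the best model on all data and write a slim copy to deploy_path."""
    # Retrain the best model on train + validation data
    predictor.refit_full(model="best")
    # Copy only what the chosen model needs for prediction
    return predictor.clone_for_deployment(deploy_path, return_clone=True)
```

The returned clone can make predictions but is not meant for further training, so the original predictor directory is worth keeping.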

### InterpretableTabularPredictor Class

**[EXPERIMENTAL]** Specialized TabularPredictor subclass focused on interpretable models with simple, human-readable rules. Trades accuracy for interpretability by limiting to simple models and disabling complex ensemble techniques.

```python { .api }
class InterpretableTabularPredictor(TabularPredictor):
    def __init__(self, *args, **kwargs):
        """
        Initialize InterpretableTabularPredictor with same parameters as TabularPredictor.
        Automatically restricts to interpretable models and preprocessing.
        """

    def fit(
        self,
        train_data,
        tuning_data=None,
        time_limit: float = None,
        *,
        presets: str = "interpretable",
        **kwargs
    ):
        """
        Fit interpretable models with automatic preset selection for interpretability.

        Parameters:
        - train_data: Training data (same as TabularPredictor)
        - tuning_data: Validation data (optional)
        - time_limit: Maximum training time
        - presets: Defaults to "interpretable" preset

        Note: Bagging, stacking, and complex ensembles are disabled for interpretability
        """

    def leaderboard_interpretable(self, verbose: bool = False, **kwargs):
        """
        Leaderboard with model complexity scores for interpretable model selection.

        Parameters:
        - verbose: Print detailed leaderboard

        Returns:
        DataFrame: Leaderboard with additional 'complexity' column showing rule count
        """

    def print_interpretable_rules(
        self,
        complexity_threshold: int = 10,
        model_name: str = None
    ):
        """
        Print human-readable rules from the best interpretable model.

        Parameters:
        - complexity_threshold: Maximum rule complexity to display
        - model_name: Specific model to show rules for
        """
```

## Usage Examples

### Basic Classification

```python
from autogluon.tabular import TabularPredictor

# Binary classification
predictor = TabularPredictor(label='target')
predictor.fit('train.csv', presets='best_quality', time_limit=3600)

# Make predictions
predictions = predictor.predict('test.csv')
probabilities = predictor.predict_proba('test.csv')

# Evaluate performance
scores = predictor.evaluate('test.csv')
print(f"Accuracy: {scores['accuracy']:.3f}")

# View model leaderboard
leaderboard = predictor.leaderboard('test.csv')
print(leaderboard)
```

### Custom Configuration

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Load training data (the file name is a placeholder)
train_data = TabularDataset('train.csv')

# Custom hyperparameters and model selection
hyperparameters = {
    'GBM': {'num_boost_round': 1000, 'learning_rate': 0.01},
    'RF': {'n_estimators': 500, 'max_depth': 20},
    'XGB': {'n_estimators': 1000, 'learning_rate': 0.01}
}

predictor = TabularPredictor(
    label='price',
    problem_type='regression',
    eval_metric='rmse',
    path='./models'
)

predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    excluded_model_types=['KNN', 'LR'],  # Exclude certain model types
    time_limit=7200,
    presets='high_quality'
)

# Feature importance analysis
importance = predictor.feature_importance(train_data)
print(importance.head(10))
```
