Tessl Tile for pypi/autogluon.tabular@1.4.0

# Core Prediction Interface

The TabularPredictor class is the main interface for automated machine learning on tabular datasets. It handles the full ML pipeline, from data preprocessing through model training, evaluation, and deployment, with minimal user configuration.

## Capabilities

### Predictor Initialization

Creates a new TabularPredictor instance configured for a specific prediction task with automatic problem type detection and evaluation metric selection.

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str | Scorer = None,
        path: str = None,
        verbosity: int = 2,
        log_to_file: bool = False,
        log_file_path: str = "auto",
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        positive_class: int | str | None = None,
        **kwargs
    ):
        """
        Initialize TabularPredictor for automated machine learning.

        Parameters:
        - label: Name of the target column to predict
        - problem_type: Type of problem ('binary', 'multiclass', 'regression', 'quantile')
        - eval_metric: Metric for model evaluation and selection
        - path: Directory to save models and outputs
        - verbosity: Logging level (0-4)
        - log_to_file: Whether to save logs to file
        - log_file_path: Path for log file (auto for default)
        - sample_weight: Column name for sample weights or 'auto_weight'/'balance_weight'
        - weight_evaluation: Whether to use sample weights in evaluation
        - groups: Column for custom data splitting in bagging
        - positive_class: Positive class for binary classification metrics
        """
```
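
To make "automatic problem type detection" more concrete, the sketch below guesses a problem type from the label values alone. The function name, the `max_classes` cutoff, and the rules themselves are assumptions chosen for illustration; this is a simplified heuristic, not AutoGluon's actual detection logic.

```python
from typing import Sequence


def guess_problem_type(labels: Sequence, max_classes: int = 20) -> str:
    """Guess a problem type from label values (simplified, hypothetical heuristic)."""
    unique = set(labels)
    if len(unique) < 2:
        raise ValueError("need at least two distinct label values")
    if len(unique) == 2:
        return "binary"
    # Non-numeric labels (for example strings) can only be classes.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    if not all_numeric:
        return "multiclass"
    # Numeric labels: a small set of whole numbers looks like class ids,
    # anything else is treated as a continuous target.
    all_integral = all(float(v).is_integer() for v in unique)
    if all_integral and len(unique) <= max_classes:
        return "multiclass"
    return "regression"


print(guess_problem_type(["yes", "no", "no", "yes"]))  # binary
print(guess_problem_type([0, 1, 2, 1, 0, 2]))          # multiclass
print(guess_problem_type([10.5, 3.2, 7.7, 1.1]))       # regression
```

When a guess like this could be wrong for your data (for example, integer ratings that should be regressed), passing `problem_type` explicitly avoids relying on detection.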

### Model Training

Trains multiple machine learning models with automatic hyperparameter optimization, ensemble creation, and model selection using advanced techniques like bagging and stacking.

```python { .api }
def fit(
    self,
    train_data: pd.DataFrame | str,
    tuning_data: pd.DataFrame | str = None,
    time_limit: float = None,
    presets: list[str] | str = None,
    hyperparameters: dict | str = None,
    feature_metadata: str | FeatureMetadata = "infer",
    infer_limit: float = None,
    infer_limit_batch_size: int = None,
    fit_weighted_ensemble: bool = True,
    fit_full_last_level_weighted_ensemble: bool = True,
    full_weighted_ensemble_additionally: bool = False,
    dynamic_stacking: bool | str = False,
    calibrate_decision_threshold: bool | str = "auto",
    num_cpus: int | str = "auto",
    num_gpus: int | str = "auto",
    fit_strategy: Literal["sequential", "parallel"] = "sequential",
    memory_limit: float | str = "auto",
    callbacks: list[AbstractCallback] = None,
    **kwargs
) -> 'TabularPredictor':
    """
    Train machine learning models on the provided dataset.

    Parameters:
    - train_data: Training dataset as DataFrame or file path string
    - tuning_data: Optional validation dataset as DataFrame or file path string
    - time_limit: Maximum training time in seconds (float)
    - presets: Pre-configured settings list or single preset ('best_quality', 'high_quality', etc.)
    - hyperparameters: Custom hyperparameter configurations as dict or preset string
    - feature_metadata: Feature metadata configuration or "infer" for automatic detection
    - infer_limit: Inference (prediction) time limit in seconds per row that fitted models must satisfy
    - infer_limit_batch_size: Batch size assumed when estimating per-row inference time
    - fit_weighted_ensemble: Whether to fit weighted ensemble models
    - fit_full_last_level_weighted_ensemble: Whether to fit full last level weighted ensemble
    - full_weighted_ensemble_additionally: Whether to fit additional full weighted ensemble
    - dynamic_stacking: Whether to use dynamic stacking (bool or strategy string)
    - calibrate_decision_threshold: Whether to calibrate decision threshold ("auto", True, False)
    - num_cpus: Number of CPUs to use ("auto" or integer)
    - num_gpus: Number of GPUs to use ("auto" or integer)
    - fit_strategy: Strategy for fitting models ("sequential" or "parallel")
    - memory_limit: Memory limit ("auto" or float in GB)
    - callbacks: List of callback functions for training monitoring

    Returns:
    Self (TabularPredictor instance)
    """
```
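
The relationship between `infer_limit` and `infer_limit_batch_size` is easy to misread, so here is a small sketch of the arithmetic. It assumes `infer_limit` is a per-row prediction time budget in seconds that is measured on batches of `infer_limit_batch_size` rows. The helper names are hypothetical, and `fit_with_latency_budget` only forwards arguments that appear in the `fit` signature above.

```python
def batch_budget_seconds(infer_limit: float, infer_limit_batch_size: int) -> float:
    """Total time allowed to predict one batch, given a per-row limit in seconds."""
    return infer_limit * infer_limit_batch_size


def meets_infer_limit(
    batch_seconds: float, infer_limit: float, infer_limit_batch_size: int
) -> bool:
    """True if a measured batch prediction time fits inside the per-row budget."""
    return batch_seconds <= batch_budget_seconds(infer_limit, infer_limit_batch_size)


def fit_with_latency_budget(predictor, train_data, time_limit: float = 600.0):
    """Call fit() with a 50 ms/row budget measured on 1000-row batches.

    `predictor` is expected to be a TabularPredictor (or anything with a
    compatible fit method); nothing is imported here so the sketch stays
    runnable without AutoGluon installed.
    """
    return predictor.fit(
        train_data,
        time_limit=time_limit,
        infer_limit=0.05,
        infer_limit_batch_size=1000,
    )


# A 1000-row batch may take up to 0.05 * 1000 seconds in total.
print(meets_infer_limit(42.0, infer_limit=0.05, infer_limit_batch_size=1000))
print(meets_infer_limit(60.0, infer_limit=0.05, infer_limit_batch_size=1000))
```

Larger batch sizes usually amortize fixed per-call overhead, so the same per-row limit tends to be easier to meet with batched prediction than with single-row calls.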

### Predictions

Generates predictions using trained models with options for single model or ensemble predictions, automatic feature transformation, and flexible output formats.

```python { .api }
def predict(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    *,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series | np.ndarray:
    """
    Generate predictions for new data.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas Series (True) or numpy array (False)
    - transform_features: Apply feature preprocessing
    - decision_threshold: Decision threshold for binary classification

    Returns:
    Predictions as pandas Series or numpy array
    """

def predict_proba(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    transform_features: bool = True,
    **kwargs
) -> pd.DataFrame | pd.Series | np.ndarray:
    """
    Generate prediction probabilities for classification tasks.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas DataFrame (True) or numpy array (False)
    - as_multiclass: Return multiclass format for binary classification
    - transform_features: Apply feature preprocessing

    Returns:
    Prediction probabilities as pandas DataFrame or numpy array
    """

def predict_from_proba(
    self,
    y_pred_proba: pd.DataFrame | np.ndarray,
    decision_threshold: float = None
) -> pd.Series | np.ndarray:
    """
    Convert prediction probabilities to class predictions.

    Parameters:
    - y_pred_proba: Prediction probabilities
    - decision_threshold: Custom threshold for binary classification

    Returns:
    Class predictions
    """
```
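
The effect of `decision_threshold` in binary classification can be sketched without any library: a row is assigned the positive class when its positive-class probability reaches the threshold. The function below is an illustrative stand-in, not the body of `predict_from_proba`; its name, the 0.5 default, and the choice to send ties to the positive class are assumptions.

```python
from typing import List, Sequence


def labels_from_proba(
    pos_proba: Sequence[float],
    decision_threshold: float = 0.5,
    positive_class=1,
    negative_class=0,
) -> List:
    """Turn positive-class probabilities into labels using a threshold."""
    if not 0.0 <= decision_threshold <= 1.0:
        raise ValueError("decision_threshold must be between 0.0 and 1.0")
    # Assumption in this sketch: a probability equal to the threshold
    # is labeled as the positive class.
    return [
        positive_class if p >= decision_threshold else negative_class
        for p in pos_proba
    ]


proba = [0.10, 0.45, 0.50, 0.80]
print(labels_from_proba(proba))                          # [0, 0, 1, 1]
print(labels_from_proba(proba, decision_threshold=0.4))  # [0, 1, 1, 1]
```

Lowering the threshold trades precision for recall on the positive class, which is why a calibrated threshold can improve metrics such as F1 without retraining.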

### Multi-Model Predictions

Generates predictions from multiple models simultaneously for model comparison, uncertainty estimation, and ensemble analysis.

```python { .api }
def predict_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    **kwargs
) -> pd.DataFrame | dict:
    """
    Generate predictions from multiple models.

    Parameters:
    - data: Input data
    - models: List of model names (default: all models)
    - as_pandas: Return format
    - transform_features: Apply feature preprocessing

    Returns:
    Multi-model predictions
    """

def predict_proba_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    **kwargs
) -> dict:
    """
    Generate prediction probabilities from multiple models.

    Parameters:
    - data: Input data
    - models: List of model names (default: all models)
    - as_pandas: Return format
    - as_multiclass: Multiclass format for binary classification

    Returns:
    Multi-model prediction probabilities
    """
```
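
One simple way to use multi-model predictions for uncertainty estimation is to measure how strongly the models agree on each row. The sketch below assumes the predictions arrive as a mapping from model name to a sequence of labels. The helper name and the model names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def agreement_per_row(preds_by_model: Dict[str, Sequence]) -> List[float]:
    """Fraction of models that voted for the most common label in each row."""
    columns = list(preds_by_model.values())
    if not columns:
        raise ValueError("need predictions from at least one model")
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all models must predict the same number of rows")
    rates = []
    for row in zip(*columns):  # one tuple of per-model labels per data row
        top_count = Counter(row).most_common(1)[0][1]
        rates.append(top_count / len(row))
    return rates


# Hypothetical model names and labels, for illustration only.
preds = {
    "ModelA": ["cat", "dog", "dog"],
    "ModelB": ["cat", "dog", "cat"],
    "ModelC": ["cat", "cat", "bird"],
}
print(agreement_per_row(preds))
```

Rows with low agreement are natural candidates for manual review or for checking whether the ensemble's final prediction is well supported.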

### Model Evaluation

Evaluates models with multiple metrics, detailed performance reports, and comparisons across models and datasets.

```python { .api }
def evaluate(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    silent: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Evaluate model performance on provided dataset.

    Parameters:
    - data: Evaluation data or path to data file
    - model: Specific model to evaluate (default: best model)
    - silent: Suppress printed output
    - auxiliary_metrics: Include additional metrics beyond eval_metric
    - detailed_report: Generate detailed evaluation report

    Returns:
    Dictionary of evaluation metrics and scores
    """

def evaluate_predictions(
    self,
    y_true: pd.Series | np.ndarray,
    y_pred: pd.Series | np.ndarray,
    sample_weight: pd.Series | np.ndarray = None,
    decision_threshold: float = None,
    display: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Score precomputed predictions against ground-truth labels, without re-running inference on the feature data.

    Parameters:
    - y_true: Ground truth labels
    - y_pred: Model predictions
    - sample_weight: Sample weights for evaluation
    - decision_threshold: Threshold for binary classification
    - display: Print evaluation results
    - auxiliary_metrics: Include additional metrics
    - detailed_report: Generate detailed report

    Returns:
    Dictionary of evaluation metrics
    """

def leaderboard(
    self,
    data: pd.DataFrame | str = None,
    extra_info: bool = False,
    only_pareto_frontier: bool = False,
    skip_score: bool = False,
    **kwargs
) -> pd.DataFrame:
    """
    Generate model leaderboard with performance rankings.

    Parameters:
    - data: Evaluation data (default: validation data)
    - extra_info: Include additional model information
    - only_pareto_frontier: Show only Pareto optimal models
    - skip_score: Skip scoring models (faster)

    Returns:
    DataFrame with model rankings and performance metrics
    """
```
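
The idea behind `only_pareto_frontier` can be sketched as follows: a model stays on the frontier if no other model is at least as accurate and at least as fast, and strictly better on one of the two. The column names `score` and `pred_time` and the sample rows are hypothetical, and this is an illustration of Pareto filtering in general rather than the leaderboard's own code.

```python
from typing import Dict, List


def pareto_frontier(rows: List[Dict]) -> List[Dict]:
    """Keep rows that no other row beats on both score (higher) and time (lower)."""

    def dominates(a: Dict, b: Dict) -> bool:
        no_worse = a["score"] >= b["score"] and a["pred_time"] <= b["pred_time"]
        strictly_better = a["score"] > b["score"] or a["pred_time"] < b["pred_time"]
        return no_worse and strictly_better

    return [r for r in rows if not any(dominates(other, r) for other in rows)]


# Hypothetical leaderboard rows: higher score is better, lower time is better.
board = [
    {"model": "A", "score": 0.91, "pred_time": 2.0},
    {"model": "B", "score": 0.89, "pred_time": 0.3},
    {"model": "C", "score": 0.88, "pred_time": 0.9},
    {"model": "D", "score": 0.80, "pred_time": 0.1},
]
print([r["model"] for r in pareto_frontier(board)])  # ['A', 'B', 'D']
```

Model C is dropped because B is both more accurate and faster. The remaining models each represent a different accuracy and latency trade-off.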

### Out-of-Fold Predictions

Provides access to out-of-fold predictions from cross-validation, which are useful for stacking, analysis, and debugging model performance.

```python { .api }
def predict_oof(
    self,
    model: str = None,
    transformed: bool = False,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series:
    """
    Get out-of-fold predictions for training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Use transformed feature representation
    - train_data: Training data (default: original training data)
    - internal_oof: Use internal OOF format
    - decision_threshold: Threshold for binary classification

    Returns:
    Out-of-fold predictions for training data
    """

def predict_proba_oof(
    self,
    model: str = None,
    transformed: bool = False,
    as_multiclass: bool = True,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    **kwargs
) -> pd.DataFrame | pd.Series:
    """
    Get out-of-fold prediction probabilities for training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Use transformed feature representation
    - as_multiclass: Multiclass format for binary classification
    - train_data: Training data (default: original training data)
    - internal_oof: Use internal OOF format

    Returns:
    Out-of-fold prediction probabilities
    """
```
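
For readers new to the term, an out-of-fold prediction for a training row comes from a model that never saw that row during fitting. The toy sketch below shows the mechanics with a deliberately trivial "model" that predicts the mean of its training targets. The fold assignment by index and the function name are assumptions for illustration and say nothing about how AutoGluon splits folds.

```python
from statistics import mean
from typing import List, Sequence


def oof_mean_predictions(y: Sequence[float], n_folds: int = 3) -> List[float]:
    """Out-of-fold predictions from a toy 'predict the training mean' model."""
    n = len(y)
    if not 2 <= n_folds <= n:
        raise ValueError("n_folds must be between 2 and the number of rows")
    oof = [0.0] * n
    for fold in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        # "Fit" on the other folds only: the model is just their mean target.
        fold_model = mean(y[i] for i in train_idx)
        # Predict only for rows this fold's model did not train on.
        for i in held_out:
            oof[i] = fold_model
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(oof_mean_predictions(y, n_folds=3))  # [4.0, 3.5, 3.0, 4.0, 3.5, 3.0]
```

Because every prediction excludes its own row from fitting, scores computed on out-of-fold predictions are a fairer estimate of generalization than scores on ordinary training-set predictions.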

### Model Management

Manages the model lifecycle: saving, loading, cloning, and optimizing predictors for deployment.

```python { .api }
def save(self, silent: bool = False) -> str:
    """
    Save predictor to disk.

    Parameters:
    - silent: Suppress output messages

    Returns:
    Path where predictor was saved
    """

@classmethod
def load(
    cls,
    path: str,
    verbosity: int = None,
    require_version_match: bool = True,
    require_py_version_match: bool = True
) -> 'TabularPredictor':
    """
    Load a saved predictor from disk.

    Parameters:
    - path: Path to saved predictor
    - verbosity: Logging level override
    - require_version_match: Require AutoGluon version match
    - require_py_version_match: Require Python version match

    Returns:
    Loaded TabularPredictor instance
    """

def clone(
    self,
    path: str,
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a copy of the predictor at a new location.

    Parameters:
    - path: Destination path for cloned predictor
    - return_clone: Return cloned predictor object
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    Path to cloned predictor or cloned predictor object
    """

def clone_for_deployment(
    self,
    path: str,
    model: str = "best",
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a deployment-optimized copy with minimal footprint.

    Parameters:
    - path: Destination path for deployment clone
    - model: Specific model to include in deployment
    - return_clone: Return cloned predictor object
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    Path to deployment clone or cloned predictor object
    """

def save_space(
    self,
    remove_data: bool = True,
    remove_fit_stack: bool = True,
    requires_save: bool = True,
    reduce_children: bool = False
) -> str:
    """
    Reduce predictor disk usage by removing non-essential files.

    Parameters:
    - remove_data: Remove cached training data
    - remove_fit_stack: Remove intermediate stacking models
    - requires_save: Save predictor after space reduction
    - reduce_children: Apply space reduction to child models

    Returns:
    Path to optimized predictor
    """
```

### Properties and Inspection

Provides access to predictor metadata, model information, and internal state for analysis and debugging.

```python { .api }
@property
def classes_(self) -> list:
    """Available classes for classification problems."""

@property
def class_labels(self) -> list:
    """Class labels in original format."""

@property
def problem_type(self) -> str:
    """Type of ML problem (binary, multiclass, regression, etc.)."""

@property
def eval_metric(self) -> str:
    """Evaluation metric used for model selection."""

@property
def label(self) -> str:
    """Name of the target column."""

@property
def path(self) -> str:
    """Path where predictor is saved."""

@property
def original_features(self) -> list[str]:
    """List of original feature names from training data."""

def features(self, feature_stage: str = "original") -> list[str]:
    """
    Get feature names at different processing stages.

    Parameters:
    - feature_stage: Stage of feature processing ('original', 'transformed')

    Returns:
    List of feature names
    """

@property
def feature_metadata(self) -> FeatureMetadata:
    """Metadata about features including types and preprocessing."""

def set_decision_threshold(self, decision_threshold: float) -> None:
    """
    Set custom decision threshold for binary classification.

    Parameters:
    - decision_threshold: New threshold value (0.0 to 1.0)
    """

@property
def decision_threshold(self) -> float | None:
    """Current decision threshold for binary classification."""
```
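
These attributes are convenient to gather into a single summary when logging or debugging a run. The sketch below reads only members listed above. The helper name is hypothetical, and the `SimpleNamespace` object is a stand-in with made-up values so the example runs without a trained predictor.

```python
from types import SimpleNamespace


def summarize_predictor(predictor) -> dict:
    """Collect a few inspection attributes into a plain dict for logging."""
    return {
        "label": predictor.label,
        "problem_type": predictor.problem_type,
        "eval_metric": str(predictor.eval_metric),
        "path": predictor.path,
        "n_original_features": len(predictor.features(feature_stage="original")),
        "decision_threshold": predictor.decision_threshold,
    }


# Stand-in object with made-up values, for illustration only.
stand_in = SimpleNamespace(
    label="target",
    problem_type="binary",
    eval_metric="roc_auc",
    path="./models/",
    decision_threshold=0.5,
    features=lambda feature_stage="original": ["age", "income"],
)
print(summarize_predictor(stand_in))
```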

## Usage Examples

### Basic Classification

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Create predictor for binary classification
predictor = TabularPredictor(
    label='target',
    problem_type='binary',
    eval_metric='roc_auc'
)

# Train with time limit
predictor.fit(
    train_data,
    time_limit=600,  # 10 minutes
    presets='good_quality'
)

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)

# Evaluate performance
results = predictor.evaluate(test_data)
print(f"ROC-AUC: {results['roc_auc']:.4f}")

# View model leaderboard
leaderboard = predictor.leaderboard(test_data, extra_info=True)
print(leaderboard)
```

### Advanced Configuration

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data (train.csv is expected to contain 'target' and 'weights' columns)
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Custom hyperparameters: each model key maps to a list of configurations,
# and one model is trained per configuration.
# 'GBM' is AutoGluon's key for LightGBM.
hyperparameters = {
    'GBM': [{'num_leaves': 26}, {'num_leaves': 66}, {'num_leaves': 176}],
    'XGB': [{'n_estimators': 50}, {'n_estimators': 100}, {'n_estimators': 200}],
    'CAT': [{'iterations': 100}, {'iterations': 200}, {'iterations': 500}]
}

# Advanced training with custom settings
predictor = TabularPredictor(
    label='target',
    sample_weight='weights',
    path='./models/'
)

predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    num_bag_folds=5,
    num_stack_levels=2,
    ag_args_fit={'num_cpus': 8},
    excluded_model_types=['KNN', 'XT']
)

# Multi-model predictions for ensemble analysis
multi_preds = predictor.predict_multi(test_data)
model_comparison = pd.DataFrame(multi_preds)
```
