# Core Prediction Interface

The TabularPredictor class provides the main interface for automated machine learning on tabular datasets. It handles the complete ML pipeline, from data preprocessing through model training, evaluation, and deployment, with minimal configuration required.

## Capabilities

### Predictor Initialization

Creates a new TabularPredictor instance configured for a specific prediction task, with automatic problem type detection and evaluation metric selection.

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str | Scorer = None,
        path: str = None,
        verbosity: int = 2,
        log_to_file: bool = False,
        log_file_path: str = "auto",
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        positive_class: int | str | None = None,
        **kwargs
    ):
        """
        Initialize TabularPredictor for automated machine learning.

        Parameters:
        - label: Name of the target column to predict
        - problem_type: Type of problem ('binary', 'multiclass', 'regression', 'quantile')
        - eval_metric: Metric for model evaluation and selection
        - path: Directory to save models and outputs
        - verbosity: Logging level (0-4)
        - log_to_file: Whether to save logs to a file
        - log_file_path: Path for the log file ("auto" for the default location)
        - sample_weight: Column name for sample weights, or 'auto_weight'/'balance_weight'
        - weight_evaluation: Whether to use sample weights in evaluation
        - groups: Column for custom data splitting in bagging
        - positive_class: Positive class for binary classification metrics
        """
```
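
As a minimal sketch (not taken from the library's documentation), the snippet below constructs a predictor for a hypothetical binary task; the column names, metric, path, and positive label are placeholder assumptions:

```python
from autogluon.tabular import TabularPredictor

# Hypothetical setup: a binary target stored in a column named "churn".
predictor = TabularPredictor(
    label="churn",            # target column to predict
    eval_metric="f1",         # metric used to select models
    path="agModels-churn",    # example output directory
    positive_class="yes",     # label treated as the positive class
)
```
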
### Model Training

Trains multiple machine learning models with automatic hyperparameter optimization, ensemble creation, and model selection, using advanced techniques such as bagging and stacking.

```python { .api }
def fit(
    self,
    train_data: pd.DataFrame | str,
    tuning_data: pd.DataFrame | str = None,
    time_limit: float = None,
    presets: list[str] | str = None,
    hyperparameters: dict | str = None,
    feature_metadata: str | FeatureMetadata = "infer",
    infer_limit: float = None,
    infer_limit_batch_size: int = None,
    fit_weighted_ensemble: bool = True,
    fit_full_last_level_weighted_ensemble: bool = True,
    full_weighted_ensemble_additionally: bool = False,
    dynamic_stacking: bool | str = False,
    calibrate_decision_threshold: bool | str = "auto",
    num_cpus: int | str = "auto",
    num_gpus: int | str = "auto",
    fit_strategy: Literal["sequential", "parallel"] = "sequential",
    memory_limit: float | str = "auto",
    callbacks: list[AbstractCallback] = None,
    **kwargs
) -> 'TabularPredictor':
    """
    Train machine learning models on the provided dataset.

    Parameters:
    - train_data: Training dataset as DataFrame or file path string
    - tuning_data: Optional validation dataset as DataFrame or file path string
    - time_limit: Maximum training time in seconds
    - presets: Preset configuration name or list of names ('best_quality', 'high_quality', etc.)
    - hyperparameters: Custom hyperparameter configurations as dict, or a preset string
    - feature_metadata: Feature metadata configuration, or "infer" for automatic detection
    - infer_limit: Inference time budget per row, in seconds, that the final model should satisfy
    - infer_limit_batch_size: Batch size (number of rows) used when estimating per-row inference time
    - fit_weighted_ensemble: Whether to fit weighted ensemble models
    - fit_full_last_level_weighted_ensemble: Whether the last-level weighted ensemble is fit on models from all previous levels
    - full_weighted_ensemble_additionally: Whether to additionally fit a weighted ensemble over all models
    - dynamic_stacking: Whether to use dynamic stacking (bool or strategy string)
    - calibrate_decision_threshold: Whether to calibrate the decision threshold ("auto", True, False)
    - num_cpus: Number of CPUs to use ("auto" or integer)
    - num_gpus: Number of GPUs to use ("auto" or integer)
    - fit_strategy: Strategy for fitting models ("sequential" or "parallel")
    - memory_limit: Memory limit in GB ("auto" or float)
    - callbacks: List of callback objects for training monitoring

    Returns:
    Self (TabularPredictor instance)
    """
```
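
A sketch of a typical training call, assuming `train_data` has already been loaded as a pandas DataFrame with a `target` column; the preset and limits are illustrative values:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv("train.csv")  # assumed training file

predictor = TabularPredictor(label="target").fit(
    train_data,
    presets="medium_quality",  # faster preset, useful while iterating
    time_limit=300,            # stop training after 5 minutes
    infer_limit=0.001,         # target at most ~1 ms of inference time per row
)
```
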
### Predictions

Generates predictions using trained models with options for single model or ensemble predictions, automatic feature transformation, and flexible output formats.

```python { .api }
def predict(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    *,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series | np.ndarray:
    """
    Generate predictions for new data.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas Series (True) or numpy array (False)
    - transform_features: Apply feature preprocessing
    - decision_threshold: Decision threshold for binary classification

    Returns:
    Predictions as pandas Series or numpy array
    """

def predict_proba(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    transform_features: bool = True,
    **kwargs
) -> pd.DataFrame | pd.Series | np.ndarray:
    """
    Generate prediction probabilities for classification tasks.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas DataFrame (True) or numpy array (False)
    - as_multiclass: Return multiclass format for binary classification
    - transform_features: Apply feature preprocessing

    Returns:
    Prediction probabilities as pandas DataFrame or numpy array
    """

def predict_from_proba(
    self,
    y_pred_proba: pd.DataFrame | np.ndarray,
    decision_threshold: float = None
) -> pd.Series | np.ndarray:
    """
    Convert prediction probabilities to class predictions.

    Parameters:
    - y_pred_proba: Prediction probabilities
    - decision_threshold: Custom threshold for binary classification

    Returns:
    Class predictions
    """
```
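
A short example of the prediction workflow, assuming a fitted binary `predictor` and a `test_data` DataFrame containing the training feature columns:

```python
# Class probabilities and labels at the default decision threshold.
proba = predictor.predict_proba(test_data)
preds = predictor.predict(test_data)

# Re-derive labels at a stricter threshold without re-running inference.
strict_preds = predictor.predict_from_proba(proba, decision_threshold=0.7)
```
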
### Multi-Model Predictions

Generates predictions from multiple models simultaneously for model comparison, uncertainty estimation, and ensemble analysis.

```python { .api }
def predict_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    **kwargs
) -> pd.DataFrame | dict:
    """
    Generate predictions from multiple models.

    Parameters:
    - data: Input data
    - models: List of model names (default: all models)
    - as_pandas: Return pandas objects (True) or numpy arrays (False)
    - transform_features: Apply feature preprocessing

    Returns:
    Multi-model predictions
    """

def predict_proba_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    **kwargs
) -> dict:
    """
    Generate prediction probabilities from multiple models.

    Parameters:
    - data: Input data
    - models: List of model names (default: all models)
    - as_pandas: Return pandas objects (True) or numpy arrays (False)
    - as_multiclass: Multiclass format for binary classification

    Returns:
    Multi-model prediction probabilities
    """
```
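
For example (assuming a fitted `predictor` and a `test_data` DataFrame), multi-model predictions can provide a rough agreement check across models:

```python
import pandas as pd

# With models=None, predictions are returned for every trained model.
preds_by_model = predictor.predict_multi(test_data)

# Fraction of rows where all models predict the same label.
label_frame = pd.DataFrame(preds_by_model)
agreement = (label_frame.nunique(axis=1) == 1).mean()
print(f"Models agree on {agreement:.1%} of rows")
```
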
### Model Evaluation

Evaluates models with multiple metrics, providing detailed performance analysis and comparison across different models and datasets.

```python { .api }
def evaluate(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    silent: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Evaluate model performance on the provided dataset.

    Parameters:
    - data: Evaluation data or path to data file
    - model: Specific model to evaluate (default: best model)
    - silent: Suppress printed output
    - auxiliary_metrics: Include additional metrics beyond eval_metric
    - detailed_report: Generate detailed evaluation report

    Returns:
    Dictionary of evaluation metrics and scores
    """

def evaluate_predictions(
    self,
    y_true: pd.Series | np.ndarray,
    y_pred: pd.Series | np.ndarray,
    sample_weight: pd.Series | np.ndarray = None,
    decision_threshold: float = None,
    display: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Evaluate pre-computed predictions against ground-truth labels without re-running inference.

    Parameters:
    - y_true: Ground truth labels
    - y_pred: Model predictions
    - sample_weight: Sample weights for evaluation
    - decision_threshold: Threshold for binary classification
    - display: Print evaluation results
    - auxiliary_metrics: Include additional metrics
    - detailed_report: Generate detailed report

    Returns:
    Dictionary of evaluation metrics
    """

def leaderboard(
    self,
    data: pd.DataFrame | str = None,
    extra_info: bool = False,
    only_pareto_frontier: bool = False,
    skip_score: bool = False,
    **kwargs
) -> pd.DataFrame:
    """
    Generate a model leaderboard with performance rankings.

    Parameters:
    - data: Evaluation data (default: validation data)
    - extra_info: Include additional model information
    - only_pareto_frontier: Show only Pareto-optimal models
    - skip_score: Skip scoring models (faster)

    Returns:
    DataFrame with model rankings and performance metrics
    """
```
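
A brief example of evaluating and ranking models on held-out data, assuming a fitted `predictor` and a labeled `test_data` DataFrame:

```python
# Evaluate the best model; returns a dict keyed by metric name.
scores = predictor.evaluate(test_data)
print(scores)

# Rank all trained models on the same held-out data.
lb = predictor.leaderboard(test_data)
print(lb.head())
```
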
### Out-of-Fold Predictions

Provides access to out-of-fold predictions from cross-validation, which are useful for stacking, analysis, and debugging model performance.

```python { .api }
def predict_oof(
    self,
    model: str = None,
    transformed: bool = False,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series:
    """
    Get out-of-fold predictions for the training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Return predictions in the internal (transformed) label representation instead of the original labels
    - train_data: Training data (default: original training data)
    - internal_oof: Use internal OOF format
    - decision_threshold: Threshold for binary classification

    Returns:
    Out-of-fold predictions for the training data
    """

def predict_proba_oof(
    self,
    model: str = None,
    transformed: bool = False,
    as_multiclass: bool = True,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    **kwargs
) -> pd.DataFrame | pd.Series:
    """
    Get out-of-fold prediction probabilities for the training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Return probabilities in the internal (transformed) label representation
    - as_multiclass: Multiclass format for binary classification
    - train_data: Training data (default: original training data)
    - internal_oof: Use internal OOF format

    Returns:
    Out-of-fold prediction probabilities
    """
```
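
A small sketch of retrieving out-of-fold predictions, assuming the predictor was fit with bagging enabled (for example `num_bag_folds=5`) so OOF predictions exist; the model name is illustrative:

```python
# OOF probabilities from the best model (rows align with the training data).
oof_proba = predictor.predict_proba_oof()

# OOF label predictions from a specific model.
oof_preds = predictor.predict_oof(model="WeightedEnsemble_L2")
```
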
### Model Management

Manages the model lifecycle, including saving, loading, cloning, and optimizing predictors for deployment.

```python { .api }
def save(self, silent: bool = False) -> str:
    """
    Save the predictor to disk.

    Parameters:
    - silent: Suppress output messages

    Returns:
    Path where the predictor was saved
    """

@classmethod
def load(
    cls,
    path: str,
    verbosity: int = None,
    require_version_match: bool = True,
    require_py_version_match: bool = True
) -> 'TabularPredictor':
    """
    Load a saved predictor from disk.

    Parameters:
    - path: Path to saved predictor
    - verbosity: Logging level override
    - require_version_match: Require AutoGluon version match
    - require_py_version_match: Require Python version match

    Returns:
    Loaded TabularPredictor instance
    """

def clone(
    self,
    path: str,
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a copy of the predictor at a new location.

    Parameters:
    - path: Destination path for the cloned predictor
    - return_clone: Return the cloned predictor object instead of its path
    - dirs_exist_ok: Allow overwriting an existing directory

    Returns:
    Path to the cloned predictor, or the cloned predictor object
    """

def clone_for_deployment(
    self,
    path: str,
    model: str = "best",
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a deployment-optimized copy with a minimal footprint.

    Parameters:
    - path: Destination path for the deployment clone
    - model: Specific model to include in the deployment clone
    - return_clone: Return the cloned predictor object instead of its path
    - dirs_exist_ok: Allow overwriting an existing directory

    Returns:
    Path to the deployment clone, or the cloned predictor object
    """

def save_space(
    self,
    remove_data: bool = True,
    remove_fit_stack: bool = True,
    requires_save: bool = True,
    reduce_children: bool = False
) -> str:
    """
    Reduce predictor disk usage by removing non-essential files.

    Parameters:
    - remove_data: Remove cached training data
    - remove_fit_stack: Remove intermediate stacking models
    - requires_save: Save the predictor after space reduction
    - reduce_children: Apply space reduction to child models

    Returns:
    Path to optimized predictor
    """
```
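
A hedged sketch of preparing a fitted predictor for serving; the destination path is a placeholder:

```python
# Create a lean copy of the fitted predictor for serving.
deploy_path = predictor.clone_for_deployment(path="agModels-deploy")

# Optionally shrink the original predictor's disk footprint as well.
predictor.save_space()

# Later, in the serving environment:
from autogluon.tabular import TabularPredictor
served_predictor = TabularPredictor.load(deploy_path)
```
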
### Properties and Inspection

Provides access to predictor metadata, model information, and internal state for analysis and debugging.

```python { .api }
@property
def classes_(self) -> list:
    """Available classes for classification problems."""

@property
def class_labels(self) -> list:
    """Class labels in original format."""

@property
def problem_type(self) -> str:
    """Type of ML problem (binary, multiclass, regression, etc.)."""

@property
def eval_metric(self) -> str:
    """Evaluation metric used for model selection."""

@property
def label(self) -> str:
    """Name of the target column."""

@property
def path(self) -> str:
    """Path where the predictor is saved."""

@property
def original_features(self) -> list[str]:
    """List of original feature names from the training data."""

def features(self, feature_stage: str = "original") -> list[str]:
    """
    Get the names of the features used by the models, at different processing stages.

    Parameters:
    - feature_stage: Stage of feature processing ('original', 'transformed')

    Returns:
    List of feature names
    """

@property
def feature_metadata(self) -> FeatureMetadata:
    """Metadata about features, including types and preprocessing."""

def set_decision_threshold(self, decision_threshold: float) -> None:
    """
    Set a custom decision threshold for binary classification.

    Parameters:
    - decision_threshold: New threshold value (0.0 to 1.0)
    """

@property
def decision_threshold(self) -> float | None:
    """Current decision threshold for binary classification."""
```
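
A short example of inspecting a fitted predictor and pinning a custom operating point; the values shown are illustrative:

```python
print(predictor.problem_type)   # e.g. 'binary'
print(predictor.label)          # name of the target column
print(predictor.features())     # features the models expect at prediction time

# For binary tasks, set the threshold used by predict().
if predictor.problem_type == "binary":
    predictor.set_decision_threshold(0.35)
    print(predictor.decision_threshold)
```
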
## Usage Examples

### Basic Classification

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Create predictor for binary classification
predictor = TabularPredictor(
    label='target',
    problem_type='binary',
    eval_metric='roc_auc'
)

# Train with time limit
predictor.fit(
    train_data,
    time_limit=600,  # 10 minutes
    presets='good_quality'
)

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)

# Evaluate performance
results = predictor.evaluate(test_data)
print(f"ROC-AUC: {results['roc_auc']:.4f}")

# View model leaderboard
leaderboard = predictor.leaderboard(test_data, extra_info=True)
print(leaderboard)
```
### Advanced Configuration

```python
# Custom hyperparameters: train several configurations per model family
# (AutoGluon model keys: 'GBM' = LightGBM, 'XGB' = XGBoost, 'CAT' = CatBoost)
hyperparameters = {
    'GBM': [{'num_leaves': 26}, {'num_leaves': 66}, {'num_leaves': 176}],
    'XGB': [{'n_estimators': 50}, {'n_estimators': 100}, {'n_estimators': 200}],
    'CAT': [{'iterations': 100}, {'iterations': 200}, {'iterations': 500}]
}

# Advanced training with custom settings
predictor = TabularPredictor(
    label='target',
    sample_weight='weights',
    path='./models/'
)

predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    num_bag_folds=5,
    num_stack_levels=2,
    ag_args_fit={'num_cpus': 8},
    excluded_model_types=['KNN', 'XT']
)

# Multi-model predictions for ensemble analysis
multi_preds = predictor.predict_multi(test_data)
model_comparison = pd.DataFrame(multi_preds)
```