
# Core Prediction Interface

The TabularPredictor class provides the main interface for automated machine learning on tabular datasets. It handles the complete ML pipeline, from data preprocessing through model training and evaluation to deployment, with minimal user configuration.

## Capabilities

### Predictor Initialization

Creates a new TabularPredictor for a given target column. If `problem_type` or `eval_metric` is left as None, AutoGluon infers the problem type from the label column during fit and selects a default metric for it (for example, accuracy for classification and root_mean_squared_error for regression).

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str | Scorer = None,
        path: str = None,
        verbosity: int = 2,
        log_to_file: bool = False,
        log_file_path: str = "auto",
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        positive_class: int | str | None = None,
        **kwargs
    ):
        """
        Initialize TabularPredictor for automated machine learning.

        Parameters:
        - label: Name of the target column to predict
        - problem_type: Type of problem ('binary', 'multiclass', 'regression', 'quantile'); inferred if None
        - eval_metric: Metric for model evaluation and selection; a problem-type default is used if None
        - path: Directory to save models and outputs (a timestamped directory is created if None)
        - verbosity: Logging level (0-4; higher values print more detail)
        - log_to_file: Whether to save logs to file
        - log_file_path: Path for log file ("auto" for the default location)
        - sample_weight: Column name for sample weights, or 'auto_weight'/'balance_weight'
        - weight_evaluation: Whether to use sample weights in evaluation
        - groups: Column for custom data splitting in bagging
        - positive_class: Positive class for binary classification metrics
        """
```
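
To make `sample_weight='balance_weight'` more concrete, the sketch below computes one common class-balancing scheme in plain Python: each row gets weight n / (k * n_c), where n is the number of rows, k the number of classes, and n_c the size of the row's class, so every class contributes the same total weight. The helper name `balance_weights` is hypothetical, and whether AutoGluon normalizes its weights exactly this way is an assumption.

```python
from collections import Counter


def balance_weights(labels: list) -> list[float]:
    """Hypothetical helper: per-row weights giving every class equal total weight.

    Uses w = n / (k * n_c); AutoGluon's exact normalization may differ.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


labels = ["a", "a", "a", "b"]
weights = balance_weights(labels)
# Each "a" row gets 4 / (2 * 3); the single "b" row gets 4 / (2 * 1) = 2.0,
# so both classes carry the same total weight of n / k = 2.0.
print(weights)
```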

### Model Training

Trains a range of model types, optionally with bagging, multi-layer stacking, and hyperparameter tuning, and combines them into a weighted ensemble. By default, the model with the best validation score is used for prediction.

```python { .api }
def fit(
    self,
    train_data: pd.DataFrame | str,
    tuning_data: pd.DataFrame | str = None,
    time_limit: float = None,
    presets: list[str] | str = None,
    hyperparameters: dict | str = None,
    feature_metadata: str | FeatureMetadata = "infer",
    infer_limit: float = None,
    infer_limit_batch_size: int = None,
    fit_weighted_ensemble: bool = True,
    fit_full_last_level_weighted_ensemble: bool = True,
    full_weighted_ensemble_additionally: bool = False,
    dynamic_stacking: bool | str = False,
    calibrate_decision_threshold: bool | str = "auto",
    num_cpus: int | str = "auto",
    num_gpus: int | str = "auto",
    fit_strategy: Literal["sequential", "parallel"] = "sequential",
    memory_limit: float | str = "auto",
    callbacks: list[AbstractCallback] = None,
    **kwargs
) -> 'TabularPredictor':
    """
    Train machine learning models on the provided dataset.

    Parameters:
    - train_data: Training dataset as DataFrame or file path string
    - tuning_data: Optional validation dataset as DataFrame or file path string
    - time_limit: Maximum training time in seconds (float)
    - presets: Single preset or list of presets ('best_quality', 'high_quality', 'good_quality', 'medium_quality', etc.)
    - hyperparameters: Custom hyperparameter configurations as dict, or a preset name string
    - feature_metadata: Feature metadata configuration, or "infer" for automatic detection
    - infer_limit: Maximum inference time per row, in seconds, that trained models must satisfy
    - infer_limit_batch_size: Batch size used when measuring inference time against infer_limit
    - fit_weighted_ensemble: Whether to fit a weighted ensemble on top of the trained models
    - fit_full_last_level_weighted_ensemble: Whether the last-level weighted ensemble uses models from all previous stack layers as base models
    - full_weighted_ensemble_additionally: Whether to fit an additional weighted ensemble after all stack levels are trained
    - dynamic_stacking: Whether to use dynamic stacking (bool or strategy string)
    - calibrate_decision_threshold: Whether to calibrate the binary decision threshold on validation data ("auto", True, False)
    - num_cpus: Number of CPUs to use ("auto" or integer)
    - num_gpus: Number of GPUs to use ("auto" or integer)
    - fit_strategy: Strategy for fitting models ("sequential" or "parallel")
    - memory_limit: Memory limit in GB ("auto" or float)
    - callbacks: List of AbstractCallback objects invoked during training

    Returns:
    Self (TabularPredictor instance)
    """
```
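
`fit_weighted_ensemble` adds a weighted ensemble on top of the trained models. AutoGluon's weighted ensemble is based on greedy ensemble selection (Caruana et al.): repeatedly add, with replacement, whichever model most improves the validation score of the averaged prediction, then turn the selection counts into weights. The stdlib sketch below illustrates that idea for regression with mean squared error; the function name, the fixed number of rounds, and the metric are assumptions for illustration, not AutoGluon's implementation.

```python
def ensemble_selection(preds: dict[str, list[float]], y: list[float], rounds: int = 10) -> dict[str, float]:
    """Greedy forward selection with replacement (Caruana-style); returns model weights."""

    def mse(p: list[float]) -> float:
        return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

    names = list(preds)
    counts = {name: 0 for name in names}
    running_sum = [0.0] * len(y)  # sum of the predictions of every model picked so far
    for step in range(1, rounds + 1):
        best_name, best_loss = None, float("inf")
        for name in names:
            # Candidate ensemble = average of the current picks plus this model
            cand = [(s + p) / step for s, p in zip(running_sum, preds[name])]
            loss = mse(cand)
            if loss < best_loss:
                best_name, best_loss = name, loss
        counts[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, preds[best_name])]
    # Weight = fraction of rounds in which each model was picked
    return {name: c / rounds for name, c in counts.items()}


y_val = [1.0, 2.0, 3.0, 4.0]
val_preds = {
    "A": [2.0, 3.0, 4.0, 5.0],  # overshoots by 1
    "B": [0.0, 1.0, 2.0, 3.0],  # undershoots by 1
    "C": [4.0, 4.0, 4.0, 4.0],  # poor model
}
print(ensemble_selection(val_preds, y_val))
```

Because models are added with replacement, weights are multiples of 1/rounds. Here A and B cancel each other's errors and share the weight, while C ends up with weight zero.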

### Predictions

Generates predictions with the best model (default) or any named model. Input data passes through the same feature preprocessing used during training, and results can be returned as pandas or NumPy objects.

```python { .api }
def predict(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    *,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series | np.ndarray:
    """
    Generate predictions for new data.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas Series (True) or numpy array (False)
    - transform_features: Apply feature preprocessing (set False only if data is already transformed)
    - decision_threshold: Decision threshold for binary classification (default: predictor.decision_threshold)

    Returns:
    Predictions as pandas Series or numpy array
    """

def predict_proba(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    transform_features: bool = True,
    **kwargs
) -> pd.DataFrame | pd.Series | np.ndarray:
    """
    Generate prediction probabilities for classification tasks.

    Parameters:
    - data: Input data or path to data file
    - model: Specific model to use (default: best model)
    - as_pandas: Return pandas DataFrame/Series (True) or numpy array (False)
    - as_multiclass: For binary classification, return one column per class (True) or only the positive-class probability (False)
    - transform_features: Apply feature preprocessing

    Returns:
    Prediction probabilities as pandas DataFrame/Series or numpy array
    """

def predict_from_proba(
    self,
    y_pred_proba: pd.DataFrame | np.ndarray,
    decision_threshold: float = None
) -> pd.Series | np.ndarray:
    """
    Convert prediction probabilities to class predictions.

    Parameters:
    - y_pred_proba: Prediction probabilities
    - decision_threshold: Custom threshold for binary classification (default: predictor.decision_threshold)

    Returns:
    Class predictions
    """
```
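
For binary classification, `predict_from_proba` (and `predict` with `decision_threshold`) turns the positive-class probability into a label by comparing it with the threshold, which is 0.5 unless another value was set or calibrated. The stdlib sketch below approximates that logic; the helper name is hypothetical, treating the second entry of `classes` as the positive class follows the `classes_` ordering described under Properties, and using `>=` at exactly the threshold is an assumption. The multiclass branch simply takes the most probable class.

```python
from typing import Optional


def labels_from_proba(rows: list, classes: list, decision_threshold: Optional[float] = None) -> list:
    """Hypothetical helper approximating predict_from_proba.

    rows: one dict per sample mapping class label -> probability.
    classes: class labels in predict_proba column order (like predictor.classes_).
    """
    if len(classes) == 2:
        threshold = 0.5 if decision_threshold is None else decision_threshold
        negative, positive = classes  # assumption: second class is the positive class
        return [positive if row[positive] >= threshold else negative for row in rows]
    # Multiclass: pick the most probable class for each row
    return [max(classes, key=lambda c: row[c]) for row in rows]


proba = [{"no": 0.7, "yes": 0.3}, {"no": 0.4, "yes": 0.6}]
print(labels_from_proba(proba, ["no", "yes"]))                          # ['no', 'yes']
print(labels_from_proba(proba, ["no", "yes"], decision_threshold=0.25))  # ['yes', 'yes']
```

Lowering the threshold trades precision for recall on the positive class without retraining any model.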

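### Multi-Model Predictions

Generates predictions from several models in one call, which is useful for comparing models and analyzing how ensemble members differ. This is generally faster than calling `predict` once per model, because computation shared between stacked models is reused.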

```python { .api }
def predict_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    transform_features: bool = True,
    **kwargs
) -> dict:
    """
    Generate predictions from multiple models.

    Parameters:
    - data: Input data (if None, uses validation data when available, otherwise returns out-of-fold predictions)
    - models: List of model names (default: all models that can infer)
    - as_pandas: Return each model's predictions as pandas (True) or numpy (False)
    - transform_features: Apply feature preprocessing

    Returns:
    Dictionary mapping each model name to its predictions
    """

def predict_proba_multi(
    self,
    data: pd.DataFrame = None,
    models: list[str] = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    **kwargs
) -> dict:
    """
    Generate prediction probabilities from multiple models.

    Parameters:
    - data: Input data (if None, uses validation data when available, otherwise returns out-of-fold probabilities)
    - models: List of model names (default: all models that can infer)
    - as_pandas: Return each model's probabilities as pandas (True) or numpy (False)
    - as_multiclass: Multiclass format for binary classification

    Returns:
    Dictionary mapping each model name to its prediction probabilities
    """
```
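
Because `predict_multi` returns a mapping from model name to that model's predictions, simple diagnostics need no further model calls. The stdlib sketch below takes such a mapping (hand-written sample values standing in for real output) and computes, for each row, the share of models agreeing with the majority label: a rough disagreement signal for spotting uncertain rows. The helper name and the majority-vote rule are illustrative assumptions.

```python
from collections import Counter


def agreement_per_row(preds_by_model: dict) -> list[float]:
    """Fraction of models agreeing with the most common prediction, per row."""
    columns = list(preds_by_model.values())
    n_models = len(columns)
    scores = []
    for row_preds in zip(*columns):  # predictions from every model for one row
        _, top_count = Counter(row_preds).most_common(1)[0]
        scores.append(top_count / n_models)
    return scores


# Hand-written stand-in for predictor.predict_multi(test_data), values as lists
multi_preds = {
    "LightGBM": ["yes", "no", "yes"],
    "CatBoost": ["yes", "no", "no"],
    "WeightedEnsemble_L2": ["yes", "no", "yes"],
}
print(agreement_per_row(multi_preds))  # third row: only 2 of 3 models agree
```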

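### Model Evaluation

Evaluates models on labeled data using the predictor's `eval_metric` and, optionally, auxiliary metrics, and ranks all trained models in a leaderboard. Scores are reported in "higher is better" form, so error metrics such as root_mean_squared_error appear as negative values.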

```python { .api }
def evaluate(
    self,
    data: pd.DataFrame | str,
    model: str = None,
    silent: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Evaluate model performance on provided dataset.

    Parameters:
    - data: Labeled evaluation data or path to data file
    - model: Specific model to evaluate (default: best model)
    - silent: Suppress printed output
    - auxiliary_metrics: Include additional metrics beyond eval_metric
    - detailed_report: Generate detailed evaluation report

    Returns:
    Dictionary of evaluation metrics and scores (higher is better)
    """

def evaluate_predictions(
    self,
    y_true: pd.Series | np.ndarray,
    y_pred: pd.Series | np.ndarray,
    sample_weight: pd.Series | np.ndarray = None,
    decision_threshold: float = None,
    display: bool = False,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    **kwargs
) -> dict:
    """
    Evaluate supplied predictions against ground truth with the predictor's metrics, without running any model.

    Parameters:
    - y_true: Ground truth labels
    - y_pred: Predicted labels, or prediction probabilities for metrics that need them (e.g. roc_auc, log_loss)
    - sample_weight: Sample weights for evaluation
    - decision_threshold: Threshold for binary classification
    - display: Print evaluation results
    - auxiliary_metrics: Include additional metrics
    - detailed_report: Generate detailed report

    Returns:
    Dictionary of evaluation metrics
    """

def leaderboard(
    self,
    data: pd.DataFrame | str = None,
    extra_info: bool = False,
    only_pareto_frontier: bool = False,
    skip_score: bool = False,
    **kwargs
) -> pd.DataFrame:
    """
    Generate model leaderboard with performance rankings.

    Parameters:
    - data: Labeled data to score models on (if None, only validation scores are shown)
    - extra_info: Include additional model information
    - only_pareto_frontier: Show only models on the score vs. inference-time Pareto frontier
    - skip_score: Skip scoring models on data (faster)

    Returns:
    DataFrame with model rankings and performance metrics
    """
```
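
`leaderboard(only_pareto_frontier=True)` keeps only the models on the accuracy/latency trade-off frontier: each kept model scores higher than every model that is at least as fast. The stdlib sketch below shows that filtering on hand-written rows; the keys mimic the leaderboard columns `score_val` and `pred_time_val`, but the helper and its tie handling are assumptions. Because scores are "higher is better", the same rule works for error metrics.

```python
def pareto_frontier(rows: list) -> list:
    """Keep models that beat the score of every faster model (higher score_val is better)."""
    frontier = []
    best_score = float("-inf")
    # Walk from fastest to slowest; a slower model is kept only if it scores higher
    for row in sorted(rows, key=lambda r: (r["pred_time_val"], -r["score_val"])):
        if row["score_val"] > best_score:
            frontier.append(row)
            best_score = row["score_val"]
    return frontier


# Hand-written example rows, not real AutoGluon output
leaderboard = [
    {"model": "WeightedEnsemble_L2", "score_val": 0.91, "pred_time_val": 0.80},
    {"model": "CatBoost", "score_val": 0.90, "pred_time_val": 0.20},
    {"model": "KNeighbors", "score_val": 0.80, "pred_time_val": 0.30},
    {"model": "LightGBM", "score_val": 0.88, "pred_time_val": 0.05},
]
print([r["model"] for r in pareto_frontier(leaderboard)])
# ['LightGBM', 'CatBoost', 'WeightedEnsemble_L2']
```

KNeighbors is dropped because CatBoost is both faster and more accurate.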

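### Out-of-Fold Predictions

Advanced functionality for accessing out-of-fold (OOF) predictions on the training data, useful for stacking, analysis, and debugging model performance. OOF predictions exist only for bagged models, so the predictor must have been trained with bagging (for example, `num_bag_folds > 0` or a preset such as 'best_quality' that enables it).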

```python { .api }
def predict_oof(
    self,
    model: str = None,
    transformed: bool = False,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    decision_threshold: float = None,
    **kwargs
) -> pd.Series:
    """
    Get out-of-fold predictions for training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Return labels in AutoGluon's internal representation (True) instead of the original labels (False)
    - train_data: Training data (default: original training data)
    - internal_oof: [Advanced] Return internal OOF predictions, which may not align row-for-row with train_data
    - decision_threshold: Threshold for binary classification

    Returns:
    Out-of-fold predictions for training data
    """

def predict_proba_oof(
    self,
    model: str = None,
    transformed: bool = False,
    as_multiclass: bool = True,
    train_data: pd.DataFrame = None,
    internal_oof: bool = False,
    **kwargs
) -> pd.DataFrame | pd.Series:
    """
    Get out-of-fold prediction probabilities for training data.

    Parameters:
    - model: Model name (default: best model)
    - transformed: Use AutoGluon's internal label representation (True) instead of the original labels (False)
    - as_multiclass: Multiclass format for binary classification
    - train_data: Training data (default: original training data)
    - internal_oof: [Advanced] Return internal OOF predictions, which may not align row-for-row with train_data

    Returns:
    Out-of-fold prediction probabilities
    """
```
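
Out-of-fold predictions come from k-fold bagging: each training row is predicted by the fold model that did not see that row during training, which is what makes the predictions usable as stacking features. The stdlib sketch below demonstrates the idea with a deliberately trivial "model" (the mean label of the training folds) and round-robin fold assignment; both are assumptions chosen for brevity, and AutoGluon's bagging uses its own fold splits and real models.

```python
def oof_mean_predictions(y: list[float], k: int = 3) -> list[float]:
    """Out-of-fold predictions using the training-fold mean as a stand-in model."""
    n = len(y)
    folds = [i % k for i in range(n)]  # round-robin fold assignment
    oof = [0.0] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if folds[i] != fold]
        holdout_idx = [i for i in range(n) if folds[i] == fold]
        # "Fit" on the other folds: here, just the mean of their labels
        model_prediction = sum(y[i] for i in train_idx) / len(train_idx)
        for i in holdout_idx:
            oof[i] = model_prediction  # row i was never used to fit this model
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(oof_mean_predictions(y, k=3))  # [4.0, 3.5, 3.0, 4.0, 3.5, 3.0]
```

Row 0's prediction (4.0) ignores its own label (1.0); in-sample predictions would leak the label and let stacked models overfit.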

### Model Management

Model lifecycle management: saving, loading, cloning, and slimming a predictor down for deployment.

```python { .api }
def save(self, silent: bool = False) -> str:
    """
    Save predictor to disk.

    Parameters:
    - silent: Suppress output messages

    Returns:
    Path where predictor was saved
    """

@classmethod
def load(
    cls,
    path: str,
    verbosity: int = None,
    require_version_match: bool = True,
    require_py_version_match: bool = True
) -> 'TabularPredictor':
    """
    Load a saved predictor from disk.

    Parameters:
    - path: Path to saved predictor
    - verbosity: Logging level override
    - require_version_match: Require the installed AutoGluon version to match the one used to save
    - require_py_version_match: Require the Python version to match the one used to save

    Returns:
    Loaded TabularPredictor instance
    """

def clone(
    self,
    path: str,
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a copy of the predictor at a new location.

    Parameters:
    - path: Destination path for cloned predictor
    - return_clone: Return cloned predictor object
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    Path to cloned predictor or cloned predictor object
    """

def clone_for_deployment(
    self,
    path: str,
    model: str = "best",
    return_clone: bool = False,
    dirs_exist_ok: bool = False
) -> str | 'TabularPredictor':
    """
    Create a deployment-optimized copy with minimal footprint.

    Parameters:
    - path: Destination path for deployment clone
    - model: Model to keep, together with the models it depends on (default: best model)
    - return_clone: Return cloned predictor object
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    Path to deployment clone or cloned predictor object
    """

def save_space(
    self,
    remove_data: bool = True,
    remove_fit_stack: bool = True,
    requires_save: bool = True,
    reduce_children: bool = False
) -> str:
    """
    Reduce predictor disk usage by removing files not needed for prediction.

    Parameters:
    - remove_data: Remove cached training data
    - remove_fit_stack: Remove intermediate stacking models
    - requires_save: Save predictor after space reduction
    - reduce_children: Apply space reduction to child models

    Returns:
    Path to optimized predictor
    """
```

### Properties and Inspection

Access to predictor metadata, model information, and internal state for analysis and debugging.

```python { .api }
@property
def classes_(self) -> list:
    """Class labels for classification problems, in predict_proba column order."""

@property
def class_labels(self) -> list:
    """Class labels in original format."""

@property
def problem_type(self) -> str:
    """Type of ML problem (binary, multiclass, regression, etc.)."""

@property
def eval_metric(self) -> Scorer:
    """Evaluation metric used for model selection."""

@property
def label(self) -> str:
    """Name of the target column."""

@property
def path(self) -> str:
    """Path where predictor is saved."""

@property
def original_features(self) -> list[str]:
    """List of original feature names from training data."""

def features(self, feature_stage: str = "original") -> list[str]:
    """
    Get feature names at different processing stages.

    Parameters:
    - feature_stage: Stage of feature processing ('original', 'transformed')

    Returns:
    List of feature names
    """

@property
def feature_metadata(self) -> FeatureMetadata:
    """Metadata about features including types and preprocessing."""

def set_decision_threshold(self, decision_threshold: float) -> None:
    """
    Set custom decision threshold for binary classification.

    Parameters:
    - decision_threshold: New threshold value (0.0 to 1.0)
    """

@property
def decision_threshold(self) -> float | None:
    """Current decision threshold for binary classification (None for other problem types)."""
```
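
`set_decision_threshold` accepts any value in [0, 1], and `fit(calibrate_decision_threshold=...)` can choose one automatically by searching for the threshold that scores best on validation data. The stdlib sketch below illustrates such a search with a simple grid and F1 as the target metric; the grid, the metric, the sample data, and the helper names are assumptions for illustration, not AutoGluon's calibration routine.

```python
def f1_at_threshold(y_true: list[int], p_pos: list[float], threshold: float) -> float:
    """F1 score when predicting 1 for p >= threshold (labels are 0/1)."""
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def best_threshold(y_true: list[int], p_pos: list[float], steps: int = 20) -> float:
    """Grid-search thresholds 0, 1/steps, ..., 1 and keep the first with the highest F1."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: f1_at_threshold(y_true, p_pos, t))


# Hand-written validation labels and positive-class probabilities
y_val = [0, 0, 1, 1, 1]
p_val = [0.10, 0.35, 0.30, 0.45, 0.80]
t = best_threshold(y_val, p_val)
print(t, f1_at_threshold(y_val, p_val, t))
```

With these values F1 peaks at 6/7 for thresholds from 0.15 to 0.30 (versus 0.75 at the default-free threshold 0.0 and 0.5 at 0.5), and the first grid point reaching the peak is kept; a value found this way could then be passed to `predictor.set_decision_threshold(...)`.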

## Usage Examples

### Basic Classification

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Create predictor for binary classification
predictor = TabularPredictor(
    label='target',
    problem_type='binary',
    eval_metric='roc_auc'
)

# Train with time limit
predictor.fit(
    train_data,
    time_limit=600,  # 10 minutes
    presets='good_quality'
)

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)

# Evaluate performance
results = predictor.evaluate(test_data)
print(f"ROC-AUC: {results['roc_auc']:.4f}")

# View model leaderboard
leaderboard = predictor.leaderboard(test_data, extra_info=True)
print(leaderboard)
```

### Advanced Configuration

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Custom hyperparameters: a list of dicts trains one model per configuration.
# LightGBM is keyed as 'GBM' in AutoGluon.
hyperparameters = {
    'GBM': [{'num_leaves': 26}, {'num_leaves': 66}, {'num_leaves': 176}],
    'XGB': [{'n_estimators': 50}, {'n_estimators': 100}, {'n_estimators': 200}],
    'CAT': [{'iterations': 100}, {'iterations': 200}, {'iterations': 500}],
}

# Advanced training with custom settings
# (train_data must contain a 'weights' column holding per-row sample weights)
predictor = TabularPredictor(
    label='target',
    sample_weight='weights',
    path='./models/'
)

predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    num_bag_folds=5,
    num_stack_levels=2,
    ag_args_fit={'num_cpus': 8},
    excluded_model_types=['KNN', 'XT']
)

# Multi-model predictions for ensemble analysis:
# predict_multi returns {model_name: predictions}, giving one column per model
multi_preds = predictor.predict_multi(test_data)
model_comparison = pd.DataFrame(multi_preds)
```