
# Tabular Machine Learning

Automated machine learning for structured/tabular data supporting binary classification, multiclass classification, and regression tasks. TabularPredictor automatically handles feature engineering, model selection, hyperparameter tuning, and intelligent ensembling to achieve strong predictive performance with minimal configuration.

## Capabilities

### TabularPredictor Class

Main predictor class for tabular/structured data that automates the entire ML pipeline from data preprocessing to model deployment.

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str = None,
        path: str = None,
        verbosity: int = 2,
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        **kwargs
    ):
        """
        Initialize TabularPredictor for automated machine learning on tabular data.

        Parameters:
        - label: Name of the target column to predict
        - problem_type: Type of problem ('binary', 'multiclass', 'regression', 'quantile')
        - eval_metric: Evaluation metric ('accuracy', 'roc_auc', 'rmse', etc.)
        - path: Directory to save models and artifacts
        - verbosity: Logging verbosity level (0-4)
        - sample_weight: Column name for sample weights
        - weight_evaluation: Whether to weight evaluation metrics
        - groups: Column name for group information (for grouped CV)
        """
```
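
To make `sample_weight` and `weight_evaluation` more concrete, the following is a minimal sketch of how per-row weights could change an accuracy score. The `weighted_accuracy` helper and the toy data are hypothetical and written only for illustration; they are not part of AutoGluon and do not claim to mirror its internal metric code.

```python
def weighted_accuracy(y_true, y_pred, weights=None):
    """Accuracy where each row counts in proportion to its weight."""
    if weights is None:
        # Without weights every row counts equally (plain accuracy).
        weights = [1.0] * len(y_true)
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("y_true, y_pred and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of the correctly predicted rows.
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / total


# Toy data: the only misclassified row (index 2) carries a large weight.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
weights = [1.0, 1.0, 6.0, 2.0]

unweighted = weighted_accuracy(y_true, y_pred)
weighted = weighted_accuracy(y_true, y_pred, weights)
print(f"unweighted={unweighted:.2f} weighted={weighted:.2f}")
```

In this toy case a single heavily weighted mistake pulls the weighted score well below the unweighted one, which is the kind of difference weighting evaluation metrics is meant to surface.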

### Model Training

Train and automatically tune machine learning models on tabular data with intelligent preprocessing and model selection.

```python { .api }
def fit(
    self,
    train_data,
    tuning_data=None,
    time_limit: float = None,
    presets: str = None,
    hyperparameters=None,
    feature_metadata=None,
    infer_limit: float = None,
    infer_limit_batch_size: int = None,
    fit_weighted_ensemble: bool = True,
    dynamic_stacking: bool = False,
    calibrate_decision_threshold: str = "auto",
    num_cpus: str = "auto",
    num_gpus: str = "auto",
    fit_strategy: str = "sequential",
    memory_limit: str = "auto",
    excluded_model_types: list = None,
    included_model_types: list = None,
    holdout_frac: float = None,
    callbacks: list = None,
    **kwargs
):
    """
    Fit TabularPredictor on training data.

    Parameters:
    - train_data: Training data (DataFrame, file path, or TabularDataset)
    - tuning_data: Validation data for hyperparameter tuning
    - time_limit: Maximum training time in seconds
    - presets: Quality/speed presets ('best_quality', 'high_quality', 'medium_quality', 'optimize_for_deployment')
    - hyperparameters: Custom hyperparameter configurations
    - feature_metadata: Manual feature type specifications or 'infer'
    - infer_limit: Inference time limit in seconds per row that the fitted predictor should respect
    - infer_limit_batch_size: Batch size assumed when estimating per-row inference time for infer_limit
    - fit_weighted_ensemble: Whether to fit weighted ensemble models
    - dynamic_stacking: Check for stacked overfitting and decide dynamically whether to use stacking
    - calibrate_decision_threshold: Auto-calibrate decision threshold ('auto', True, False)
    - num_cpus: Number of CPU cores ('auto' or int)
    - num_gpus: Number of GPUs ('auto' or int)
    - fit_strategy: Model fitting strategy ('sequential', 'parallel')
    - memory_limit: Memory limit for training in GB ('auto' or float)
    - excluded_model_types: List of model types to exclude
    - included_model_types: List of model types to include only
    - holdout_frac: Fraction of data to hold out for validation
    - callbacks: List of callback functions for training

    Returns:
    TabularPredictor: Fitted predictor instance
    """
```

### Prediction

Generate predictions and prediction probabilities for new data using the trained model ensemble.

```python { .api }
def predict(
    self,
    data,
    model: str = None,
    as_pandas: bool = True,
    transform_features: bool = True
):
    """
    Generate predictions for new data.

    Parameters:
    - data: Input data (DataFrame, file path, or TabularDataset)
    - model: Specific model name to use for prediction
    - as_pandas: Return results as pandas Series
    - transform_features: Apply feature transformations

    Returns:
    Predictions as pandas Series or numpy array
    """

def predict_proba(
    self,
    data,
    model: str = None,
    as_pandas: bool = True,
    as_multiclass: bool = True,
    transform_features: bool = True
):
    """
    Generate prediction probabilities for classification tasks.

    Parameters:
    - data: Input data (DataFrame, file path, or TabularDataset)
    - model: Specific model name to use for prediction
    - as_pandas: Return results as pandas DataFrame
    - as_multiclass: Return all class probabilities vs just positive class
    - transform_features: Apply feature transformations

    Returns:
    Prediction probabilities as pandas DataFrame or numpy array
    """
```
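
The relationship between the two outputs can be sketched without any library. The snippet below assumes class probabilities are available as one mapping per row and that labels are picked by highest probability (which, for a binary problem, matches a 0.5 decision threshold). The helpers `labels_from_proba` and `positive_class_only` and the class names are hypothetical, used only to illustrate what `as_multiclass=False` keeps for a binary problem.

```python
def labels_from_proba(proba_rows):
    """Pick the highest-probability class for each row."""
    return [max(row, key=row.get) for row in proba_rows]


def positive_class_only(proba_rows, positive_class):
    """Keep only the positive-class probability, one value per row."""
    return [row[positive_class] for row in proba_rows]


# Toy binary problem with classes "no" and "yes" ("yes" is the positive class).
proba_rows = [
    {"no": 0.80, "yes": 0.20},
    {"no": 0.30, "yes": 0.70},
    {"no": 0.45, "yes": 0.55},
]

labels = labels_from_proba(proba_rows)
positive = positive_class_only(proba_rows, positive_class="yes")
print(labels)
print(positive)
```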

### Model Evaluation

Evaluate model performance and analyze results with comprehensive metrics and model comparison capabilities.

```python { .api }
def evaluate(
    self,
    data,
    model: str = None,
    auxiliary_metrics: bool = True,
    detailed_report: bool = False,
    silent: bool = False
):
    """
    Evaluate predictor performance on test data.

    Parameters:
    - data: Test data (DataFrame, file path, or TabularDataset)
    - model: Specific model to evaluate
    - auxiliary_metrics: Include additional evaluation metrics
    - detailed_report: Generate detailed evaluation report
    - silent: Suppress output

    Returns:
    dict: Dictionary of evaluation metrics
    """

def leaderboard(
    self,
    data=None,
    extra_info: bool = False,
    only_pareto_frontier: bool = False,
    skip_score: bool = False,
    silent: bool = False
):
    """
    Display model leaderboard with performance rankings.

    Parameters:
    - data: Test data for evaluation (optional)
    - extra_info: Include additional model information
    - only_pareto_frontier: Show only Pareto optimal models
    - skip_score: Skip performance scoring
    - silent: Suppress output

    Returns:
    DataFrame: Model leaderboard with performance metrics
    """
```
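
The idea behind `only_pareto_frontier` can be illustrated with a small, self-contained sketch. It assumes the frontier is taken over a score (higher is better) and a prediction time (lower is better). The `pareto_frontier` helper, the dictionary keys, the model names, and the numbers are all made up for illustration and are not AutoGluon's implementation or output.

```python
def pareto_frontier(models):
    """Return names of models that no other model beats on both axes."""
    frontier = []
    for m in models:
        # A model is dominated if another one is at least as good on both
        # axes and strictly better on at least one of them.
        dominated = any(
            o["score"] >= m["score"]
            and o["pred_time"] <= m["pred_time"]
            and (o["score"] > m["score"] or o["pred_time"] < m["pred_time"])
            for o in models
        )
        if not dominated:
            frontier.append(m["model"])
    return frontier


# Hypothetical leaderboard rows: score (higher is better), pred_time in seconds.
models = [
    {"model": "WeightedEnsemble", "score": 0.92, "pred_time": 1.50},
    {"model": "LightGBM", "score": 0.90, "pred_time": 0.10},
    {"model": "RandomForest", "score": 0.88, "pred_time": 0.40},
    {"model": "KNN", "score": 0.80, "pred_time": 0.05},
]

frontier = pareto_frontier(models)
print(frontier)
```

Here the hypothetical `RandomForest` row drops out because `LightGBM` is both more accurate and faster, while the other three each represent a different accuracy/latency trade-off.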

### Feature Analysis

Analyze feature importance and understand model behavior through interpretability tools.

```python { .api }
def feature_importance(
    self,
    data=None,
    model: str = None,
    features: list = None,
    feature_stage: str = 'original',
    subsample_size: int = 5000,
    silent: bool = False
):
    """
    Calculate feature importance scores.

    Parameters:
    - data: Data for importance calculation
    - model: Specific model to analyze
    - features: Specific features to analyze
    - feature_stage: Feature processing stage ('original' or 'transformed')
    - subsample_size: Sample size for efficient computation
    - silent: Suppress output

    Returns:
    DataFrame: Feature importance scores
    """

def fit_summary(self, verbosity: int = 1, show_plot: bool = False):
    """
    Display summary of training process and results.

    Parameters:
    - verbosity: Detail level (0-4)
    - show_plot: Show training plots

    Returns:
    dict: Training summary information
    """
```
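
Feature importance of this kind is commonly computed by permutation: shuffle one column, re-score the model, and report the drop in score. The sketch below illustrates that general technique in plain Python under simplifying assumptions (accuracy as the metric, a stand-in `predict` function instead of a fitted model). The helper names and the toy data are hypothetical and do not reproduce AutoGluon's internals.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, features, n_repeats=5, seed=0):
    """Average drop in accuracy after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importance = {}
    for feature in features:
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column while leaving every other column intact.
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importance[feature] = sum(drops) / n_repeats
    return importance


# Toy data: "signal" determines the label, "noise" is unrelated to it.
rows = [{"signal": 0.9 if i % 2 else 0.1, "noise": (i * 7) % 5} for i in range(20)]
labels = [1 if r["signal"] > 0.5 else 0 for r in rows]


def predict(row):
    """Stand-in for a fitted model that only looks at "signal"."""
    return 1 if row["signal"] > 0.5 else 0


importance = permutation_importance(predict, rows, labels, ["signal", "noise"])
print(importance)
```

Because the stand-in model ignores `noise`, shuffling that column cannot change any prediction, so its importance is exactly zero, while shuffling `signal` breaks the link to the label and lowers accuracy.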

### Model Persistence

Save and load trained predictors for deployment and reuse.

```python { .api }
def save(self, path: str = None):
    """
    Save trained predictor to disk.

    Parameters:
    - path: Directory to save predictor
    """

@classmethod
def load(cls, path: str, verbosity: int = 2):
    """
    Load saved predictor from disk.

    Parameters:
    - path: Directory containing saved predictor
    - verbosity: Logging verbosity level

    Returns:
    TabularPredictor: Loaded predictor instance
    """
```

### Advanced Features

Advanced model configuration and specialized functionality for power users.

```python { .api }
def refit_full(self, model: str = 'best'):
    """
    Refit model on full dataset (train + validation).

    Parameters:
    - model: Model to refit ('best', 'all', or specific model name)

    Returns:
    dict: Mapping of original model names to refit model names
    """

def distill(
    self,
    train_data=None,
    tuning_data=None,
    time_limit: int = None,
    hyperparameters=None,
    **kwargs
):
    """
    Create distilled (compressed) version of ensemble model.

    Parameters:
    - train_data: Training data for distillation
    - tuning_data: Validation data for distillation
    - time_limit: Maximum distillation time in seconds
    - hyperparameters: Distillation hyperparameters

    Returns:
    list: Names of the distilled models
    """

def persist_models(self, models: list = None, with_ancestors: bool = True):
    """
    Keep models loaded in memory to reduce inference latency.
    Models that are not persisted are loaded from disk each time they predict.

    Parameters:
    - models: List of model names to persist
    - with_ancestors: Also persist the models that the given models depend on
    """

def unpersist_models(self, models: list = None):
    """
    Release persisted models from memory to reduce memory usage.

    Parameters:
    - models: List of model names to unpersist
    """

def calibrate_decision_threshold(
    self,
    data=None,
    metric: str = None,
    return_optimization_curve: bool = False,
    verbose: bool = True
):
    """
    Calibrate decision threshold for binary classification to optimize specified metric.

    Parameters:
    - data: Data to use for threshold calibration
    - metric: Metric to optimize ('f1', 'balanced_accuracy', 'mcc', etc.)
    - return_optimization_curve: Return threshold vs metric curve
    - verbose: Print optimization results

    Returns:
    dict or tuple: Calibration results, optionally with optimization curve
    """

def clone(self, path: str, *, return_clone: bool = False, dirs_exist_ok: bool = False):
    """
    Create a copy of the predictor at a new location.

    Parameters:
    - path: Directory path for the cloned predictor
    - return_clone: Return the cloned predictor instance
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    str or TabularPredictor: Path to clone or cloned predictor instance
    """

def clone_for_deployment(
    self,
    path: str,
    *,
    model: str = "best",
    return_clone: bool = False,
    dirs_exist_ok: bool = False
):
    """
    Create optimized copy of predictor for deployment with minimal storage footprint.

    Parameters:
    - path: Directory path for deployment clone
    - model: Model to include in deployment clone
    - return_clone: Return the cloned predictor instance
    - dirs_exist_ok: Allow overwriting existing directory

    Returns:
    str or TabularPredictor: Path to clone or cloned predictor instance
    """
```
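
Decision threshold calibration can be pictured as a search over candidate thresholds for the one that maximizes a chosen metric. The following is a minimal sketch of that idea using F1 and an evenly spaced grid. The `f1_score` and `calibrate_threshold` helpers, the grid size, and the toy data are assumptions made for illustration; the search AutoGluon actually performs may differ.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def calibrate_threshold(y_true, proba, metric=f1_score, n_steps=19):
    """Scan evenly spaced thresholds and keep the one with the best metric."""
    best_threshold, best_score = 0.5, float("-inf")
    for step in range(1, n_steps + 1):
        threshold = step / (n_steps + 1)  # 0.05, 0.10, ..., 0.95 by default
        y_pred = [1 if p >= threshold else 0 for p in proba]
        score = metric(y_true, y_pred)
        if score > best_score:  # ties keep the lowest threshold seen first
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy positive-class probabilities from a model that under-scores positives.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
proba = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60]

default_f1 = f1_score(y_true, [1 if p >= 0.5 else 0 for p in proba])
threshold, score = calibrate_threshold(y_true, proba)
print(f"default F1={default_f1:.3f}, calibrated threshold={threshold:.2f}, F1={score:.3f}")
```

On this toy data the default 0.5 cutoff finds only one of the three positives, whereas a lower threshold recovers all three at the cost of one false positive, which is the trade-off calibration is meant to tune.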

### InterpretableTabularPredictor Class

**[EXPERIMENTAL]** Specialized TabularPredictor subclass focused on interpretable models with simple, human-readable rules. Trades accuracy for interpretability by limiting to simple models and disabling complex ensemble techniques.

```python { .api }
class InterpretableTabularPredictor(TabularPredictor):
    def __init__(self, *args, **kwargs):
        """
        Initialize InterpretableTabularPredictor with same parameters as TabularPredictor.
        Automatically restricts to interpretable models and preprocessing.
        """

    def fit(
        self,
        train_data,
        tuning_data=None,
        time_limit: float = None,
        *,
        presets: str = "interpretable",
        **kwargs
    ):
        """
        Fit interpretable models with automatic preset selection for interpretability.

        Parameters:
        - train_data: Training data (same as TabularPredictor)
        - tuning_data: Validation data (optional)
        - time_limit: Maximum training time
        - presets: Defaults to "interpretable" preset

        Note: Bagging, stacking, and complex ensembles are disabled for interpretability
        """

    def leaderboard_interpretable(self, verbose: bool = False, **kwargs):
        """
        Leaderboard with model complexity scores for interpretable model selection.

        Parameters:
        - verbose: Print detailed leaderboard

        Returns:
        DataFrame: Leaderboard with additional 'complexity' column showing rule count
        """

    def print_interpretable_rules(
        self,
        complexity_threshold: int = 10,
        model_name: str = None
    ):
        """
        Print human-readable rules from the best interpretable model.

        Parameters:
        - complexity_threshold: Maximum rule complexity to display
        - model_name: Specific model to show rules for
        """
```

## Usage Examples

### Basic Classification

```python
from autogluon.tabular import TabularPredictor

# Binary classification
predictor = TabularPredictor(label='target')
predictor.fit('train.csv', presets='best_quality', time_limit=3600)

# Make predictions
predictions = predictor.predict('test.csv')
probabilities = predictor.predict_proba('test.csv')

# Evaluate performance
scores = predictor.evaluate('test.csv')
print(f"Accuracy: {scores['accuracy']:.3f}")

# View model leaderboard
leaderboard = predictor.leaderboard('test.csv')
print(leaderboard)
```

### Custom Configuration

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Load training data (path is a placeholder; the target column is 'price')
train_data = TabularDataset('train.csv')

# Custom hyperparameters and model selection
hyperparameters = {
    'GBM': {'num_boost_round': 1000, 'learning_rate': 0.01},
    'RF': {'n_estimators': 500, 'max_depth': 20},
    'XGB': {'n_estimators': 1000, 'learning_rate': 0.01}
}

predictor = TabularPredictor(
    label='price',
    problem_type='regression',
    eval_metric='rmse',
    path='./models'
)

predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    excluded_model_types=['KNN', 'LR'],  # Exclude certain model types
    time_limit=7200,
    presets='high_quality'
)

# Feature importance analysis
importance = predictor.feature_importance(train_data)
print(importance.head(10))
```