
# Configuration and Presets


AutoGluon Tabular provides extensive configuration options through presets, hyperparameter configurations, and feature processing settings. These configurations enable users to optimize for different objectives like accuracy, speed, interpretability, or deployment constraints.

## Capabilities

### Preset Configurations

Pre-configured settings optimized for different use cases, balancing accuracy, training time, and computational resources.

```python { .api }
from typing import Literal

# Available preset configurations
PRESET_CONFIGURATIONS = Literal[
    "best_quality",             # Maximum accuracy, longer training time
    "high_quality",             # High accuracy with fast inference
    "good_quality",             # Good accuracy with very fast inference
    "medium_quality",           # Medium accuracy, very fast training (default)
    "optimize_for_deployment",  # Optimizes for deployment by cleaning up models
    "interpretable",            # Interpretable models only
]

def get_preset_config(preset: str) -> dict:
    """
    Get the configuration dictionary for a specific preset.

    Parameters:
    - preset: Name of the preset configuration

    Returns:
    Dictionary with preset configuration parameters
    """
```
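
For reference, a minimal sketch of choosing one of these presets at training time; `presets` is the standard `TabularPredictor.fit` argument, and the CSV path and `target` column are placeholders:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # placeholder dataset with a 'target' column

# "medium_quality" is used when no preset is given; here we opt for "good_quality".
predictor = TabularPredictor(label='target').fit(train_data, presets='good_quality')
```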

### Hyperparameter Configurations

Systematic hyperparameter configuration system for customizing model training and optimization strategies.

```python { .api }
from typing import Any

# Hyperparameter configuration structure
HYPERPARAMETER_CONFIG = dict[str, dict[str, Any]]
# Example: {'LGB': {'num_leaves': [31, 127], 'learning_rate': [0.01, 0.1]}}
# Search strategies ('grid', 'random', 'bayesian', 'auto') are configured separately
# via hyperparameter_tune_kwargs (see the usage examples below).

def get_hyperparameter_config_options() -> list[str]:
    """
    Get the list of available hyperparameter configuration presets.

    Returns:
    List of available configuration names
    """

def get_hyperparameter_config(config_name: str) -> dict:
    """
    Get a specific hyperparameter configuration by name.

    Parameters:
    - config_name: Name of the hyperparameter configuration preset

    Returns:
    Dictionary mapping model names to hyperparameter configurations
    """
```
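
As a quick sketch of how these helpers are used, the snippet below looks up a named configuration; the module path `autogluon.tabular.configs.hyperparameter_configs` and the `'default'` config name are assumptions, not part of the spec above:

```python
# Assumed module path for the helpers declared above.
from autogluon.tabular.configs.hyperparameter_configs import (
    get_hyperparameter_config,
    get_hyperparameter_config_options,
)

print(get_hyperparameter_config_options())  # available preset names, e.g. ['default', 'light', ...]
config = get_hyperparameter_config('default')
print(list(config.keys()))                  # model keys mapped to hyperparameter dictionaries
```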

### Feature Generation Configuration

Automated feature engineering and preprocessing configuration system for handling diverse data types and feature transformations.

```python { .api }
import pandas as pd

def get_default_feature_generator(
    feature_generator: str = "auto",
    feature_metadata: 'FeatureMetadata' = None,
    init_kwargs: dict = None,
) -> 'AutoMLPipelineFeatureGenerator':
    """
    Get the default feature generator with the specified configuration.

    Parameters:
    - feature_generator: Feature generation preset ('auto', 'interpretable')
    - feature_metadata: Metadata for feature processing
    - init_kwargs: Additional initialization arguments

    Returns:
    Configured feature generator instance
    """

class FeatureGenerator:
    """Base class for feature generation and preprocessing."""

    def fit_transform(
        self,
        X: pd.DataFrame,
        feature_metadata: 'FeatureMetadata' = None,
        **kwargs,
    ) -> pd.DataFrame:
        """
        Fit the feature generator and transform the input data.

        Parameters:
        - X: Input dataframe
        - feature_metadata: Feature type metadata

        Returns:
        Transformed feature dataframe
        """

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        """Transform input data using the fitted generator."""
```
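
A minimal standalone sketch of running the feature pipeline outside of `fit()`; it assumes AutoGluon's `AutoMLPipelineFeatureGenerator` from `autogluon.features.generators` and a placeholder CSV with a `target` column:

```python
import pandas as pd
from autogluon.features.generators import AutoMLPipelineFeatureGenerator  # assumed import path

train_data = pd.read_csv('train.csv')  # placeholder dataset

# Fit the automated feature pipeline on the raw features and inspect the output types.
feature_generator = AutoMLPipelineFeatureGenerator()
X_transformed = feature_generator.fit_transform(X=train_data.drop(columns=['target']))
print(X_transformed.dtypes)
```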

### Advanced Training Arguments

Configuration options for advanced training strategies including bagging, stacking, and resource management.

```python { .api }
class AGArgsFit:
    """Arguments for controlling model fitting behavior."""

    num_cpus: int | str = "auto"  # CPU cores for training ("auto" or an explicit count)
    num_gpus: int = 0             # GPU devices to use
    memory_limit: int = None      # Memory limit in MB
    disk_limit: int = None        # Disk space limit in MB
    time_limit: float = None      # Time limit per model in seconds
    name_suffix: str = ""         # Suffix for model names
    priority: int = 0             # Training priority

class AGArgsEnsemble:
    """Arguments for controlling ensemble behavior."""

    fold_fitting_strategy: str = "sequential_local"  # Fold fitting strategy
    auto_stack: bool = True                          # Enable automatic stacking
    bagging_mode: str = "oob"                        # Bagging validation mode
    stack_mode: str = "infer"                        # Stacking mode
    ensemble_size_max: int = 25                      # Maximum ensemble size

# Training configuration structure
TRAINING_CONFIG = {
    'num_bag_folds': int,      # Number of bagging folds (default: auto)
    'num_bag_sets': int,       # Number of bagging sets (default: auto)
    'num_stack_levels': int,   # Number of stacking levels (default: auto)
    'ag_args_fit': dict,       # Advanced fitting arguments
    'ag_args_ensemble': dict,  # Advanced ensemble arguments
}
```

### Evaluation and Metric Configuration

Configuration for evaluation metrics, validation strategies, and performance measurement.

```python { .api }
# Classification metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss",
    "f1", "f1_macro", "f1_micro", "f1_weighted",
    "roc_auc", "roc_auc_ovo", "roc_auc_ovo_macro", "roc_auc_ovo_weighted",
    "roc_auc_ovr", "roc_auc_ovr_macro", "roc_auc_ovr_micro", "roc_auc_ovr_weighted",
    "average_precision", "precision", "precision_macro", "precision_micro", "precision_weighted",
    "recall", "recall_macro", "recall_micro", "recall_weighted",
    "mcc", "pac_score"
]

# Regression metrics
REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error",
    "r2", "symmetric_mean_absolute_percentage_error"
]

# Quantile regression metrics
QUANTILE_METRICS = ["pinball_loss"]

def get_metric_config(
    problem_type: str,
    eval_metric: str = None,
    greater_is_better: bool = None
) -> dict:
    """
    Get metric configuration for evaluation.

    Parameters:
    - problem_type: Type of ML problem
    - eval_metric: Primary evaluation metric
    - greater_is_better: Whether higher metric values are better

    Returns:
    Metric configuration dictionary
    """
```
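
For reference, a minimal sketch of selecting one of these metrics when constructing the predictor; `eval_metric` is the standard `TabularPredictor` argument, and the CSV path and label column are placeholders:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # placeholder dataset with a binary 'target' column

# The chosen metric drives validation scoring, model selection, and ensemble weighting.
predictor = TabularPredictor(label='target', eval_metric='f1').fit(train_data, time_limit=300)
print(predictor.eval_metric)
```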

### Resource and Performance Configuration

Settings for optimizing computational resource usage, memory management, and training performance.

```python { .api }
class ResourceConfig:
    """Configuration for computational resources and performance optimization."""

    # CPU and Memory
    num_cpus: int | str = "auto"          # Number of CPU cores ("auto" or an explicit count)
    memory_limit_mb: int = None           # Memory limit in megabytes

    # GPU Configuration
    num_gpus: int = 0                     # Number of GPU devices
    gpu_memory_limit: int = None          # GPU memory limit

    # Disk and Storage
    disk_limit_mb: int = None             # Disk space limit
    cache_data: bool = True               # Cache preprocessed data

    # Performance Optimization
    enable_multiprocessing: bool = True   # Enable multiprocessing
    max_concurrent_models: int = 1        # Maximum concurrent model training
    early_stopping_rounds: int = None     # Early stopping configuration

    # Inference Optimization
    optimize_for_deployment: bool = False # Optimize for deployment
    model_compression: bool = False       # Enable model compression
```
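
A short sketch of applying per-model resource limits through `ag_args_fit`, mirroring the fields above; the specific limits and file path are illustrative only:

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

train_data = pd.read_csv('train.csv')  # placeholder dataset with a 'target' column

# Per-model resource limits are passed as the ag_args_fit dictionary.
predictor = TabularPredictor(label='target').fit(
    train_data,
    time_limit=900,                              # overall training budget in seconds
    ag_args_fit={'num_cpus': 4, 'num_gpus': 0},  # CPU/GPU allocation per model
)
```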

## Usage Examples

### Basic Preset Usage

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Different preset configurations
presets = ['good_quality', 'best_quality', 'optimize_for_deployment', 'interpretable']

results = {}
for preset in presets:
    print(f"\nTraining with preset: {preset}")

    predictor = TabularPredictor(
        label='target',
        path=f'./models_{preset}/'
    )

    predictor.fit(
        train_data,
        presets=preset,
        time_limit=600  # 10 minutes per preset
    )

    # Evaluate performance (evaluate() returns a dict of metric scores)
    performance = predictor.evaluate(test_data)
    leaderboard = predictor.leaderboard(test_data)

    results[preset] = {
        'score': performance,
        'best_model': leaderboard.iloc[0]['model'],
        'num_models': len(leaderboard)
    }

    print(f"Scores: {performance}")
    print(f"Best model: {results[preset]['best_model']}")
    print(f"Total models trained: {results[preset]['num_models']}")

# Compare results
print("\nPreset Comparison:")
for preset, result in results.items():
    print(f"{preset}: {result['score']} ({result['num_models']} models)")
```

### Custom Hyperparameter Configuration

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load data (placeholder path)
train_data = pd.read_csv('train.csv')

# Advanced hyperparameter configuration
hyperparameters = {
    # Gradient Boosting Models
    'LGB': [
        # Fast configuration
        {
            'num_leaves': 31,
            'learning_rate': 0.1,
            'feature_fraction': 0.9,
            'bagging_fraction': 0.8,
            'bagging_freq': 5,
            'min_data_in_leaf': 20,
            'objective': 'binary',
            'max_depth': -1,
            'save_binary': True,
            'ag_args': {'name_suffix': '_Fast', 'priority': 1}
        },
        # Accurate configuration
        {
            'num_leaves': 127,
            'learning_rate': 0.05,
            'feature_fraction': 0.8,
            'bagging_fraction': 0.9,
            'bagging_freq': 5,
            'min_data_in_leaf': 10,
            'reg_alpha': 0.1,
            'reg_lambda': 0.1,
            'ag_args': {'name_suffix': '_Accurate', 'priority': 2}
        }
    ],

    'XGB': {
        'n_estimators': [100, 300, 500],
        'max_depth': [3, 6, 10],
        'learning_rate': [0.01, 0.1, 0.2],
        'subsample': [0.8, 0.9, 1.0],
        'colsample_bytree': [0.8, 0.9, 1.0],
        'reg_alpha': [0, 0.1, 1],
        'reg_lambda': [0, 0.1, 1]
    },

    # Neural Networks
    'NN_TORCH': [
        # Small network
        {
            'num_epochs': 50,
            'learning_rate': 0.001,
            'weight_decay': 1e-4,
            'dropout_prob': 0.1,
            'embedding_size_factor': 1.0,
            'ag_args': {'name_suffix': '_Small'}
        },
        # Large network
        {
            'num_epochs': 100,
            'learning_rate': 0.0005,
            'weight_decay': 1e-5,
            'dropout_prob': 0.2,
            'embedding_size_factor': 2.0,
            'ag_args': {'name_suffix': '_Large'}
        }
    ]
}

# Train with custom hyperparameters
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    time_limit=1800,  # 30 minutes
    num_bag_folds=5,
    num_stack_levels=2
)
```

### Advanced Training Configuration

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Load data (placeholder paths; train_data must include the 'target' label
# and a 'sample_weights' column)
train_data = pd.read_csv('train.csv')
validation_data = pd.read_csv('validation.csv')

# Advanced training arguments
ag_args_fit = {
    'num_cpus': 8,          # Use 8 CPU cores
    'num_gpus': 1,          # Use 1 GPU
    'memory_limit': 16000,  # 16GB memory limit
    'time_limit': 300,      # 5 minutes per model
}

ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
    'auto_stack': True,
    'bagging_mode': 'oob',   # Out-of-bag validation
    'stack_mode': 'infer',
    'ensemble_size_max': 50  # Maximum ensemble size
}

# Feature generation configuration
feature_generator_kwargs = {
    'enable_raw_text_features': True,
    'enable_nlp_features': True,
    'text_ngram_size': 300,
    'text_special_features': ['word_count', 'char_count']
}

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc',
    sample_weight='sample_weights'
)

predictor.fit(
    train_data,
    tuning_data=validation_data,
    time_limit=3600,  # 1 hour total
    presets='best_quality',

    # Advanced configurations
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,
    feature_generator_kwargs=feature_generator_kwargs,

    # Bagging and stacking
    num_bag_folds=10,
    num_bag_sets=3,
    num_stack_levels=3,

    # Model selection
    excluded_model_types=['KNN'],  # Exclude slow models

    # Hyperparameter tuning
    hyperparameter_tune_kwargs={
        'scheduler': 'local',
        'searcher': 'bayesopt',
        'num_trials': 100
    }
)
```

### Deployment Optimization Configuration

```python
import time

import pandas as pd
from autogluon.tabular import TabularPredictor

# Load data (placeholder paths)
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Configuration optimized for deployment
deployment_hyperparameters = {
    'LGB': {
        'num_leaves': 31,         # Smaller trees
        'max_depth': 6,
        'min_data_in_leaf': 50,   # Regularization
        'bagging_freq': 0,        # Disable bagging for speed
        'feature_fraction': 1.0,  # Use all features
    },
    'CAT': {
        'iterations': 100,        # Fewer iterations
        'depth': 6,
        'l2_leaf_reg': 3,
        'bootstrap_type': 'No'    # Disable bootstrap
    }
}

predictor = TabularPredictor(
    label='target',
    path='./deployment_model/'
)

predictor.fit(
    train_data,
    presets='optimize_for_deployment',
    hyperparameters=deployment_hyperparameters,
    time_limit=300,      # Fast training
    num_bag_folds=0,     # Disable bagging
    num_stack_levels=0,  # Disable stacking

    # Focus on fast, simple models
    included_model_types=['LGB', 'CAT', 'LR']
)

# Create deployment-optimized clone
deployment_predictor = predictor.clone_for_deployment(
    path='./deployment_ready/',
    model='best'  # Single best model only
)

# Test inference speed
start_time = time.time()
predictions = deployment_predictor.predict(test_data)
inference_time = time.time() - start_time

print(f"Inference time: {inference_time:.3f} seconds")
print(f"Predictions per second: {len(test_data) / inference_time:.0f}")
```

### Interpretable Model Configuration

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Load data (placeholder path)
train_data = pd.read_csv('train.csv')

# Configuration for interpretable models
interpretable_hyperparameters = {
    'LR': {   # Logistic Regression
        'C': [0.01, 0.1, 1.0, 10],  # Regularization
        'penalty': ['l1', 'l2'],
        'solver': ['liblinear', 'saga']
    },
    'RF': {   # Random Forest
        'n_estimators': [50, 100, 200],
        'max_depth': [3, 5, 10],           # Limit depth for interpretability
        'min_samples_split': [10, 20, 50],
        'max_features': ['sqrt', 'log2']
    },
    'XGB': {  # XGBoost (regularized)
        'n_estimators': [50, 100],
        'max_depth': [3, 4, 5],      # Shallow trees
        'learning_rate': [0.1, 0.2],
        'reg_alpha': [0.1, 1.0],     # L1 regularization
        'reg_lambda': [0.1, 1.0]     # L2 regularization
    }
}

predictor = TabularPredictor(
    label='target',
    eval_metric='accuracy'
)

predictor.fit(
    train_data,
    presets='interpretable',
    hyperparameters=interpretable_hyperparameters,

    # Enable only interpretable models
    included_model_types=['LR', 'RF', 'XGB'],

    # Simpler ensemble strategies
    num_bag_folds=3,
    num_stack_levels=1,

    # Feature processing for interpretability
    feature_generator='auto'  # Minimal feature engineering
)

# Analyze model interpretability
leaderboard = predictor.leaderboard(extra_info=True)
print("Interpretable models ranking:")
print(leaderboard[['model', 'score_val', 'fit_time']].head())
```

## Configuration Reference

### Preset Details

| Preset | Training Time | Model Diversity | Ensembling | Best For |
|--------|---------------|-----------------|------------|----------|
| `medium_quality` | Low | Medium | None | Quick prototyping, default preset |
| `good_quality` | Medium | High | Moderate | General use, balanced performance |
| `high_quality` | High | High | Extensive | High accuracy with fast inference |
| `best_quality` | Very High | Very High | Extensive | Maximum accuracy, competitions |
| `optimize_for_deployment` | - | - | - | Post-training optimization |
| `interpretable` | Low | Limited | Simple | Regulated industries, explainability |

### Model Type Abbreviations

| Code | Full Name | Category |
|------|-----------|----------|
| `LGB` | LightGBM | Gradient Boosting |
| `XGB` | XGBoost | Gradient Boosting |
| `CAT` | CatBoost | Gradient Boosting |
| `RF` | Random Forest | Tree Ensemble |
| `XT` | Extra Trees | Tree Ensemble |
| `LR` | Linear/Logistic Regression | Linear |
| `KNN` | K-Nearest Neighbors | Instance-based |
| `NN_TORCH` | PyTorch Neural Network | Deep Learning |
| `FASTAI` | FastAI Neural Network | Deep Learning |
| `TABPFN` | TabPFN | Foundation Model |

### Resource Configuration Guidelines

| Use Case | CPU Cores | Memory (GB) | Time Limit | Bag Folds |
|----------|-----------|-------------|------------|-----------|
| Quick Prototype | 2-4 | 4-8 | 5-15 min | 2-3 |
| Production Model | 8-16 | 16-32 | 30-60 min | 5-10 |
| Competition | 16-32 | 32-64 | 2-8 hours | 10-20 |
| Large Dataset | 16+ | 64+ | 4+ hours | 5-10 |