# Models and Registry

AutoGluon Tabular provides a comprehensive collection of machine learning models with unified interfaces, spanning from traditional algorithms to modern deep learning approaches. The model registry system enables extensibility and customization of the available model portfolio.

## Capabilities

### Core Machine Learning Models

Traditional and gradient boosting models that form the backbone of AutoGluon's automated machine learning capabilities, providing robust performance across diverse tabular datasets.

```python { .api }
# Gradient Boosting Models
class LGBModel:
    """LightGBM gradient boosting model optimized for speed and memory efficiency."""

class XGBoostModel:
    """XGBoost gradient boosting model with advanced regularization and handling of missing values."""

class CatBoostModel:
    """CatBoost gradient boosting model with native categorical feature support."""

# Tree-based Models
class RFModel:
    """Random Forest model providing an ensemble of decision trees with feature bagging."""

class XTModel:
    """Extra Trees (Extremely Randomized Trees) model with increased randomization."""

# Linear Models
class LinearModel:
    """Linear/Logistic Regression with automatic regularization and feature scaling."""

# Instance-based Models
class KNNModel:
    """K-Nearest Neighbors model for both classification and regression tasks."""
```
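
For context, here is a minimal sketch of opting into a handful of these core models through `TabularPredictor` rather than training the full portfolio. The hyperparameter keys ('LGB', 'XGB', 'CAT', 'RF') follow the naming used in the usage examples later in this document, and the `train.csv` path is illustrative; an empty dict requests a model with its default configuration.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

# Illustrative data source; replace with your own table.
train_data = TabularDataset('train.csv')

# Empty dicts request each model with its default configuration.
# Key names follow the conventions used elsewhere in this document.
core_models = {
    'LGB': {},  # LGBModel
    'XGB': {},  # XGBoostModel
    'CAT': {},  # CatBoostModel
    'RF': {},   # RFModel
}

predictor = TabularPredictor(label='target').fit(
    train_data,
    hyperparameters=core_models,
    time_limit=300,
)
print(predictor.leaderboard()[['model', 'score_val']])
```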

### Neural Network Models

Deep learning models optimized for tabular data with automatic architecture selection, hyperparameter optimization, and specialized architectures for structured data.

```python { .api }
# Traditional Neural Networks
class NNFastAiTabularModel:
    """FastAI-based neural network with automated preprocessing and training."""

class TabularNeuralNetTorchModel:
    """PyTorch-based neural network with custom architecture for tabular data."""

# Transformer-based Models
class FTTransformerModel:
    """Feature Tokenizer Transformer - specialized transformer architecture for tabular data."""

# Pre-trained Foundation Models
class TabPFNV2Model:
    """TabPFN v2 - pre-trained transformer model fine-tuned for tabular prediction."""

class TabPFNMixModel:
    """TabPFN Mix - ensemble of pre-trained transformers for improved performance."""

class MitraModel:
    """Mitra - advanced transformer architecture optimized for tabular classification."""

# Specialized Neural Networks
class TabMModel:
    """TabM - neural network with attention mechanisms for tabular data."""

class RealMLPModel:
    """Real-valued MLP with specialized training procedures for tabular prediction."""

class TabICLModel:
    """TabICL - in-context learning model for few-shot tabular prediction."""
```
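
As a rough illustration of combining a neural model with a gradient boosting baseline in one run, the sketch below uses the 'NN_TORCH' key (for `TabularNeuralNetTorchModel`) and parameter names that appear in the hyperparameter-tuning example later in this document; the data path and chosen values are assumptions, not recommendations.

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')  # illustrative path

# Mix a LightGBM baseline with the PyTorch tabular network.
hyperparameters = {
    'LGB': {},  # LGBModel with defaults
    'NN_TORCH': {
        'num_epochs': 30,    # TabularNeuralNetTorchModel
        'dropout_prob': 0.1,
    },
}

predictor = TabularPredictor(label='target', eval_metric='roc_auc').fit(
    train_data,
    hyperparameters=hyperparameters,
    time_limit=900,
)
print(predictor.leaderboard()[['model', 'score_val', 'fit_time']])
```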

### Multi-Modal Models

Models capable of handling mixed data types including text, images, and structured features within the same prediction task.

```python { .api }
class MultiModalPredictorModel:
    """
    AutoMM-based multi-modal model handling tabular, text, and image features.
    Automatically detects and processes different data modalities.
    """

class TextPredictorModel:
    """Specialized model for tabular data containing text features."""

class FastTextModel:
    """FastText model for efficient text classification and representation learning."""

class ImagePredictorModel:
    """Model for tabular data with image features or image-based prediction."""
```
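
A hedged sketch of how mixed-type data might reach these models: text is just another DataFrame column passed to `TabularPredictor`, and whether a multi-modal model is used depends on the configured model portfolio. The column names and rows below are fabricated for illustration.

```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# Toy frame mixing numeric, categorical, and free-text features.
train_data = pd.DataFrame({
    'age': [34, 51, 29, 42, 37, 45],
    'country': ['US', 'DE', 'US', 'FR', 'DE', 'US'],
    'review_text': [
        'Great product, works as described.',
        'Stopped working after a week.',
        'Average quality, fast shipping.',
        'Excellent support and build quality.',
        'Would not recommend to a friend.',
        'Does the job, nothing special.',
    ],
    'label': [1, 0, 1, 1, 0, 1],
})

# Text columns are passed like any other feature.
predictor = TabularPredictor(label='label').fit(train_data, time_limit=120)
print(predictor.predict(train_data.drop(columns=['label'])))
```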

### Interpretable Models

Models designed for interpretability and explainability, providing transparent decision-making processes suitable for regulated industries and high-stakes applications.

```python { .api }
# Base Interpretable Model
class _IModelsModel:
    """Base class for interpretable machine learning models."""

# Rule-based Models
class BoostedRulesModel:
    """Gradient-boosted rule ensemble providing interpretable decision rules."""

class RuleFitModel:
    """RuleFit model combining linear regression with decision rules."""

class FigsModel:
    """FIGS (Fast Interpretable Greedy-tree Sums) model for rule-based predictions."""

# Tree-based Interpretable Models
class GreedyTreeModel:
    """Greedy decision tree optimized for interpretability over accuracy."""

class HSTreeModel:
    """Hierarchical Shrinkage Tree with built-in regularization."""

# Text Models
class FastTextModel:
    """FastText model for text classification in tabular datasets."""
```
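
When transparency matters more than raw accuracy, the `presets='interpretable'` option used in the model-selection example below steers training toward these simpler, rule- and tree-based models. A minimal sketch, assuming a `train.csv` file with a 'target' column:

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset('train.csv')  # illustrative path

# Favor transparent models over maximum accuracy.
predictor = TabularPredictor(label='target').fit(
    train_data,
    presets='interpretable',
    time_limit=600,
)

# Inspect which interpretable models were trained.
print(predictor.leaderboard()[['model', 'score_val']])
```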

### Model Registry System

Extensible registry system for managing, registering, and accessing machine learning models within AutoGluon's framework.

```python { .api }
class ModelRegistry:
    """
    Registry for managing available machine learning models.
    Enables custom model registration and retrieval.
    """

    def __init__(self):
        """Initialize empty model registry."""

    def register_model(
        self,
        name: str,
        model_class: type,
        tags: list[str] = None
    ) -> None:
        """
        Register a new model class in the registry.

        Parameters:
        - name: Unique identifier for the model
        - model_class: Model class to register
        - tags: Optional tags for categorization
        """

    def get_model(self, name: str) -> type:
        """
        Retrieve a registered model class by name.

        Parameters:
        - name: Name of the registered model

        Returns:
        Model class
        """

    def list_models(self, tags: list[str] = None) -> list[str]:
        """
        List all registered model names.

        Parameters:
        - tags: Filter by tags (optional)

        Returns:
        List of model names
        """

    def unregister_model(self, name: str) -> None:
        """
        Remove a model from the registry.

        Parameters:
        - name: Name of the model to remove
        """

# Global model registry instance
ag_model_registry: ModelRegistry
```
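
Beyond basic lookup, the `tags` filter and `unregister_model` can be combined to inspect and prune the available portfolio. A sketch under the assumption that a 'gradient_boosting' tag has been assigned to some models, mirroring the custom-registration example below:

```python
from autogluon.tabular.registry import ag_model_registry

# Filter registered models by tag (the tag name is an assumption,
# matching the custom-registration example later in this document).
boosting_models = ag_model_registry.list_models(tags=['gradient_boosting'])
print(f"Gradient boosting models: {boosting_models}")

# Remove a previously registered custom model once it is no longer needed.
if 'CustomGB' in ag_model_registry.list_models():
    ag_model_registry.unregister_model('CustomGB')
```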

### Base Model Interface

Abstract base class defining the common interface that all AutoGluon models must implement for consistent behavior and integration.

```python { .api }
class AbstractModel:
    """
    Abstract base class for all AutoGluon tabular models.
    Defines the standard interface and common functionality.
    """

    def __init__(
        self,
        problem_type: str,
        objective: str = None,
        **kwargs
    ):
        """
        Initialize model with problem configuration.

        Parameters:
        - problem_type: Type of ML problem ('binary', 'multiclass', 'regression')
        - objective: Optimization objective/metric
        - kwargs: Model-specific parameters
        """

    def fit(
        self,
        X_train: pd.DataFrame,
        y_train: pd.Series,
        X_val: pd.DataFrame = None,
        y_val: pd.Series = None,
        **kwargs
    ) -> None:
        """
        Train the model on provided data.

        Parameters:
        - X_train: Training features
        - y_train: Training labels
        - X_val: Validation features (optional)
        - y_val: Validation labels (optional)
        """

    def predict(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate predictions for input data.

        Parameters:
        - X: Input features

        Returns:
        Predictions as numpy array
        """

    def predict_proba(self, X: pd.DataFrame, **kwargs) -> np.ndarray:
        """
        Generate prediction probabilities (classification only).

        Parameters:
        - X: Input features

        Returns:
        Prediction probabilities as numpy array
        """

    def get_memory_size(self) -> int:
        """
        Get approximate memory usage of the model in bytes.

        Returns:
        Memory usage in bytes
        """

    def save(self, path: str) -> None:
        """
        Save model to disk.

        Parameters:
        - path: File path for saving
        """

    def load(self, path: str) -> None:
        """
        Load model from disk.

        Parameters:
        - path: File path for loading
        """
```
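
The usage examples below instantiate concrete models directly; the interface above also implies a simple standalone lifecycle of fit, predict, inspect, and persist. A non-authoritative sketch following that interface, using `RFModel` and synthetic data (paths and data are illustrative):

```python
import numpy as np
import pandas as pd
from autogluon.tabular.models import RFModel

# Synthetic binary classification data, for illustration only.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=['f1', 'f2', 'f3', 'f4'])
y = pd.Series((X['f1'] + X['f2'] > 0).astype(int), name='target')

# Walk through the AbstractModel interface defined above.
model = RFModel(problem_type='binary')
model.fit(X_train=X, y_train=y)
proba = model.predict_proba(X)

print(f"Prediction probabilities shape: {proba.shape}")
print(f"Approximate memory footprint: {model.get_memory_size()} bytes")

# Persist and reload (path is illustrative).
model.save('rf_model_artifact')
model.load('rf_model_artifact')
```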

## Usage Examples

### Custom Model Registration

```python
from autogluon.tabular.models import AbstractModel
from autogluon.tabular.registry import ag_model_registry
import pandas as pd
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

class CustomGBModel(AbstractModel):
    """Custom Gradient Boosting model wrapper."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.model = GradientBoostingClassifier(
            n_estimators=kwargs.get('n_estimators', 100),
            learning_rate=kwargs.get('learning_rate', 0.1),
            random_state=42
        )

    def fit(self, X_train, y_train, **kwargs):
        self.model.fit(X_train, y_train)

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

# Register custom model
ag_model_registry.register_model(
    name='CustomGB',
    model_class=CustomGBModel,
    tags=['tree', 'gradient_boosting', 'custom']
)

# Use in TabularPredictor
from autogluon.tabular import TabularPredictor

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters={'CustomGB': {'n_estimators': [50, 100, 200]}}
)
```

### Model-Specific Hyperparameter Tuning

```python
from autogluon.tabular import TabularPredictor

# Define model-specific hyperparameters
hyperparameters = {
    # LightGBM configurations
    'LGB': {
        'num_leaves': [31, 127, 255],
        'learning_rate': [0.01, 0.05, 0.1],
        'feature_fraction': [0.8, 0.9, 1.0],
        'bagging_fraction': [0.8, 0.9, 1.0],
        'min_data_in_leaf': [10, 20, 50]
    },

    # XGBoost configurations
    'XGB': {
        'n_estimators': [100, 300, 500],
        'max_depth': [3, 6, 10],
        'learning_rate': [0.01, 0.1, 0.2],
        'subsample': [0.8, 0.9, 1.0],
        'colsample_bytree': [0.8, 0.9, 1.0]
    },

    # CatBoost configurations
    'CAT': {
        'iterations': [100, 500, 1000],
        'depth': [4, 6, 8],
        'learning_rate': [0.01, 0.1, 0.2],
        'l2_leaf_reg': [1, 3, 5, 7, 9]
    },

    # Neural Network configurations
    'NN_TORCH': {
        'num_epochs': [10, 50, 100],
        'learning_rate': [1e-4, 1e-3, 1e-2],
        'weight_decay': [1e-6, 1e-4, 1e-2],
        'dropout_prob': [0.0, 0.1, 0.2, 0.5]
    }
}

predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    hyperparameters=hyperparameters,
    time_limit=1800  # 30 minutes
)

# Check which models were trained
leaderboard = predictor.leaderboard()
print("Trained models:")
print(leaderboard[['model', 'score_val']].head(10))
```

### Model Selection and Filtering

```python
from autogluon.tabular import TabularPredictor

# Include only specific model types
predictor = TabularPredictor(label='target')
predictor.fit(
    train_data,
    included_model_types=['LGB', 'XGB', 'CAT'],  # Only gradient boosting
    time_limit=600
)

# Exclude simpler models for best performance
predictor_performance = TabularPredictor(label='target')
predictor_performance.fit(
    train_data,
    excluded_model_types=['LR', 'KNN'],  # Exclude simpler models
    presets='best_quality'
)

# Include only interpretable models
predictor_interpretable = TabularPredictor(label='target')
predictor_interpretable.fit(
    train_data,
    included_model_types=['LR', 'RF', 'XGB'],  # More interpretable options
    presets='interpretable'
)
```

### Advanced Model Configuration

```python
from autogluon.tabular import TabularPredictor

# Advanced configuration with model-specific arguments
ag_args_fit = {
    'num_cpus': 8,          # CPU cores for training
    'num_gpus': 1,          # GPU devices
    'memory_limit': 16000,  # Memory limit in MB
}

ag_args_ensemble = {
    'fold_fitting_strategy': 'sequential_local',
    'auto_stack': True,
    'bagging_mode': 'oob',  # Out-of-bag validation
}

predictor = TabularPredictor(
    label='target',
    eval_metric='roc_auc'
)

predictor.fit(
    train_data,
    time_limit=3600,  # 1 hour
    presets='best_quality',
    num_bag_folds=10,
    num_stack_levels=3,
    ag_args_fit=ag_args_fit,
    ag_args_ensemble=ag_args_ensemble,

    # Model-specific advanced arguments
    hyperparameters={
        'LGB': {'ag_args': {'name_suffix': '_Large', 'priority': 1}},
        'XGB': {'ag_args': {'name_suffix': '_XL', 'priority': 2}},
        'CAT': {'ag_args': {'name_suffix': '_Balanced', 'priority': 3}}
    }
)

# Analyze model performance and resource usage
leaderboard = predictor.leaderboard(extra_info=True)
print(leaderboard[['model', 'score_val', 'fit_time', 'pred_time_val']].head())
```

457

458

### Working with Model Registry

459

460

```python
from autogluon.tabular.registry import ag_model_registry

# List all available models
all_models = ag_model_registry.list_models()
print(f"Available models: {len(all_models)}")
print(all_models[:10])  # First 10 models

# Get specific model class
lgb_class = ag_model_registry.get_model('LGBModel')
print(f"LightGBM model class: {lgb_class}")

# Check if model is registered
if 'XGBModel' in all_models:
    xgb_class = ag_model_registry.get_model('XGBModel')
    print(f"XGBoost available: {xgb_class is not None}")

# Custom model usage
from autogluon.tabular.models import RFModel

# Instantiate model directly (advanced usage)
rf_model = RFModel(
    problem_type='binary',
    objective='binary_logloss'
)

# This would typically be done within TabularPredictor
# rf_model.fit(X_train, y_train)
# predictions = rf_model.predict(X_test)
```