# AutoGluon Tabular

AutoGluon Tabular is an automated machine learning (AutoML) library for tabular data that lets developers build high-accuracy classification and regression models with minimal code. The package provides the `TabularPredictor` class, which automatically handles feature engineering, model selection, hyperparameter optimization, and ensemble creation across a wide range of algorithms, including gradient boosting (LightGBM, XGBoost, CatBoost), neural networks (FastAI, TabPFN), and traditional machine learning models.

## Package Information

- **Package Name**: autogluon.tabular
- **Language**: Python
- **Installation**: `pip install autogluon.tabular`

## Core Imports

```python
from autogluon.tabular import TabularPredictor
```

For advanced usage:

```python
from autogluon.tabular import TabularPredictor, TabularDataset, FeatureMetadata
```

## Basic Usage

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load your data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Create and train predictor
predictor = TabularPredictor(label='target_column')
predictor.fit(train_data)

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)  # classification tasks only

# Evaluate performance (test_data must include the label column)
performance = predictor.evaluate(test_data)
leaderboard = predictor.leaderboard(test_data)
```

## Architecture

AutoGluon Tabular uses a multi-layered architecture for automated machine learning:

- **TabularPredictor**: High-level interface managing the complete ML pipeline
- **Learner**: Coordinates model training, bagging, and stacking strategies
- **Trainer**: Handles individual model training and ensemble creation
- **Models**: Extensive collection of ML algorithms with unified interfaces
- **Feature Processing**: Automatic feature engineering and data preprocessing
- **Evaluation**: Comprehensive model evaluation and selection framework

This design enables AutoGluon to automatically handle complex ML workflows while providing flexibility for advanced users to customize components and strategies.
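
The delegation between layers can be pictured with a toy, pure-Python sketch. This is not AutoGluon's internal code: `ToyPredictor`, `ToyLearner`, `ToyTrainer`, `MeanModel`, and `MedianModel` are hypothetical stand-ins that only illustrate how a high-level interface might hand work to a learner, which hands it to a trainer that fits several models and ranks them on held-out data. For brevity the toy models ignore features and look at the target values only.

```python
from statistics import mean, median


class MeanModel:
    """Toy model (hypothetical): always predicts the training mean."""
    name = "mean"

    def fit(self, y):
        self.value = mean(y)
        return self

    def predict(self, n):
        return [self.value] * n


class MedianModel:
    """Toy model (hypothetical): always predicts the training median."""
    name = "median"

    def fit(self, y):
        self.value = median(y)
        return self

    def predict(self, n):
        return [self.value] * n


def mae(y_true, y_pred):
    """Mean absolute error; lower is better."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


class ToyTrainer:
    """Fits every candidate model and records its validation score."""

    def __init__(self, model_classes):
        self.model_classes = model_classes
        self.models = {}
        self.scores = {}

    def train(self, y_train, y_val):
        for cls in self.model_classes:
            model = cls().fit(y_train)
            self.models[model.name] = model
            self.scores[model.name] = mae(y_val, model.predict(len(y_val)))
        # Return the name of the model with the lowest validation error.
        return min(self.scores, key=self.scores.get)


class ToyLearner:
    """Splits off a holdout set, then delegates fitting to the trainer."""

    def __init__(self, trainer):
        self.trainer = trainer
        self.best = None

    def fit(self, y, holdout=2):
        self.best = self.trainer.train(y[:-holdout], y[-holdout:])
        return self


class ToyPredictor:
    """High-level entry point that hides the layers beneath it."""

    def __init__(self):
        self.learner = ToyLearner(ToyTrainer([MeanModel, MedianModel]))

    def fit(self, y):
        self.learner.fit(y)
        return self

    def predict(self, n):
        trainer = self.learner.trainer
        return trainer.models[self.learner.best].predict(n)

    def leaderboard(self):
        # Sorted from best (lowest error) to worst.
        return sorted(self.learner.trainer.scores.items(), key=lambda kv: kv[1])


predictor = ToyPredictor().fit([1.0, 2.0, 3.0, 100.0, 2.0, 3.0])
print(predictor.leaderboard())
```

The caller only touches `ToyPredictor`; swapping in another candidate model means changing the list handed to the trainer, which is the kind of flexibility the layered design aims for.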

## Capabilities

### Core Prediction Interface

The primary TabularPredictor class provides automated machine learning capabilities including model training, prediction, evaluation, and model management with minimal code required.

```python { .api }
class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str = None,
        path: str = None,
        verbosity: int = 2,
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        positive_class: str | int = None,
        **kwargs
    ): ...

    def fit(
        self,
        train_data: pd.DataFrame,
        tuning_data: pd.DataFrame = None,
        time_limit: int = None,
        presets: str = None,
        hyperparameters: dict = None,
        **kwargs
    ) -> 'TabularPredictor': ...

    def predict(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        as_pandas: bool = True,
        transform_features: bool = True,
        **kwargs
    ) -> pd.Series | np.ndarray: ...

    def predict_proba(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        as_pandas: bool = True,
        as_multiclass: bool = True,
        **kwargs
    ) -> pd.DataFrame | np.ndarray: ...

    def evaluate(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        **kwargs
    ) -> dict: ...

    def leaderboard(
        self,
        data: pd.DataFrame | str = None,
        extra_info: bool = False,
        **kwargs
    ) -> pd.DataFrame: ...

class InterpretableTabularPredictor(TabularPredictor):
    """
    Experimental predictor limited to interpretable models only.
    Same interface as TabularPredictor but restricted to simple models.
    """
```

[Core Prediction Interface](./predictor.md)
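
As a rough sketch of how these signatures fit together, the helpers below train under a time budget and gather results for held-out data. The helper names `train_with_budget` and `summarize` are hypothetical; the arguments they pass (`label`, `eval_metric`, `path`, `time_limit`, `presets`) are the ones listed in the signatures above, with `time_limit` given in seconds. The data arguments are expected to be pandas DataFrames that include the label column.

```python
def train_with_budget(train_data, label, time_limit=600,
                      presets="medium_quality", eval_metric=None, path=None):
    """Fit a TabularPredictor under a time budget (seconds) and return it."""
    # Imported lazily so this module can be imported without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label, eval_metric=eval_metric, path=path)
    # fit() returns the predictor itself, so the result can be returned directly.
    return predictor.fit(train_data, time_limit=time_limit, presets=presets)


def summarize(predictor, test_data):
    """Collect predictions, metric scores, and the leaderboard for held-out data."""
    return {
        "predictions": predictor.predict(test_data),
        "scores": predictor.evaluate(test_data),
        "leaderboard": predictor.leaderboard(test_data),
    }
```

A typical call might look like `summarize(train_with_budget(train_data, label="target_column", time_limit=300), test_data)`, reusing the DataFrames from the Basic Usage section.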

### Experimental Scikit-learn Compatible Interfaces

Scikit-learn compatible wrappers providing familiar fit/predict interfaces for integration with existing scikit-learn workflows and pipelines.

```python { .api }
class TabularClassifier:
    def fit(self, X: pd.DataFrame, y: pd.Series, **kwargs): ...
    def predict(self, X: pd.DataFrame) -> np.ndarray: ...
    def predict_proba(self, X: pd.DataFrame) -> np.ndarray: ...
    def score(self, X: pd.DataFrame, y: pd.Series) -> float: ...

class TabularRegressor:
    def fit(self, X: pd.DataFrame, y: pd.Series, **kwargs): ...
    def predict(self, X: pd.DataFrame) -> np.ndarray: ...
    def score(self, X: pd.DataFrame, y: pd.Series) -> float: ...
```

[Experimental Interfaces](./experimental.md)
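
A minimal sketch of the scikit-learn style workflow follows. It assumes the wrappers are importable from `autogluon.tabular.experimental` and that `TabularClassifier` can be constructed without arguments; neither is stated on this page, so check the linked reference for your installed version. The helper name `fit_and_score_classifier` is hypothetical.

```python
def fit_and_score_classifier(X_train, y_train, X_test, y_test):
    """Fit the experimental classifier and return it with its held-out score."""
    # Assumption: import path and no-argument constructor, as noted in the text.
    # Imported lazily so this module can be imported without AutoGluon installed.
    from autogluon.tabular.experimental import TabularClassifier

    clf = TabularClassifier()
    clf.fit(X_train, y_train)          # X: feature DataFrame, y: label Series
    return clf, clf.score(X_test, y_test)
```

Because features and labels are passed separately as `X` and `y`, this interface differs from `TabularPredictor`, which takes a single DataFrame and the name of the label column.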

### Model Collection and Registry

Comprehensive collection of machine learning models with unified interfaces, model registry for extensibility, and access to 30+ different algorithms from traditional ML to deep learning approaches.

```python { .api }
# Core Models
class LGBModel: ...       # LightGBM gradient boosting
class XGBoostModel: ...   # XGBoost gradient boosting
class CatBoostModel: ...  # CatBoost gradient boosting
class RFModel: ...        # Random Forest
class LinearModel: ...    # Linear/Logistic Regression
class KNNModel: ...       # K-Nearest Neighbors

# Neural Network Models
class NNFastAiTabularModel: ...        # FastAI neural networks
class TabularNeuralNetTorchModel: ...  # PyTorch neural networks
class TabPFNV2Model: ...               # TabPFN v2
class FTTransformerModel: ...          # Feature Tokenizer Transformer

# Model Registry
class ModelRegistry:
    def register_model(self, name: str, model_class: type): ...
    def get_model(self, name: str) -> type: ...

ag_model_registry: ModelRegistry
```

[Models and Registry](./models.md)
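
The registry idea can be illustrated with a small stand-alone sketch. `ToyModelRegistry` and `ConstantModel` are hypothetical names and this is not AutoGluon's implementation; the sketch only mirrors the `register_model` / `get_model` shape listed above to show how a name-to-class lookup makes a model collection extensible.

```python
class ToyModelRegistry:
    """Minimal name -> class registry (illustrative, not AutoGluon's class)."""

    def __init__(self):
        self._models = {}

    def register_model(self, name: str, model_class: type) -> None:
        # Refuse silent overwrites so two models cannot share one key.
        if name in self._models:
            raise ValueError(f"Model name already registered: {name!r}")
        self._models[name] = model_class

    def get_model(self, name: str) -> type:
        try:
            return self._models[name]
        except KeyError:
            known = sorted(self._models)
            raise KeyError(f"Unknown model {name!r}; registered: {known}") from None


class ConstantModel:
    """Hypothetical custom model that predicts one fixed value per row."""

    def __init__(self, value=0.0):
        self.value = value

    def predict(self, rows):
        return [self.value for _ in rows]


registry = ToyModelRegistry()
registry.register_model("CONST", ConstantModel)

# Look the class up by name, then instantiate it like any other model.
model = registry.get_model("CONST")(value=1.5)
print(model.predict([{"a": 1}, {"a": 2}]))
```

Storing classes rather than instances keeps the registry cheap to populate and lets each caller construct a model with its own settings.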

### Configuration and Presets

Pre-configured settings for different use cases, hyperparameter configuration system, and extensive customization options for advanced users.

```python { .api }
# Available presets
PRESETS = [
    "best_quality",             # Maximum accuracy, longer training
    "high_quality",             # High accuracy with fast inference
    "good_quality",             # Good accuracy with very fast inference
    "medium_quality",           # Medium accuracy, very fast training (default)
    "optimize_for_deployment",  # Optimizes for deployment by cleaning up models
    "interpretable"             # Interpretable models only
]

# Hyperparameter configuration functions
def get_hyperparameter_config(preset: str) -> dict: ...
def get_default_feature_generator(preset: str = "auto"): ...
```

[Configuration and Presets](./configurations.md)
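
One way to guard against mistyped preset names before starting a long training run is a small validation helper. The sketch below is an assumption-laden convenience, not part of AutoGluon: `build_fit_kwargs` is a hypothetical name, and the preset list is copied from the listing above rather than read from the library.

```python
import difflib

# Copied from the presets listed above; the installed library may offer others.
PRESETS = (
    "best_quality",
    "high_quality",
    "good_quality",
    "medium_quality",
    "optimize_for_deployment",
    "interpretable",
)


def build_fit_kwargs(presets="medium_quality", time_limit=None):
    """Validate a preset name and build keyword arguments for fit()."""
    if presets not in PRESETS:
        # Suggest the closest known preset to help with typos.
        close = difflib.get_close_matches(presets, PRESETS, n=1)
        hint = f" Did you mean {close[0]!r}?" if close else ""
        raise ValueError(f"Unknown preset {presets!r}.{hint}")
    if time_limit is not None and time_limit <= 0:
        raise ValueError("time_limit must be a positive number of seconds")

    kwargs = {"presets": presets}
    if time_limit is not None:
        kwargs["time_limit"] = time_limit
    return kwargs


print(build_fit_kwargs("best_quality", time_limit=3600))
```

The returned dictionary is meant to be unpacked into the training call, for example `predictor.fit(train_data, **build_fit_kwargs("best_quality", time_limit=3600))`.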

## Types

```python { .api }
# Core data structures
TabularDataset = pd.DataFrame  # Enhanced DataFrame for tabular data

class FeatureMetadata:
    """Metadata container for feature information"""
    def __init__(self, type_map_raw: dict = None): ...
    def get_features(self, valid_raw_types: list = None) -> list: ...
    def get_feature_type_raw(self, feature: str) -> str: ...

# Problem types
PROBLEM_TYPES = Literal["binary", "multiclass", "regression", "quantile", "softclass"]

# Evaluation metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss", "f1", "f1_macro", "f1_micro",
    "f1_weighted", "roc_auc", "roc_auc_ovo", "roc_auc_ovr", "precision",
    "precision_macro", "recall", "recall_macro", "mcc", "pac_score"
]

REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error", "r2",
    "symmetric_mean_absolute_percentage_error"
]

# Weight strategies
WEIGHT_STRATEGIES = Literal["auto_weight", "balance_weight"]
```