# AutoGluon Tabular

AutoGluon Tabular is an automated machine learning (AutoML) library for tabular data. It lets developers build high-accuracy classification and regression models with minimal code. The package provides the `TabularPredictor` class, which automatically handles feature engineering, model selection, hyperparameter optimization, and ensembling across a wide range of algorithms, including gradient boosting (LightGBM, XGBoost, CatBoost), neural networks (FastAI, TabPFN), and traditional machine learning models.

## Package Information

- **Package Name**: autogluon.tabular
- **Language**: Python
- **Installation**: `pip install autogluon.tabular`

## Core Imports

```python
from autogluon.tabular import TabularPredictor
```

For advanced usage:

```python
from autogluon.tabular import TabularPredictor, TabularDataset, FeatureMetadata
```

## Basic Usage

```python
from autogluon.tabular import TabularPredictor
import pandas as pd

# Load your data
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# Create and train predictor
predictor = TabularPredictor(label='target_column')
predictor.fit(train_data)

# Make predictions
predictions = predictor.predict(test_data)
# Class probabilities are available for classification tasks only
probabilities = predictor.predict_proba(test_data)

# Evaluate performance (test_data must include the label column)
performance = predictor.evaluate(test_data)
leaderboard = predictor.leaderboard(test_data)
```

## Architecture

AutoGluon Tabular uses a layered architecture for automated machine learning:

- **TabularPredictor**: High-level interface managing the complete ML pipeline
- **Learner**: Prepares the data (label cleaning, feature generation) and coordinates the end-to-end run by delegating to the Trainer
- **Trainer**: Trains individual models and builds the bagged and stacked ensembles
- **Models**: Extensive collection of ML algorithms with unified interfaces
- **Feature Processing**: Automatic feature engineering and data preprocessing
- **Evaluation**: Model evaluation and selection framework

This design lets AutoGluon handle complex ML workflows automatically while still allowing advanced users to customize components and strategies.

## Capabilities

### Core Prediction Interface

The primary `TabularPredictor` class covers model training, prediction, evaluation, and model management with minimal code.

```python { .api }
import numpy as np
import pandas as pd

class TabularPredictor:
    def __init__(
        self,
        label: str,
        problem_type: str = None,
        eval_metric: str = None,
        path: str = None,
        verbosity: int = 2,
        sample_weight: str = None,
        weight_evaluation: bool = False,
        groups: str = None,
        positive_class: str | int = None,
        **kwargs
    ): ...

    def fit(
        self,
        train_data: pd.DataFrame,
        tuning_data: pd.DataFrame = None,
        time_limit: int = None,
        presets: str = None,
        hyperparameters: dict = None,
        **kwargs
    ) -> 'TabularPredictor': ...

    def predict(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        as_pandas: bool = True,
        transform_features: bool = True,
        **kwargs
    ) -> pd.Series | np.ndarray: ...

    def predict_proba(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        as_pandas: bool = True,
        as_multiclass: bool = True,
        **kwargs
    ) -> pd.DataFrame | np.ndarray: ...

    def evaluate(
        self,
        data: pd.DataFrame | str,
        model: str = None,
        **kwargs
    ) -> dict: ...

    def leaderboard(
        self,
        data: pd.DataFrame | str = None,
        extra_info: bool = False,
        **kwargs
    ) -> pd.DataFrame: ...

class InterpretableTabularPredictor(TabularPredictor):
    """
    Experimental predictor limited to interpretable models only.
    Same interface as TabularPredictor but restricted to simple models.
    """
```

[Core Prediction Interface](./predictor.md)

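
As an illustration of how the constructor and `fit` arguments above fit together, the following sketch trains under a time budget with a named preset, then reloads the saved predictor. It is a minimal example under stated assumptions, not a reference implementation: `build_fit_kwargs` and `train_and_reload` are hypothetical helpers, the file and column names are placeholders, and `TabularPredictor.load` is assumed to behave as in current AutoGluon releases.

```python
from typing import Any, Dict, Optional


def build_fit_kwargs(
    time_limit: Optional[float] = None,
    presets: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: keep only the fit() arguments that were set."""
    kwargs: Dict[str, Any] = {}
    if time_limit is not None:
        if time_limit <= 0:
            raise ValueError("time_limit must be a positive number of seconds")
        kwargs["time_limit"] = time_limit
    if presets is not None:
        kwargs["presets"] = presets
    return kwargs


def train_and_reload(train_csv: str, label: str, save_path: str, **fit_kwargs: Any):
    """Hypothetical helper: fit a predictor, then load it back from disk."""
    # Imported lazily so build_fit_kwargs() works without AutoGluon installed.
    import pandas as pd
    from autogluon.tabular import TabularPredictor

    train_data = pd.read_csv(train_csv)
    predictor = TabularPredictor(label=label, path=save_path)
    predictor.fit(train_data, **fit_kwargs)
    # Assumption: fit() saves its artifacts under `path`, so load() can restore them.
    return TabularPredictor.load(save_path)
```

For example, `train_and_reload("train.csv", "target_column", "ag_models", **build_fit_kwargs(time_limit=600, presets="medium_quality"))` would cap training at roughly ten minutes. Omitting both arguments leaves AutoGluon's defaults in place.
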
### Experimental Scikit-learn Compatible Interfaces

Experimental scikit-learn-compatible wrappers provide the familiar `fit`/`predict` interface for use in existing scikit-learn workflows and pipelines.

```python { .api }
import numpy as np
import pandas as pd

class TabularClassifier:
    def fit(self, X: pd.DataFrame, y: pd.Series, **kwargs): ...
    def predict(self, X: pd.DataFrame) -> np.ndarray: ...
    def predict_proba(self, X: pd.DataFrame) -> np.ndarray: ...
    def score(self, X: pd.DataFrame, y: pd.Series) -> float: ...

class TabularRegressor:
    def fit(self, X: pd.DataFrame, y: pd.Series, **kwargs): ...
    def predict(self, X: pd.DataFrame) -> np.ndarray: ...
    def score(self, X: pd.DataFrame, y: pd.Series) -> float: ...
```

[Experimental Interfaces](./experimental.md)

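
The `score` methods above are not described further in this document. Assuming the wrappers follow the usual scikit-learn convention (mean accuracy for classifiers, R² for regressors), which is an assumption rather than something stated here, the two quantities can be sketched in plain Python. `accuracy`, `r2`, and `fit_sklearn_style` are hypothetical helper names, and the `autogluon.tabular.experimental` import path is likewise an assumption to verify against the installed version.

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def fit_sklearn_style(X, y):
    """Hypothetical helper: train the experimental classifier wrapper."""
    # Assumed import path; imported lazily so the metric helpers above
    # can be used without AutoGluon installed.
    from autogluon.tabular.experimental import TabularClassifier

    clf = TabularClassifier()
    clf.fit(X, y)
    return clf
```

Under that assumption, `fit_sklearn_style(X_train, y_train).score(X_test, y_test)` should agree with `accuracy(y_test, clf.predict(X_test))` on the same data.
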
### Model Collection and Registry

A collection of machine learning models with unified interfaces, a model registry for extensibility, and access to 30+ algorithms ranging from traditional ML to deep learning.

```python { .api }
# Core Models
class LGBModel: ...                    # LightGBM gradient boosting
class XGBoostModel: ...                # XGBoost gradient boosting
class CatBoostModel: ...               # CatBoost gradient boosting
class RFModel: ...                     # Random Forest
class LinearModel: ...                 # Linear/Logistic Regression
class KNNModel: ...                    # K-Nearest Neighbors

# Neural Network Models
class NNFastAiTabularModel: ...        # FastAI neural networks
class TabularNeuralNetTorchModel: ...  # PyTorch neural networks
class TabPFNV2Model: ...               # TabPFN v2
class FTTransformerModel: ...          # Feature Tokenizer Transformer

# Model Registry
class ModelRegistry:
    def register_model(self, name: str, model_class: type): ...
    def get_model(self, name: str) -> type: ...

ag_model_registry: ModelRegistry
```

[Models and Registry](./models.md)

### Configuration and Presets

Pre-configured presets for different use cases, a hyperparameter configuration system, and customization options for advanced users.

```python { .api }
# Available presets
PRESETS = [
    "best_quality",             # Maximum accuracy, longer training
    "high_quality",             # High accuracy with fast inference
    "good_quality",             # Good accuracy with very fast inference
    "medium_quality",           # Medium accuracy, very fast training (default)
    "optimize_for_deployment",  # Deletes unneeded models and artifacts to shrink the saved predictor
    "interpretable"             # Interpretable models only
]

# Hyperparameter configuration functions
def get_hyperparameter_config(preset: str) -> dict: ...
def get_default_feature_generator(preset: str = "auto"): ...
```

[Configuration and Presets](./configurations.md)

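
To show how the preset names listed above might be combined with a custom `hyperparameters` dictionary, here is a small sketch. `resolve_presets`, `fit_with_config`, and `CUSTOM_HYPERPARAMETERS` are hypothetical names. The model keys `GBM`, `CAT`, and `RF` (LightGBM, CatBoost, random forest) and the ability to pass a list of presets are assumed from AutoGluon's documented conventions and should be verified against the installed version.

```python
from typing import List, Optional, Sequence, Union

# Preset names copied from the PRESETS list in this section.
KNOWN_PRESETS = (
    "best_quality",
    "high_quality",
    "good_quality",
    "medium_quality",
    "optimize_for_deployment",
    "interpretable",
)

# Hypothetical model selection: each key is a model short name and each
# value holds hyperparameter overrides ({} means "use the defaults").
CUSTOM_HYPERPARAMETERS = {"GBM": {}, "CAT": {}, "RF": {}}


def resolve_presets(presets: Union[str, Sequence[str]]) -> List[str]:
    """Normalize one preset name or several into a validated list."""
    names = [presets] if isinstance(presets, str) else list(presets)
    unknown = [name for name in names if name not in KNOWN_PRESETS]
    if unknown:
        raise ValueError(f"unknown presets: {unknown}")
    return names


def fit_with_config(
    train_data,
    label: str,
    presets: Union[str, Sequence[str]] = "medium_quality",
    hyperparameters: Optional[dict] = None,
    time_limit: Optional[float] = None,
):
    """Hypothetical helper: fit a predictor with validated presets."""
    # Imported lazily so resolve_presets() works without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label)
    predictor.fit(
        train_data,
        presets=resolve_presets(presets),
        hyperparameters=hyperparameters,
        time_limit=time_limit,
    )
    return predictor
```

A call such as `fit_with_config(train_data, "target_column", presets=["good_quality", "optimize_for_deployment"], hyperparameters=CUSTOM_HYPERPARAMETERS)` would restrict training to the three listed model families. Validating preset names up front surfaces typos before a long training run starts.
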
## Types

```python { .api }
from typing import Literal

import pandas as pd

# Core data structures
TabularDataset = pd.DataFrame  # Enhanced DataFrame for tabular data

class FeatureMetadata:
    """Metadata container for feature information"""
    def __init__(self, type_map_raw: dict = None): ...
    def get_features(self, valid_raw_types: list = None) -> list: ...
    def get_feature_type_raw(self, feature: str) -> str: ...

# Problem types
PROBLEM_TYPES = Literal["binary", "multiclass", "regression", "quantile", "softclass"]

# Evaluation metrics
CLASSIFICATION_METRICS = [
    "accuracy", "balanced_accuracy", "log_loss", "f1", "f1_macro", "f1_micro",
    "f1_weighted", "roc_auc", "roc_auc_ovo", "roc_auc_ovr", "precision",
    "precision_macro", "recall", "recall_macro", "mcc", "pac_score"
]

REGRESSION_METRICS = [
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error", "r2",
    "symmetric_mean_absolute_percentage_error"
]

# Weight strategies
WEIGHT_STRATEGIES = Literal["auto_weight", "balance_weight"]
```
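
The metric and problem-type names above are passed to `TabularPredictor` as plain strings, so a mismatch (for instance, a regression metric on a classification task) may only surface at training time. The sketch below shows one way to check the pairing beforehand. `predictor_kwargs` is a hypothetical helper, and it covers only the `binary`, `multiclass`, and `regression` problem types.

```python
# Metric names copied from the lists in the Types section.
CLASSIFICATION_METRICS = frozenset({
    "accuracy", "balanced_accuracy", "log_loss", "f1", "f1_macro", "f1_micro",
    "f1_weighted", "roc_auc", "roc_auc_ovo", "roc_auc_ovr", "precision",
    "precision_macro", "recall", "recall_macro", "mcc", "pac_score",
})

REGRESSION_METRICS = frozenset({
    "root_mean_squared_error", "mean_squared_error", "mean_absolute_error",
    "median_absolute_error", "mean_absolute_percentage_error", "r2",
    "symmetric_mean_absolute_percentage_error",
})


def predictor_kwargs(problem_type: str, eval_metric: str) -> dict:
    """Hypothetical helper: check that a metric suits the problem type."""
    if problem_type in ("binary", "multiclass"):
        allowed = CLASSIFICATION_METRICS
    elif problem_type == "regression":
        allowed = REGRESSION_METRICS
    else:
        raise ValueError(f"problem type not covered by this sketch: {problem_type!r}")
    if eval_metric not in allowed:
        raise ValueError(f"{eval_metric!r} is not listed for {problem_type!r} problems")
    # Keys match the TabularPredictor constructor arguments of the same name.
    return {"problem_type": problem_type, "eval_metric": eval_metric}
```

The result can be unpacked into the constructor, as in `TabularPredictor(label="target_column", **predictor_kwargs("binary", "roc_auc"))`. The check is deliberately coarse: it does not distinguish metrics that apply only to binary or only to multiclass problems.
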