# AutoGluon

AutoGluon is an automated machine learning (AutoML) library for building high-accuracy ML models with minimal code across multiple data modalities: tabular, text, image, and time series. It provides a unified interface through predictor classes that automatically handle feature engineering, model selection, hyperparameter optimization, and ensembling.

## Package Information

- **Package Name**: autogluon
- **Language**: Python
- **Installation**: `pip install autogluon`

## Core Imports

```python
import autogluon
```

Common imports for specific domains:

```python
from autogluon.tabular import TabularPredictor, InterpretableTabularPredictor, TabularDataset
from autogluon.multimodal import MultiModalPredictor
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
```

Core utilities and data structures:

```python
from autogluon.core import constants, metrics
from autogluon.common import TabularDataset, FeatureMetadata
from autogluon.features import *  # Feature generators
from autogluon.eda import AnalysisState
```

## Basic Usage

```python
from autogluon.tabular import TabularPredictor
from autogluon.multimodal import MultiModalPredictor
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame

# Tabular ML (most common use case)
predictor = TabularPredictor(label='target_column')
predictor.fit('train.csv', presets='best_quality')

# Make predictions
predictions = predictor.predict('test.csv')
probabilities = predictor.predict_proba('test.csv')  # classification tasks

# Evaluate performance
performance = predictor.evaluate('test.csv')
leaderboard = predictor.leaderboard()

# Multimodal data (text + image + tabular)
# train_data: pandas DataFrame with text, image paths, and numerical columns
mm_predictor = MultiModalPredictor(label='label', problem_type='multiclass')
mm_predictor.fit(train_data)

# Time series forecasting
# df: long-format pandas DataFrame with 'item_id', 'timestamp', and 'target' columns
ts_data = TimeSeriesDataFrame.from_data_frame(df, id_column='item_id', timestamp_column='timestamp')
ts_predictor = TimeSeriesPredictor(prediction_length=24, freq='H')
ts_predictor.fit(ts_data)

# Generate forecasts
forecasts = ts_predictor.predict(ts_data)
```
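
The usage snippet scores on `test.csv`, which presumes a held-out file already exists. The following is a minimal sketch of producing one from a single dataset. The helper names (`split_rows`, `write_csv`, `train_and_score`), the 80/20 split, and the 60-second `time_limit` are illustrative assumptions, and AutoGluon is imported lazily so the split helpers work without it installed.

```python
import csv
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def write_csv(path, header, rows):
    """Write a header line followed by the data rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def train_and_score(train_csv, test_csv, label):
    """Fit on the training file and evaluate on the held-out file."""
    # Imported here so the helpers above work without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(train_csv, time_limit=60)
    return predictor.evaluate(test_csv)
```

Evaluating on rows the model never saw during `fit` is what makes the reported score a fair estimate; scoring on the training file would overstate it.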

## Architecture

AutoGluon uses a modular architecture centered around specialized predictor classes:

- **TabularPredictor**: Handles structured/tabular data with automatic feature engineering, model selection from 10+ algorithms (RandomForest, XGBoost, LightGBM, CatBoost, Neural Networks), and intelligent ensembling
- **MultiModalPredictor**: Processes heterogeneous data combining text, images, and tabular features using foundation models (BERT, ResNet, Vision Transformers) with automatic modality-specific preprocessing
- **TimeSeriesPredictor**: Performs probabilistic forecasting with both statistical models (ARIMA, ETS) and deep learning models (DeepAR, Transformers), supporting multiple quantile levels
- **Core Infrastructure**: Shared utilities for metrics, hyperparameter optimization, model training, and evaluation across all predictor types

This design enables easy switching between domains while maintaining consistent APIs and leveraging state-of-the-art models for each data type.
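
To make the ensembling idea concrete, here is a conceptual sketch of weighted averaging over base-model predictions. It is not AutoGluon's implementation: the function name, the model names, and the fixed weights are assumptions chosen for illustration, whereas a real AutoML system would choose the weights from validation data.

```python
def weighted_average(predictions, weights):
    """Blend per-model predictions into one list using non-negative weights.

    predictions: dict mapping model name -> list of predicted values
    weights:     dict mapping model name -> weight (need not sum to 1)
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * predictions[name][i] for name in weights) / total
        for i in range(n_rows)
    ]


# Hypothetical positive-class probabilities from two base models.
base_predictions = {
    "gbm": [0.2, 0.8, 0.6],
    "nn": [0.4, 0.6, 0.2],
}
blended = weighted_average(base_predictions, {"gbm": 3, "nn": 1})
```

Each blended value lies between the individual model predictions and is pulled toward the model with the larger weight.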

## Capabilities

### Tabular Machine Learning

Automated machine learning for structured/tabular data supporting classification and regression tasks. Handles feature engineering, model selection, hyperparameter tuning, and ensembling with minimal configuration.

```python { .api }
class TabularPredictor:
    def __init__(self, label: str, problem_type: str = None, path: str = None, **kwargs): ...
    def fit(self, train_data, presets: str = None, time_limit: int = None, **kwargs): ...
    def predict(self, data): ...
    def predict_proba(self, data): ...
    def evaluate(self, data, **kwargs): ...
    def leaderboard(self, data=None, **kwargs): ...
```

[Tabular ML](./tabular.md)

### Multimodal Machine Learning

Automated machine learning for heterogeneous data combining text, images, and tabular features. Supports classification, regression, object detection, named entity recognition, and semantic matching tasks.

```python { .api }
class MultiModalPredictor:
    def __init__(self, label: str = None, problem_type: str = None, presets: str = None, **kwargs): ...
    def fit(self, train_data, **kwargs): ...
    def predict(self, data, **kwargs): ...
    def predict_proba(self, data, **kwargs): ...
    def evaluate(self, data, **kwargs): ...
    def extract_embedding(self, data, **kwargs): ...
```

[Multimodal ML](./multimodal.md)

### Time Series Forecasting

Probabilistic forecasting for univariate and multivariate time series data. Supports both statistical and deep learning models with automatic model selection and quantile predictions.

```python { .api }
class TimeSeriesPredictor:
    def __init__(self, target: str = "target", prediction_length: int = 1, freq: str = None, **kwargs): ...
    def fit(self, train_data, **kwargs): ...
    def predict(self, data, **kwargs): ...
    def evaluate(self, data, **kwargs): ...
    def leaderboard(self, data=None, **kwargs): ...
```

[Time Series](./timeseries.md)
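
Quantile predictions are commonly scored with the pinball (quantile) loss, which penalizes under-prediction and over-prediction asymmetrically depending on the quantile level. The sketch below is a standalone illustration of that formula, not AutoGluon's evaluation code; the function names and the dictionary layout for forecasts are assumptions.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Pinball loss for one observation at the given quantile level (0 < q < 1)."""
    diff = y_true - y_pred
    # Under-prediction (diff > 0) costs q * diff;
    # over-prediction (diff < 0) costs (1 - q) * |diff|.
    return max(quantile * diff, (quantile - 1) * diff)


def mean_pinball_loss(actuals, forecasts):
    """Average pinball loss over all quantile levels and time steps.

    forecasts: dict mapping quantile level -> list of predictions
               aligned with `actuals`
    """
    losses = [
        pinball_loss(y, y_hat, q)
        for q, preds in forecasts.items()
        for y, y_hat in zip(actuals, preds)
    ]
    return sum(losses) / len(losses)
```

At the 0.9 quantile, forecasting 8 when the actual value is 10 costs nine times as much as forecasting 12, which is what pushes a 0.9-quantile forecast toward the upper end of the plausible range.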

### Feature Engineering

Comprehensive feature generation and transformation capabilities for automated feature engineering across different data types.

```python { .api }
from pandas import DataFrame, Series

class AutoMLPipelineFeatureGenerator:
    def __init__(self, **kwargs): ...
    def fit_transform(self, X: DataFrame, y: Series = None, **kwargs): ...
    def transform(self, X: DataFrame, **kwargs): ...
```

[Feature Engineering](./features.md)
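
The `fit_transform` / `transform` pair follows the usual contract: state is learned once from training data and then reused unchanged on new data. The sketch below illustrates that contract with a hypothetical `TinyCategoryEncoder` written in plain Python; it is not an AutoGluon class, and the choice of `-1` for unseen categories is an assumption made for the example.

```python
class TinyCategoryEncoder:
    """Hypothetical encoder mapping string categories to integer codes."""

    def __init__(self):
        self.mapping = {}

    def fit_transform(self, column):
        # Learn codes in order of first appearance, then encode the column.
        self.mapping = {}
        for value in column:
            self.mapping.setdefault(value, len(self.mapping))
        return self.transform(column)

    def transform(self, column):
        # Reuse the learned codes; categories unseen during fit become -1.
        return [self.mapping.get(value, -1) for value in column]


encoder = TinyCategoryEncoder()
train_codes = encoder.fit_transform(["red", "blue", "red"])
test_codes = encoder.transform(["blue", "green"])
```

Calling `fit_transform` again on test data would relearn the mapping and silently change the meaning of each code, which is why new data should go through `transform` only.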

### Core Utilities

Shared utilities for metrics, constants, and data structures used across all AutoGluon predictors.

```python { .api }
from pandas import DataFrame

class TabularDataset:
    def __init__(self, df: DataFrame): ...

    @classmethod
    def load(cls, file_path: str): ...

class Scorer:
    def __init__(self, name: str, score_func: callable, **kwargs): ...
```

[Core Utilities](./core.md)
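
A custom metric is typically a plain function of true and predicted values wrapped into a scorer. The sketch below defines such a function and wraps it with `make_scorer` from `autogluon.core.metrics`; the metric name `mae_sketch` and the helper `build_mae_scorer` are illustrative, and the wrapper is imported lazily so the metric itself can be exercised without AutoGluon installed. Check the keyword arguments against the version you have installed.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def build_mae_scorer():
    """Wrap the function above as an AutoGluon scorer (requires autogluon)."""
    from autogluon.core.metrics import make_scorer

    # Lower error is better, and a perfect model scores 0.
    return make_scorer(
        name="mae_sketch",
        score_func=mean_absolute_error,
        optimum=0,
        greater_is_better=False,
    )
```

The resulting scorer would then be passed as `eval_metric` when constructing a predictor, for example `TabularPredictor(label='target_column', eval_metric=build_mae_scorer())`.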

## Types

```python { .api }
# Problem type constants
BINARY = "binary"
MULTICLASS = "multiclass"
REGRESSION = "regression"
QUANTILE = "quantile"

# Common data structures
TabularDataset = pandas.DataFrame  # Enhanced DataFrame with AutoGluon utilities
TimeSeriesDataFrame = pandas.DataFrame  # Time series specific DataFrame structure

# Predictor types
TabularPredictor = Type[TabularPredictor]
MultiModalPredictor = Type[MultiModalPredictor]
TimeSeriesPredictor = Type[TimeSeriesPredictor]
```