# Automated Machine Learning

A complete automated machine learning pipeline for classification, regression, forecasting, ranking, and NLP tasks. AutoML selects the best model and hyperparameters it can find within a specified time budget.
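
The idea of searching under a time budget can be illustrated with a minimal sketch. This is not the library's actual search algorithm; the function `budgeted_search`, the candidate tuples, and the toy loss table are all hypothetical names introduced only for illustration. The sketch evaluates candidates one at a time and keeps the lowest loss seen before the deadline.

```python
import time


def budgeted_search(candidates, evaluate, time_budget, clock=time.monotonic):
    """Evaluate candidates in order until the time budget runs out.

    Returns (best_candidate, best_loss); both are None if nothing was evaluated.
    """
    deadline = clock() + time_budget
    best_candidate, best_loss = None, None
    for candidate in candidates:
        if clock() >= deadline:
            break  # budget exhausted: stop and keep the best result so far
        loss = evaluate(candidate)
        if best_loss is None or loss < best_loss:
            best_candidate, best_loss = candidate, loss
    return best_candidate, best_loss


# Hypothetical candidates: (estimator name, hyperparameter value) pairs.
candidates = [("rf", 10), ("rf", 100), ("lgbm", 10), ("lgbm", 100)]
# Toy loss table standing in for "train the model, then score it on validation data".
toy_loss = {("rf", 10): 0.30, ("rf", 100): 0.25, ("lgbm", 10): 0.20, ("lgbm", 100): 0.22}

best, loss = budgeted_search(candidates, toy_loss.get, time_budget=60)
print(best, loss)  # ('lgbm', 10) 0.2
```

A real search would also decide which candidate to try next based on earlier results; here the order is fixed to keep the example short.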

## Capabilities

### AutoML Class

The main AutoML class provides automated machine learning with intelligent model selection, hyperparameter optimization, and ensemble methods.

```python { .api }
class AutoML:
    def __init__(self):
        """Initialize AutoML instance."""

    def fit(self, X_train, y_train, task="classification", time_budget=60,
            metric="auto", estimator_list="auto", eval_method="auto",
            split_ratio=0.1, n_splits=5, ensemble=False,
            n_jobs=1, verbose=0, **kwargs):
        """
        Train AutoML model.

        Args:
            X_train: Training feature data (pandas DataFrame, numpy array, or sparse matrix)
            y_train: Training target data (pandas Series or numpy array)
            task (str): Task type - 'classification', 'regression', 'ts_forecast', 'rank', 'nlp'
            time_budget (float): Time budget in seconds for training
            metric (str or callable): Evaluation metric ('accuracy', 'roc_auc', 'rmse', 'mae', etc.)
            estimator_list (list): List of estimator names to try ('auto' for default selection)
            eval_method (str): Evaluation method - 'auto', 'cv', 'holdout'
            split_ratio (float): Validation split ratio for holdout method
            n_splits (int): Number of cross-validation folds
            ensemble (bool): Whether to perform ensemble learning
            n_jobs (int): Number of parallel jobs (-1 for all processors)
            verbose (int): Verbosity level (0-5+)

        Returns:
            self: Fitted AutoML instance
        """

    def predict(self, X, **kwargs):
        """
        Make predictions on new data.

        Args:
            X: Feature data for prediction (same format as training data)
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Predictions
        """

    def predict_proba(self, X, **kwargs):
        """
        Get prediction probabilities (classification only).

        Args:
            X: Feature data for prediction
            **kwargs: Additional prediction parameters

        Returns:
            numpy.ndarray: Prediction probabilities
        """

    def score(self, X, y, **kwargs):
        """
        Evaluate model performance.

        Args:
            X: Feature data for evaluation
            y: True target values
            **kwargs: Additional scoring parameters

        Returns:
            float: Score based on the specified metric
        """

    def add_learner(self, learner_name, learner_class):
        """
        Add custom learner to estimator list.

        Args:
            learner_name (str): Name for the custom learner
            learner_class: Learner class implementing fit/predict interface
        """
```
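
To make the `eval_method`, `split_ratio`, and `n_splits` parameters more concrete, the sketch below shows how a holdout split and a k-fold split partition sample indices. It is a simplified illustration, not the library's implementation: the helper names `holdout_split` and `kfold_splits` are hypothetical, and the sketch neither shuffles nor stratifies, which a real evaluation typically would.

```python
def holdout_split(n_samples, split_ratio=0.1):
    """Return (train_indices, val_indices), holding out the last split_ratio share."""
    n_val = max(1, int(round(n_samples * split_ratio)))
    indices = list(range(n_samples))
    return indices[:-n_val], indices[-n_val:]


def kfold_splits(n_samples, n_splits=5):
    """Yield (train_indices, val_indices) for each of n_splits contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


train_idx, val_idx = holdout_split(20, split_ratio=0.1)
print(len(train_idx), len(val_idx))  # 18 2

folds = list(kfold_splits(20, n_splits=5))
print([len(val) for _, val in folds])  # [4, 4, 4, 4, 4]
```

Holdout trains once and validates on a single slice, so it is cheaper; cross-validation trains `n_splits` times and uses every sample for validation exactly once.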

### Model Properties and Results

Access to the best model, configuration, and training results.

```python { .api }
class AutoML:
    @property
    def best_estimator(self):
        """Name of the best estimator found (a string such as 'lgbm')."""

    @property
    def best_config(self):
        """Best hyperparameter configuration found."""

    @property
    def best_loss(self):
        """Best validation loss achieved."""

    @property
    def model(self):
        """Trained model object for the best estimator."""

    @property
    def feature_importances_(self):
        """Feature importance values from the best model."""

    @property
    def classes_(self):
        """Class labels for classification tasks."""

    @property
    def best_config_per_estimator(self):
        """Best configuration for each estimator type tried."""

    @property
    def time_to_find_best_model(self):
        """Time taken to find the best model in seconds."""

    @property
    def feature_transformer(self):
        """Feature preprocessing pipeline."""

    @property
    def label_transformer(self):
        """Label preprocessing pipeline."""
```

### Model Management and Persistence

Save, load, and retrain models with configuration management.

```python { .api }
class AutoML:
    def save_best_config(self, filename):
        """
        Save best configuration to file.

        Args:
            filename (str): Path to save configuration
        """

    def get_estimator_from_log(self, log_file_name, record_id, task):
        """
        Extract estimator from training log.

        Args:
            log_file_name (str): Path to log file
            record_id (int): Record identifier
            task (str): Task type

        Returns:
            Trained estimator instance
        """

    def retrain_from_log(self, log_file_name, X_train, y_train,
                         task, record_id=-1, **kwargs):
        """
        Retrain model from logged configuration.

        Args:
            log_file_name (str): Path to training log
            X_train: Training features
            y_train: Training targets
            task (str): Task type
            record_id (int): Record ID (-1 for best)
            **kwargs: Additional training parameters
        """
```

### Utility Functions

Helper functions for model analysis and configuration.

```python { .api }
def size(learner_classes, config):
    """
    Calculate memory size for a model configuration.

    Args:
        learner_classes (dict): Dictionary of learner classes
        config (dict): Model configuration

    Returns:
        float: Estimated memory size in bytes
    """
```

### Usage Examples

#### Basic Classification
```python
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train AutoML model
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=30)

# Make predictions
predictions = automl.predict(X_test)
probabilities = automl.predict_proba(X_test)

print(f"Best model: {automl.best_estimator}")
# score() reports the metric used during training, which is chosen
# automatically here because no metric was passed to fit()
print(f"Test score: {automl.score(X_test, y_test)}")
```

#### Regression with Custom Settings
```python
from flaml import AutoML
import pandas as pd

# Load regression data
df = pd.read_csv("regression_data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Configure AutoML
automl = AutoML()
settings = {
    "task": "regression",
    "time_budget": 300,
    "metric": "rmse",
    "estimator_list": ["lgbm", "xgboost", "rf"],
    "ensemble": True,
    "n_jobs": -1,
    "verbose": 1
}

# Train and evaluate
automl.fit(X, y, **settings)
print(f"Best RMSE: {automl.best_loss}")
print(f"Feature importance: {automl.feature_importances_}")
```

#### Time Series Forecasting
```python
from flaml import AutoML
import pandas as pd

# Load time series data
# (assumes a timestamp column "ds" and a target column named "y")
df = pd.read_csv("timeseries.csv")
df["ds"] = pd.to_datetime(df["ds"])

# Hold out the last `horizon` rows so the forecast can be checked
horizon = 12
train_df, test_df = df[:-horizon], df[-horizon:]

# Configure for forecasting
automl = AutoML()
automl.fit(
    X_train=train_df[["ds"]],
    y_train=train_df["y"],
    task="ts_forecast",
    time_budget=600,
    metric="mape",
    period=horizon,  # forecast horizon (steps to predict), not the seasonal period
    verbose=2
)

# Generate forecasts for the held-out timestamps
forecasts = automl.predict(test_df[["ds"]])
```
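
#### Custom Learner Integration
```python
from flaml import AutoML
from sklearn.svm import SVC

# Add custom learner
# Note: FLAML's documentation describes learner_class as a subclass of its own
# estimator base class; a plain scikit-learn class such as SVC may need to be
# wrapped in such a subclass before it can be tuned.
automl = AutoML()
automl.add_learner("custom_svm", SVC)

# Use custom learner in training
# (X_train and y_train are prepared as in the Basic Classification example)
automl.fit(
    X_train, y_train,
    task="classification",
    estimator_list=["lgbm", "custom_svm"],
    time_budget=120
)
```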
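
The fit/predict interface mentioned for `learner_class` can be sketched with a toy learner. This is only an illustration of the contract (`fit` learns from data and returns the instance, `predict` returns one output per input row); the class `MajorityClassLearner` is hypothetical, and a learner registered with the library would likely also need to describe its hyperparameter search space.

```python
from collections import Counter


class MajorityClassLearner:
    """Toy classifier that always predicts the most frequent training label."""

    def fit(self, X, y):
        # Remember the most common label; ties resolve to the first one seen.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # One prediction per input row, regardless of the feature values.
        return [self.majority_ for _ in X]


learner = MajorityClassLearner().fit([[0], [1], [2]], ["a", "b", "b"])
print(learner.predict([[5], [6]]))  # ['b', 'b']
```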

## State Management Classes

Classes for managing AutoML training state and search configuration.

```python { .api }
class AutoMLState:
    """Manages AutoML training state and sample data preparation."""

    def prepare_sample_train_data(self, sample_size):
        """
        Prepare sampled training data for efficient search.

        Args:
            sample_size (int): Size of sample to create
        """

class SearchState:
    """Manages hyperparameter search state and validation."""

    @property
    def search_space(self):
        """Current search space configuration."""

    @property
    def estimated_cost4improvement(self):
        """Estimated cost for model improvement."""
```

## Supported Tasks and Metrics

### Task Types
- **classification**: Binary and multi-class classification
- **regression**: Continuous target prediction
- **ts_forecast**: Time series forecasting
- **rank**: Learning to rank
- **nlp**: Natural language processing tasks

### Metrics
- **Classification**: accuracy, roc_auc, roc_auc_ovr, f1, log_loss, precision, recall
- **Regression**: rmse, mae, mse, r2, mape
- **Forecasting**: mape, smape, mae, rmse
- **Ranking**: ndcg, ap
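
For reference, the regression and forecasting error metrics named above can be written out in a few lines. This is a plain-Python sketch of the standard formulas, not the library's own code; `mape` and `smape` are returned as fractions rather than percentages, and `smape` uses one common definition among several in circulation.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (undefined if any true value is 0)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE as a fraction: mean of 2|t - p| / (|t| + |p|)."""
    return sum(2 * abs(t - p) / (abs(t) + abs(p))
               for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [100, 200, 400]
y_pred = [110, 180, 400]
print(mae(y_true, y_pred))  # 10.0
print(rmse(y_true, y_pred), mape(y_true, y_pred), smape(y_true, y_pred))
```

All five are losses, so lower values indicate a better fit.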

### Estimators
- **lgbm**: LightGBM (gradient boosting)
- **xgboost**: XGBoost (gradient boosting)
- **rf**: Random Forest
- **extra_tree**: Extra Trees
- **lrl1**: Logistic Regression with L1 regularization
- **lrl2**: Logistic Regression with L2 regularization
- **catboost**: CatBoost (if installed)
- **kneighbor**: K-Nearest Neighbors
- **prophet**: Prophet for time series (if installed)
- **arima**: ARIMA for time series